Hi All,

I have a dataset that contains two batches. I want to remove the batch effect and then analyze the data further (differential expression analysis and clustering). I tried two approaches:

1. Passing both the batch and the biological variable of interest to lmFit():

fit = lmFit(y, design=model.matrix(~ batch+fac))

fit = eBayes(fit)

pvals = fit$p.value[,2]

adjplim<-p.adjust(pvals,"BH",length(pvals))

and got more than 1000 genes differentially expressed between the groups defined by fac.

2. Removing the batch effect with removeBatchEffect() first, then fitting the model on fac alone:

y2 <- removeBatchEffect(y,batch, design =model.matrix(~fac))

fit = lmFit(y2, design=model.matrix(~ fac))

fit = eBayes(fit)

pvals = fit$p.value[,2]

adjplim<-p.adjust(pvals,"BH",length(pvals))

and got 0 genes differentially expressed between the groups defined by fac.
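
In case it helps with diagnosing this, here is a minimal sketch of how the coefficient columns are ordered in the two designs. The batch and fac values below are made up for illustration (two levels each, six samples per batch); they are not my real data, so the column names in my actual designs may differ.

```r
# Toy factors (hypothetical values, not the real data)
batch <- factor(rep(c("b1", "b2"), each = 6))
fac   <- factor(rep(c("ctrl", "trt"), times = 6))

# Design used in approach 1
design1 <- model.matrix(~ batch + fac)
# Design used in approach 2
design2 <- model.matrix(~ fac)

colnames(design1)  # "(Intercept)" "batchb2" "factrt"
colnames(design2)  # "(Intercept)" "factrt"
```

fit$p.value has one column per design column, in this same order, so fit$p.value[,2] picks whatever coefficient sits in the second position of the design that was passed to lmFit().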

What am I doing wrong here?

The first approach doesn't give me a batch-corrected matrix (like y2 in the second approach), which is why I tried the second approach, but I don't know which method to rely on for the differential expression results.

Thanks!

Liron

Thank you for your answers!

I want to take the genes I get from the differential analysis (with the batch accounted for) and check whether the biological groups I have (defined by fac) form clear clusters once the batch effect has been removed.
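
Roughly, this is what I have in mind, written as a sketch on simulated data. Everything about the data is an assumption for illustration: the matrix size, the level names b1/b2 and ctrl/trt, the simulated effect sizes, the 0.05 cutoff, and the choice of hierarchical clustering with two clusters. I am also assuming here that the differential analysis keeps batch in the design and selects the fac coefficient by name, and that the batch-corrected matrix is used only for the clustering step.

```r
library(limma)

set.seed(1)
n_genes   <- 200
n_samples <- 12

# Hypothetical sample annotation: two batches, two biological groups
batch <- factor(rep(c("b1", "b2"), each = 6))
fac   <- factor(rep(c("ctrl", "trt"), times = 6))

# Simulated log-expression matrix (genes in rows, samples in columns)
y <- matrix(rnorm(n_genes * n_samples), n_genes, n_samples,
            dimnames = list(paste0("g", seq_len(n_genes)),
                            paste0("s", seq_len(n_samples))))
y[, batch == "b2"]    <- y[, batch == "b2"] + 1       # batch shift on all genes
y[1:20, fac == "trt"] <- y[1:20, fac == "trt"] + 2    # group effect on 20 genes

# Differential analysis with batch in the design; coefficient chosen by name
design <- model.matrix(~ batch + fac)
fit    <- eBayes(lmFit(y, design))
top    <- topTable(fit, coef = "factrt", number = Inf, p.value = 0.05)

# Batch-corrected matrix, used here only for clustering
y2 <- removeBatchEffect(y, batch = batch, design = model.matrix(~ fac))

# Cluster the samples on the selected genes
hc       <- hclust(dist(t(y2[rownames(top), , drop = FALSE])))
clusters <- cutree(hc, k = 2)
table(clusters, fac)
```

The final table is what I would look at to see whether the clusters line up with the groups in fac.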

Do I have to pass both the batch and the design (model.matrix(~fac)) to removeBatchEffect() as arguments?

Does that mean that each time I want to test a hypothesis between different biological groups, I need to rerun removeBatchEffect() with different arguments (with only the batch staying the same)?

Thanks a lot for your help!

Liron