Hi All,
I have a dataset that contains two batches. I want to remove the batch effect and then analyze the data further (differential expression analysis and clustering). I tried two approaches:
1. Passing both the batch and the biological variable of interest to lmFit():
fit = lmFit(y, design=model.matrix(~ batch+fac))
fit = eBayes(fit)
pvals = fit$p.value[,2]
adjplim<-p.adjust(pvals,"BH",length(pvals))
and got more than 1000 genes differentially expressed between the groups defined by fac.
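2. Removing the batch effect with removeBatchEffect() first, then fitting the model on the corrected matrix: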
y2 <- removeBatchEffect(y,batch, design =model.matrix(~fac))
fit = lmFit(y2, design=model.matrix(~ fac))
fit = eBayes(fit)
pvals = fit$p.value[,2]
adjplim<-p.adjust(pvals,"BH",length(pvals))
and got 0 genes differentially expressed between the groups defined by fac.
What am I doing wrong here?
The first approach doesn't give me a batch-corrected matrix (like y2 in the second approach), which is why I tried the second approach. However, I don't know which method to rely on for the differential expression results.
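One thing that may be worth checking is which coefficient column 2 of fit$p.value refers to in each approach. The sketch below uses only base R and assumes batch and fac are two-level factors; the sample labels are made up for illustration and are not from the real dataset. Under that assumption, column 2 of the ~ batch+fac design is the batch coefficient, not fac, so selecting the coefficient by name may be safer than selecting it by position.

```r
# Hypothetical sample annotation (illustrative only): two batches, two groups
batch <- factor(c("b1", "b1", "b1", "b2", "b2", "b2"))
fac   <- factor(c("ctrl", "trt", "ctrl", "trt", "ctrl", "trt"))

# Design used in approach 1: batch comes first, so it occupies column 2
design1 <- model.matrix(~ batch + fac)
print(colnames(design1))

# Design used in approach 2: fac occupies column 2
design2 <- model.matrix(~ fac)
print(colnames(design2))

# Pick the coefficient of interest by name instead of by position
fac_coef <- grep("^fac", colnames(design1), value = TRUE)
print(fac_coef)
```

With a fitted object from approach 1, the p-values for fac could then be taken as fit$p.value[, fac_coef] rather than fit$p.value[, 2].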
Thanks!
Liron
Thank you for your answers!
I want to use the genes from the differential analysis (with the batch effect removed) to check whether the biological groups (defined by fac) form clear clusters once the batch effect has been removed.
Do I have to pass both the batch and the design (model.matrix(~fac)) to removeBatchEffect()?
Does that mean that each time I want to test a hypothesis between different biological groups, I need to run removeBatchEffect() with a different design (with only the batch argument staying the same)?
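For reference, a minimal sketch of how the two steps could be kept separate, assuming the limma package is installed. The simulated matrix, sample labels, and the names y_clean and design_full are hypothetical. The limma documentation describes removeBatchEffect() as intended for uses such as clustering or plotting rather than as a step before linear modelling, so here the test stays on the original data with batch in the design, and the corrected matrix is used only for clustering.

```r
library(limma)

set.seed(1)
# Hypothetical data: 50 genes, 8 samples, two batches, two groups
batch <- factor(rep(c("b1", "b2"), each = 4))
fac   <- factor(rep(c("ctrl", "trt"), times = 4))
y <- matrix(rnorm(50 * 8), nrow = 50)
y[, batch == "b2"] <- y[, batch == "b2"] + 2  # additive batch shift

# Differential expression: original data, batch kept in the model
design_full <- model.matrix(~ batch + fac)
fit <- eBayes(lmFit(y, design_full))
top <- topTable(fit, coef = "factrt", number = Inf)  # BH-adjusted by default

# Clustering / plotting only: remove batch while protecting the fac effect
y_clean <- removeBatchEffect(y, batch = batch, design = model.matrix(~ fac))
hc <- hclust(dist(t(y_clean)))
```

In this layout the design passed to removeBatchEffect() lists the biological variables to preserve, so it would not need to change for each individual comparison; different hypotheses would be tested through different coefficients or contrasts of the full model.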
Thanks a lot for your help!
Liron