I have a naive question.
Even though I've seen a lot of questions about this issue (batch effects) in RNA-Seq analysis, I still don't understand exactly what edgeR is doing during this procedure.
For example, I have only 2 conditions (let's call them WT and Mutant).
In this case I can:
group <- factor(c(rep("WT", 3),
rep("Mutant", 3)), levels = c("WT", "Mutant"))
and create a batch factor:
batch <- factor(c("1", "1", "2",
"1", "1", "2"))
Then:
design <- model.matrix(~group+batch, data = y$samples )
or:
design <- model.matrix(~batch+group, data = y$samples)
design
      (Intercept) groupMutant batch2
WT-1            1           0      0
WT-2            1           0      0
WT-3            1           0      1
Mut-1           1           1      0
Mut-2           1           1      0
Mut-3           1           1      1
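To check which column is which, here is a minimal base-R sketch (no edgeR needed) that rebuilds both designs. It assumes WT is set as the reference level and takes the sample names from the printout above; the names design_gb and design_bg are only for illustration.

```r
# Sample names copied from the printed design matrix
samples <- c("WT-1", "WT-2", "WT-3", "Mut-1", "Mut-2", "Mut-3")

# Assumption: WT is the reference level, so the group column is "groupMutant"
group <- factor(rep(c("WT", "Mutant"), each = 3), levels = c("WT", "Mutant"))
batch <- factor(c("1", "1", "2", "1", "1", "2"))

# The same additive model written in the two term orders
design_gb <- model.matrix(~ group + batch)
design_bg <- model.matrix(~ batch + group)
rownames(design_gb) <- rownames(design_bg) <- samples

colnames(design_gb)  # "(Intercept)" "groupMutant" "batch2"
colnames(design_bg)  # "(Intercept)" "batch2"      "groupMutant"
```

So the position of the group column depends on the order of the terms in the formula: a numeric coef = 2 points at groupMutant in the first design but at batch2 in the second.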
How do I perform the downstream analysis?
Specifically, in the likelihood ratio test:
lrt <- glmLRT(fit, coef = ?)
which coefficient should I use? If I understand correctly, I should use coef = 2, right?
But if I use only coef = 2, will it take the batch effect into account? Or should I use coef = 2:3? Or even glmLRT(fit)?
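As far as I understand the edgeR documentation, the downstream steps would look roughly like the sketch below. The counts are simulated purely so that the code runs on its own (they are not my data), and selecting the coefficient by name rather than by position is my own assumption of a safer habit.

```r
library(edgeR)

# Simulated counts for illustration only: 100 genes x 6 samples
set.seed(1)
samples <- c("WT-1", "WT-2", "WT-3", "Mut-1", "Mut-2", "Mut-3")
counts <- matrix(rnbinom(600, mu = 50, size = 10), nrow = 100,
                 dimnames = list(paste0("gene", 1:100), samples))

group <- factor(rep(c("WT", "Mutant"), each = 3), levels = c("WT", "Mutant"))
batch <- factor(c("1", "1", "2", "1", "1", "2"))

y <- DGEList(counts = counts, group = group)
y <- calcNormFactors(y)

# Additive model: batch is included so the group effect is adjusted for it
design <- model.matrix(~ batch + group)
y <- estimateDisp(y, design)
fit <- glmFit(y, design)

# Test only the group coefficient, selected by column name
lrt <- glmLRT(fit, coef = "groupMutant")
topTags(lrt)
```

My reading is that, because batch is a term in the design, the test on groupMutant is already adjusted for batch. coef = 2:3 would instead test both terms jointly, and glmLRT(fit) with no coef tests the last column of the design.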
What is going on under the hood?
Thank you very much,
Regards
Thank you Aaron again for all the help.