In the RUVSeq manual, they state that the downstream analysis with edgeR should use the following design matrix:
design <- model.matrix(~x + W_1, data=pData(set1))
where W_1 is the estimated factor of unwanted variation.
In the edgeR manual, the batch effect comes first in the design matrix:
design <- model.matrix(~Batch+Treatment)
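A minimal base-R sketch of the two orderings (toy data; `W_1` here is just an illustrative numeric covariate standing in for RUVSeq output, not real estimated factors). Both formulas produce the same columns; only the order, and hence which coefficient is last, differs:

```r
# Toy sample annotation: a covariate of interest and one RUVSeq-style
# unwanted-variation covariate (illustrative values only)
pdata <- data.frame(
  x   = factor(c("Ctl", "Ctl", "Trt", "Trt")),
  W_1 = c(0.12, -0.34, 0.05, 0.27)
)

# RUVSeq-manual style: covariate of interest first, W_1 last
design_a <- model.matrix(~x + W_1, data = pdata)

# edgeR-manual style ordering: nuisance term first, x last
design_b <- model.matrix(~W_1 + x, data = pdata)

# Same columns either way; only the order differs
colnames(design_a)  # "(Intercept)" "xTrt" "W_1"
colnames(design_b)  # "(Intercept)" "W_1" "xTrt"
```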
Also, when I perform the DESeq2 analysis with
DESeqDataSetFromMatrix(countData=signal, colData=Design, design=~condition+batch)
I get the following header from results():
log2 fold change (MAP): batch
Wald test p-value: batch
If I instead use DESeqDataSetFromMatrix(countData=signal, colData=Design, design=~batch+condition),
I get the following, which seems appropriate:
log2 fold change (MAP): condition WEN vs WNN
Wald test p-value: condition WEN vs WNN
What is the correct way to include factors of unwanted variation in the DESeq2 and edgeR design matrices?
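As I understand it, DESeq2's results() reports the last variable in the design formula by default, which is why the header changes with the ordering. A sketch with toy colData (the DESeq2 calls are left as comments since they need real count data); note you can also name the comparison explicitly, so the ordering no longer matters for which result you get:

```r
# Toy colData mirroring the question's factors (values are illustrative);
# WNN is set as the reference level so the contrast is WEN vs WNN
Design <- data.frame(
  condition = factor(c("WNN", "WNN", "WEN", "WEN"), levels = c("WNN", "WEN")),
  batch     = factor(c("b1", "b2", "b1", "b2"))
)

# With batch first, the last coefficient is the condition effect,
# which is what DESeq2's results() reports by default
design_mat <- model.matrix(~batch + condition, data = Design)
tail(colnames(design_mat), 1)  # "conditionWEN"

# dds <- DESeqDataSetFromMatrix(countData = signal, colData = Design,
#                               design = ~batch + condition)
# dds <- DESeq(dds)
# results(dds)  # defaults to the last variable: condition WEN vs WNN
# Or request the comparison explicitly, regardless of ordering:
# results(dds, contrast = c("condition", "WEN", "WNN"))
```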
Thank you for the answer. And sorry, it said only "batch", not "condition batch".
Similarly, in edgeR, it doesn't really matter for GLM fitting or dispersion estimation whether batch is put at the start or end of the design. The only thing that changes is the interpretation of the coefficients, and the coef or contrast you need to supply to glmLRT. Keeping batch at the start is simply convenient, as the last coefficient then represents the treatment effect. The last coefficient is dropped by default in glmLRT, so you don't need any extra arguments to do the DE test of interest.
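The point that ordering affects only the interpretation of coefficients, not the fit itself, can be illustrated with a plain linear model in base R (a stand-in for edgeR's GLM; the data are random, and the edgeR calls are shown as comments for orientation):

```r
set.seed(1)
Batch     <- factor(rep(c("b1", "b2"), each = 3))
Treatment <- factor(rep(c("Ctl", "Trt"), times = 3))
y <- rnorm(6)

# Same model, two orderings of the design formula
fit1 <- lm(y ~ Batch + Treatment)
fit2 <- lm(y ~ Treatment + Batch)

# Fitted values are identical: both designs span the same column space
all.equal(fitted(fit1), fitted(fit2))  # TRUE

# Only the coefficient order changes; with Batch first, the treatment
# effect is the last coefficient -- the one glmLRT tests by default
names(coef(fit1))  # "(Intercept)" "Batchb2" "TreatmentTrt"

# edgeR sketch (needs real count data in a DGEList 'dge'):
# fit <- glmFit(dge, model.matrix(~Batch + Treatment))
# lrt <- glmLRT(fit)  # tests the last coefficient: TreatmentTrt
```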