Hello,
my sample sheet looks like this (RNA-seq):
FACTOR BATCH
factor1 run1
factor1 run2
factor1 run2
factor1 run1
factor2 run1
factor2 run2
factor2 run2
factor2 run1
factor3 run1
factor3 run2
factor3 run2
factor3 run1
factor4 run1
factor4 run2
factor4 run2
factor4 run1
So there are four conditions spread over two batches. Batch correction is necessary and beneficial: the batch effect was clearly visible in a PCA and could be removed with removeBatchEffect from limma.
I use a design of ~ 0 + BATCH + FACTOR, as this is what I read in the edgeR manual.
This gives me this matrix:
run1 run2 FACTOR2 FACTOR3 FACTOR4
1 1 0 0 0 0
2 0 1 0 0 0
3 0 1 0 0 0
4 1 0 0 0 0
5 1 0 1 0 0
6 0 1 1 0 0
7 0 1 1 0 0
8 1 0 1 0 0
9 1 0 0 1 0
10 0 1 0 1 0
11 0 1 0 1 0
12 1 0 0 1 0
13 1 0 0 0 1
14 0 1 0 0 1
15 0 1 0 0 1
16 1 0 0 0 1
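For anyone who wants to reproduce the matrix above, here is a minimal sketch in base R. The variable names FACTOR and BATCH are taken from the sample sheet; note that model.matrix produces longer column names than the shortened ones shown in the post.

```r
# Sample sheet as posted: four conditions, two sequencing runs
FACTOR <- factor(rep(paste0("factor", 1:4), each = 4))
BATCH  <- factor(rep(c("run1", "run2", "run2", "run1"), times = 4))

# No-intercept design with BATCH first: both run levels get their own column,
# while FACTOR is coded relative to its first level (factor1), which is why
# there is no factor1 column
design <- model.matrix(~ 0 + BATCH + FACTOR)

colnames(design)
# "BATCHrun1" "BATCHrun2" "FACTORfactor2" "FACTORfactor3" "FACTORfactor4"
```

Because factor1 is the reference level, each FACTOR column estimates the difference between that condition and factor1 rather than the condition's own mean.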
I can make most of the contrasts I want, e.g. FACTOR2-FACTOR3 via the contrast argument c(0,0,1,-1,0).
Question-1: How would I make FACTOR1-FACTOR2 if I want to use the contrast rather than the coef argument (I wrote a script that is based on the contrast argument and I do not want to change it)? Is this possible here with contrast? Is it simply c(0,0,-1,0,0)?
Question-2: For one analysis I am interested in the average of FACTOR 1 & 2 versus the average of FACTOR 3 & 4. If I only had the FACTOR levels in the matrix with 0+FACTOR, I would do (FACTOR1+FACTOR2)/2 - (FACTOR3+FACTOR4)/2, but since FACTOR1 is not part of the matrix, how do I do this here? Should I write 0+FACTOR+BATCH, since then all four FACTORs are in the matrix?
Thank you!
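One way to sanity-check candidate contrast vectors for the ~ 0 + BATCH + FACTOR design is to feed the design noise-free numbers with known condition means and see what each vector returns. The sketch below uses ordinary least squares on made-up values (mu and shift are hypothetical, chosen only for illustration), not an edgeR fit. It relies only on the fact that factor1 is the reference level and therefore enters every contrast with an implicit coefficient of zero.

```r
FACTOR <- factor(rep(paste0("factor", 1:4), each = 4))
BATCH  <- factor(rep(c("run1", "run2", "run2", "run1"), times = 4))
design <- model.matrix(~ 0 + BATCH + FACTOR)

# Hypothetical noise-free values: one mean per condition plus a shift for run2
mu    <- c(factor1 = 10, factor2 = 12, factor3 = 15, factor4 = 19)
shift <- c(run1 = 0, run2 = 3)
y     <- unname(mu[as.character(FACTOR)] + shift[as.character(BATCH)])

# Least-squares coefficients, in the column order of the design matrix
beta <- qr.solve(design, y)

# Question 1: the FACTOR2 column is factor2 - factor1, so its negative
# is factor1 - factor2
q1 <- c(0, 0, -1, 0, 0)
sum(q1 * beta)   # equals mu["factor1"] - mu["factor2"]

# Question 2: (factor1 + factor2)/2 - (factor3 + factor4)/2, where factor1
# contributes 0 and the batch terms cancel
q2 <- c(0, 0, 0.5, -0.5, -0.5)
sum(q2 * beta)   # equals mean(mu[1:2]) - mean(mu[3:4])
```

With these weights both comparisons can stay in the existing design; the batch columns get weight zero because each side of the comparison carries the same batch term.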
Thank you, this would have been my intuitive workaround. Does this change anything mathematically in how the batch is taken into account, or is it identical to the previous design?
Other than which coefficients you are estimating, it's equivalent. You will get exactly the same estimates for whatever comparison you want; it is just easier to do so with the design I recommend.
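The equivalence can be checked numerically. The recommended design is not quoted in this thread, so the sketch below assumes it is ~ 0 + FACTOR + BATCH, the alternative mentioned in the question. It fits simulated values for a single gene (hypothetical data) by ordinary least squares rather than with edgeR's negative binomial GLM; the same column-space argument should carry over, but only the least-squares case is checked here.

```r
set.seed(1)
FACTOR <- factor(rep(paste0("factor", 1:4), each = 4))
BATCH  <- factor(rep(c("run1", "run2", "run2", "run1"), times = 4))

# Simulated values for one gene (hypothetical, for illustration only)
y <- rnorm(16, mean = 10 + as.integer(FACTOR) + 2 * (BATCH == "run2"))

design_a <- model.matrix(~ 0 + BATCH + FACTOR)  # design from the question
design_b <- model.matrix(~ 0 + FACTOR + BATCH)  # assumed alternative design

beta_a <- qr.solve(design_a, y)
beta_b <- qr.solve(design_b, y)

# The same comparison (factor2 - factor3) written against each design's columns:
# design_a columns: run1, run2, factor2, factor3, factor4
# design_b columns: factor1, factor2, factor3, factor4, run2
est_a <- sum(c(0, 0, 1, -1, 0) * beta_a)
est_b <- sum(c(0, 1, -1, 0, 0) * beta_b)
all.equal(est_a, est_b)

# Both designs span the same column space, so the fitted values agree too
all.equal(drop(design_a %*% beta_a), drop(design_b %*% beta_b))
```

Only the meaning of the individual coefficients changes between the two parametrizations; the batch adjustment itself is the same.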