I have a dataset composed of 2 ecotypes, 2 conditions (control, treatment) and 2 geographic units (24 samples in total). I want to look at which genes are differentially expressed:
1. between ecotype A and B at control,
2. between ecotype A and B at treatment,
3. between control and treatment for ecotype A,
4. between control and treatment for ecotype B,
5. similarly, for combinations that include the geographic unit, e.g. between north and south for ecotype A.
I would also like to go one level further and include all three factors, for example the difference between north and south under treatment for ecotype A, and so on.
Following the DESeq2 manual, section 3.3 (Interactions), my solution is to define a single factor that combines the factors I want to compare, for example:
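A minimal sketch of such a combined factor, using base R only. The column names (ecotype, treatment, location) and the 3 replicates per group are assumptions for illustration; only the level labels A.treat and A.ctl come from the calls further down.

```r
# Hypothetical sample table: 2 ecotypes x 2 treatments x 2 locations,
# with 3 replicates per combination assumed (24 samples in total).
colData <- expand.grid(
  replicate = 1:3,
  location  = c("north", "south"),
  treatment = c("ctl", "treat"),
  ecotype   = c("A", "B"),
  stringsAsFactors = FALSE
)

# Combine ecotype and treatment into one factor with levels such as "A.ctl".
colData$condition <- factor(paste(colData$ecotype, colData$treatment, sep = "."))

# Inspect the resulting levels and how many samples fall into each.
levels(colData$condition)
table(colData$condition)
```

The same pattern would extend to three factors by pasting location into the label as well.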
and then use just condition in the design formula:
cond <- DESeqDataSetFromMatrix(countData = countData, colData = colData, design = ~ condition)
and use the contrast argument of results() to compare two levels, like so:
condP <- DESeq(cond)
resMTC <- results(condP, contrast = c("condition", "A.treat", "A.ctl"))
but someone told me to do this instead:
resMTC <- results(condP, contrast=c(1, -1, -1, 1))
Is this correct? If so, why?
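One way to check what a numeric contrast such as c(1, -1, -1, 1) would mean is to look at the columns of the design matrix, since each number multiplies one coefficient in that order. The sketch below uses only base R model.matrix() on a hypothetical four-level factor with 3 replicates per level (an assumption); in DESeq2 itself, resultsNames(condP) lists the actual coefficient names, which are worth checking before relying on a numeric vector.

```r
# Hypothetical four-level factor, like a combined ecotype/treatment condition.
condition <- factor(rep(c("A.ctl", "A.treat", "B.ctl", "B.treat"), each = 3))

# Design with an intercept, as built from ~ condition:
# the first level (A.ctl) becomes the reference.
mm <- model.matrix(~ condition)
colnames(mm)

# Cell-means design (~ 0 + condition): one column per group.
mm0 <- model.matrix(~ 0 + condition)
colnames(mm0)

# Applied to the cell-means columns, c(1, -1, -1, 1) is a difference of
# differences: (A.ctl - A.treat) - (B.ctl - B.treat).
w <- c(1, -1, -1, 1)
names(w) <- levels(condition)
w
```

With the intercept design, the first position of the vector lines up with the intercept, so the same four numbers would not have this difference-of-differences reading there.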
I would also like to know whether a multifactor design is more suitable in this case. If I understand correctly, with a multifactor design such as ~ ecotype + condition, I get a list of genes that are differentially expressed between the two ecotypes while controlling for the effect of the treatment. So I can extract the up- and down-regulated genes for ecotypes A and B, but I cannot tell which genes are up-regulated in A under treatment, whereas with the pairwise approach I could. Is this correct?
It was also suggested to me that, if I do the multifactor analysis, I should use ecotype+condition+ecotype:condition for the full model and ecotype:condition for the reduced model. But wouldn't this analysis again yield only the list of DEGs between ecotypes A and B, with no information about how the treatment and control conditions affect each gene? If not, how would I extract that information from the multifactor results?
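To make the full versus reduced comparison concrete, here is a small base R sketch of the two design matrices. It assumes a balanced layout with 3 replicates per cell, and it assumes the reduced formula is the full formula with the interaction term removed (~ ecotype + condition), which is the usual form for a likelihood ratio test and differs from the ecotype:condition suggestion quoted above.

```r
# Hypothetical balanced layout: 2 ecotypes x 2 conditions, 3 replicates each.
# expand.grid() converts the character vectors to factors by default.
d <- expand.grid(rep = 1:3, condition = c("ctl", "treat"), ecotype = c("A", "B"))

# Full design with the interaction, and a reduced design without it.
full    <- model.matrix(~ ecotype + condition + ecotype:condition, data = d)
reduced <- model.matrix(~ ecotype + condition, data = d)

# The column that appears only in the full design is the term that a
# full-versus-reduced comparison would assess.
setdiff(colnames(full), colnames(reduced))
```

Under these assumptions the only extra column is the interaction, i.e. whether the treatment effect differs between the ecotypes, rather than the treatment effect within each ecotype.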
These are my questions:
- Are ecotypes A and B responding differently to the treatment (or to the control) conditions? (Q 1 and 2)
- What genes are DE between control and treatment for each ecotype? (Q 3 and 4)
- Does ecotype A in location 1 respond differently to the treatment than the same ecotype in location 2? (Q 5)