Hello,
I am using edgeR to try to construct a model to compare multiple conditions from some sequencing data. I have three growth conditions (A, B, C) and replicate samples for each condition at time 0 (T0) and after so many generations (T1). The edgeR manual was quite helpful in constructing a model to compare the individual growth conditions (T1 vs T0 for A, T1 vs T0 for B, T1 vs T0 for C) to identify differentially expressed genes for each condition. However, I am now struggling with how to construct a design/model.matrix to find genes that are unique and significant only in a single growth condition (only significant in A T1 vs T0 compared to B T1 vs T0 or C T1 vs T0) or significant only in two of the conditions (significant in B T1 vs T0 and C T1 vs T0 but not in A T1 vs T0).
The manual shows that I can find differences between two comparisons using the following construct:
> AvsB.contrast = makeContrasts(A.vs.B = (A_T1-A_T0) - (B_T1-B_T0), levels = design)
But I am a bit confused how to use this to compare between three conditions. One contrast I've tried is:
> AvsBvsC.contrast = makeContrasts(A.vs.B = (A_T1-A_T0) - (B_T1-B_T0) - (C_T1-C_T0), levels = design)
But I am not sure if that is giving me those genes that are unique ONLY in condition A and not in condition B or C.
Based on other reading on this site (which is great) I've found some examples and I've also tried this contrast:
> A.contrast<-makeContrasts(contrasts=c("(A_T1-A_T0) - (B_T1-B_T0)","(A_T1-A_T0) - (C_T1-C_T0)", levels=design)
But am again unsure if that is giving me the output I want - genes that are significant differentially expressed ONLY in condition A and not conditions B or C.
I also am not sure how to go about finding genes that are unique ONLY in conditions B and C but not A (or B and C individually).
Any help would be greatly appreciated. Thank you in advance!
-Kevin
Thank you very much Aaron!
That makes sense. To re-phrase my question and perhaps clarify what I am trying to do. Condition A is my experimental condition, where as B and C are two control conditions (with samples taken at T0 and after some time). For each condition I can identify genes that are DE at T1 vs T0 (and get lists of genes that meet a FDR cutoff). I want would like to do is use conditions B and C as controls to make sure the DE genes in A are due only to A and not to changes that would have occurred in the controls over time. So is there any way in edgeR to do what I am asking?
I think what you describe makes sense, but I worry about the different statistical power in A, B, and C conditions. I hope that makes sense, but please let me know and I can try to clarify it.
If that's all you want, then you don't need to look for non-DE genes. Rather, all you need to do is:
... and run
glmLRT
separately oncon1
andcon2
, and intersect the resulting DE lists. This will identify genes that are DE in A against both controls (also probably best to select genes changing in the same direction). This avoids the grief of trying to identify non-DE genes. In fact, there's no need to mandate non-DE in the control T1-T0 contrasts, as you requested in your original post; if a particular gene goes up 2-fold in the controls over time, but it goes up 10-fold in A over time, then you still want to pick up that gene, even if it's no longer non-DE in the control.As for your other remark, differences in power between contrasts aren't an issue here if you only focus on the intersection - if it's detected (at a certain FDR threshold) in both contrasts, then it's present in the intersection, and that's that. Okay, power would affect the size of the intersection, but not the reliability of the genes in the intersection itself.
Thanks. That makes sense to me. I really appreciate your help and advice.