Complex ANOVA analysis in EdgeR
1
0
Entering edit mode
kmyers2 • 0
@kmyers2-9792
Last seen 5.8 years ago

Hello,

I am using edgeR to try to construct a model to compare multiple conditions from some sequencing data. I have three growth conditions (A, B, C) and replicate samples for each condition at time 0 (T0) and after so many generations (T1). The edgeR manual was quite helpful in constructing a model to compare the individual growth conditions (T1 vs T0 for A, T1 vs T0 for B, T1 vs T0 for C) to identify differentially expressed genes for each condition. However, I am now struggling with how to construct a design/model.matrix to find genes that are unique and significant only in a single growth condition (only significant in A T1 vs T0 compared to B T1 vs T0 or C T1 vs T0) or significant only in two of the conditions (significant in B T1 vs T0 and C T1 vs T0 but not in A T1 vs T0).

The manual shows that I can find differences between two comparisons using the following construct:

> AvsB.contrast = makeContrasts(A.vs.B = (A_T1-A_T0) - (B_T1-B_T0), levels = design)

But I am a bit confused how to use this to compare between three conditions. One contrast I've tried is:

> AvsBvsC.contrast = makeContrasts(A.vs.B = (A_T1-A_T0) - (B_T1-B_T0) - (C_T1-C_T0), levels = design)

But I am not sure if that is giving me those genes that are unique ONLY in condition A and not in condition B or C.

Based on other reading on this site (which is great) I've found some examples and I've also tried this contrast:

> A.contrast<-makeContrasts(contrasts=c("(A_T1-A_T0) - (B_T1-B_T0)","(A_T1-A_T0) - (C_T1-C_T0)", levels=design)

But am again unsure if that is giving me the output I want - genes that are significant differentially expressed ONLY in condition A and not conditions B or C.

I also am not sure how to go about finding genes that are unique ONLY in conditions B and C but not A (or B and C individually).

Any help would be greatly appreciated. Thank you in advance!

-Kevin

edger anova sequencing • 998 views
2
Entering edit mode
Aaron Lun ★ 27k
@alun
Last seen 1 hour ago
The city by the bay

None of the contrasts in your post will do what you want, i.e., to find genes that are DE (with respect to time) only in condition A and not conditions B or C. This is because you're asking for an absence of DE in B and C, whereas all of edgeR's statistical machinery is built to find DE. Rather, you'll have to use a composite strategy:

resA <- glmLRT(fit, contrast=makeContrasts(A_T1 - A_T0, levels=design))
is.de.A <- p.adjust(resA$table$PValue, method="BH") <= 0.05
resB <- glmLRT(fit, contrast=makeContrasts(B_T1 - B_T0, levels=design))
resC <- glmLRT(fit, contrast=makeContrasts(C_T1 - C_T0, levels=design))
keep <- is.de.A & resB$table$PValue >=0.2 & resC$table$PValue >=0.2

This will identify genes that are DE in condition A at a FDR of 5%, and have no strong evidence for DE in conditions B and C (the threshold on the p-value is "any arbitrary large value"). Admittedly, this is rather ad hoc because the lack of evidence for DE doesn't equate to actual evidence for non-DE. But a formal test for non-DE (e.g., TOST) would be very conservative and probably not give you anything at all.

I'm not sure what you mean by "unique ONLY in conditions B and C but not A (or B and C individually)". How can a DE gene be unique in both B and C? If it's unique, it's got to be in one or the other by definition.

0
Entering edit mode

Thank you very much Aaron!

That makes sense. To re-phrase my question and perhaps clarify what I am trying to do. Condition A is my experimental condition, where as B and C are two control conditions (with samples taken at T0 and after some time). For each condition I can identify genes that are DE at T1 vs T0 (and get lists of genes that meet a FDR cutoff). I want would like to do is use conditions B and C as controls to make sure the DE genes in A are due only to A and not to changes that would have occurred in the controls over time. So is there any way in edgeR to do what I am asking?

I think what you describe makes sense, but I worry about the different statistical power in A, B, and C conditions. I hope that makes sense, but please let me know and I can try to clarify it.

0
Entering edit mode

If that's all you want, then you don't need to look for non-DE genes. Rather, all you need to do is:

con1 <- makeContrasts((A_T1-A_T0) - (B_T1-B_T0), levels=design)
con2 <- makeContrasts((A_T1-A_T0) - (C_T1-C_T0), levels=design)

... and run glmLRT separately on con1 and con2, and intersect the resulting DE lists. This will identify genes that are DE in A against both controls (also probably best to select genes changing in the same direction). This avoids the grief of trying to identify non-DE genes. In fact, there's no need to mandate non-DE in the control T1-T0 contrasts, as you requested in your original post; if a particular gene goes up 2-fold in the controls over time, but it goes up 10-fold in A over time, then you still want to pick up that gene, even if it's no longer non-DE in the control.

As for your other remark, differences in power between contrasts aren't an issue here if you only focus on the intersection - if it's detected (at a certain FDR threshold) in both contrasts, then it's present in the intersection, and that's that. Okay, power would affect the size of the intersection, but not the reliability of the genes in the intersection itself.

0
Entering edit mode

Thanks. That makes sense to me. I really appreciate your help and advice.