Question: interpretation of edgeR multi factor conditions in GLM
0
2.5 years ago by
Philippine Genome Center

Good Day!

I have trouble interpreting if the design model I made are fetching the DE genes that affects only my factors.

I am doing an DE analysis in edgeR with two factors: salt (Na) and pH. Each factor has two levels high (H) and low (L). The goal of the experiment is to determine genes that are affecting tolerances in salt only, in pH only and that affects both salt and pH (pH-Na). Each with six replicates. Per my understanding of userguide, I made a no-intercept design model matrix as I feel there is no baseline.

>group<-factor(paste(targets$pH,targets$Salt,sep="_")) >group  [1] LpH_LNa   LpH_LNa   LpH_LNa   LpH_LNa   LpH_LNa   LpH_LNa    [7] LpH_HNa  LpH_HNa  LpH_HNa  LpH_HNa  LpH_HNa  LpH_HNa [13] HpH_LNa  HpH_LNa  HpH_LNa  HpH_LNa  HpH_LNa  HpH_LNa [19] HpH_HNa HpH_HNa HpH_HNa HpH_HNa HpH_HNa HpH_HNa Levels: HpH_HNa HpH_LNa LpH_HNa LpH_LNa >design <- model.matrix(~0+group)

For simplicity, the design model matrix looks like this:

 HpH_Hna HpH_Lna LpH_Hna LpH_Lna A1 0 0 0 1 A2 0 0 0 1 B1 0 0 1 0 B2 0 0 1 0 C1 0 1 0 0 C2 0 1 0 0 D1 1 0 0 0 D2 1 0 0 0

>basic.contrasts <- makeContrasts(   cBvsA = LpH_HNa - LpH_LNa, #(B-A)   cCvsA = HpH_LNa - LpH_LNa, #(C-A)   cDvsC = HpH_HNa - HpH_LNa, #(D-C)   cDvsB = HpH_HNa - LpH_HNa, #(D-B)   pHeffect = ((HpH_LNa + HpH_HNa)/2) - ((LpH_HNa + LpH_LNa)/2),   Naeffect = ((HpH_HNa + LpH_HNa)/2) - ((HpH_LNa + LpH_LNa)/2),   pHNaeffect = ((HpH_HNa-LpH_HNa)-(HpH_LNa-LpH_LNa)), 

>basic.contrasts                Contrasts Levels          cBvsA cCvsA cDvsC cDvsB pHeffect Naeffect pHNaeffect   HpH_HNa       0       0       1       1      0.5      0.5        1   HpH_LNa       0       1      -1       0      0.5     -0.5       -1   LpH_HNa       1       0       0      -1     -0.5      0.5       -1   LpH_LNa      -1      -1       0       0     -0.5     -0.5        1

My question is:

1) Is it correct to assume that the genes created from "pHeffect", which can be computed as [(HpH_HNa+HpH_LNa)/2 - (LpH_HNa+LpH_LNa)/2], will yield the DE genes that affects pH regardless of salt?

2) Or, it is correct assume that the genes common for both DE in cDvsB [HpH_HNa - LpH_HNa] and cCvsA [HpH_HNa - LpH_HNa] will yield the DE genes that affects pH regardless of salt?

3) I would like also to confirm if my interpretation that the pHNaeffect [(HpH_HNa - LpH_HNa) - (HpH_LNa - LpH_LNa)] are the genes that affects both pH and salt tolerance?

4) Or is there a better way to do these?

modified 2.5 years ago by Ryan C. Thompson7.4k • written 2.5 years ago by gamaliel.lysander.cabria0
Answer: interpretation of edgeR multi factor conditions in GLM
3
2.5 years ago by
The Scripps Research Institute, La Jolla, CA
Ryan C. Thompson7.4k wrote:
1. The contrast you have written for pH effect computes the pH response at a salt concentration that is halfway between high and low under the assumption that the pH response is linearly related to salt concentration. This is probably not what you wanted. Since your design is balanced, it might turn out that this contrast ends up being very close to the overall pH effect anyway, but in general, there is no way to extract the overall "pH effect" from this model, since the model assumes that the pH effect is salt-dependent. You can only evaluate the pH effect at a given salt concentration. Similarly, "salt effect" contrast is wrong for the same reason.
2. This is the more correct way to get at this question.
3. No, this contrast will identify genes whose pH response is significantly different in low vs. high salt conditions. Equivalently, it will identify genes whose salt response is significantly different in low vs. high pH. To get a list of genes that are affected by both conditions, I would use an additive model (i.e. formula of ~1 + pH + salt, where pH and salt are factors), which should produce a 3-column design matrix: intercept, pH high vs. low, and Na high vs. low. Testing the 2nd and 3rd coefficients will give you lists of genes affected by pH and by Na, and taking the intersection of these lists will give you the set of genes affected by both.
2

Adding onto Ryan's answer to #3: the contrast you've supplied will test whether the interaction between pH and salt is zero, i.e., is the combined effect of pH and salt equal to the sum of the individual effects?

If you want to find genes that are DE with both salt and pH, you could use Ryan's additive approach, provided that most genes do not have any significant interactions (otherwise the variance estimates would be inflated by a misspecified model). Alternatively, you could use your existing model with the one-way layout, and do something like:

• Intersect the DE lists resulting from the contrasts LpH_LNa vs LpH_HNa and LpH_LNa vs HpH_LNa. This will identify genes that are DE upon increasing either salt or pH.
• Intersect the DE lists resulting from the contrasts HpH_HNa vs LpH_HNa and HpH_HNa vs HpH_LNa. This will identify genes that are DE upon decreasing either salt or pH.

The use of two intersection operations reflects the fact that the effect of the perturbations are dependent on the context, i.e., increasing salt in a low-pH context may not have the same effects as decreasing salt in a high-pH context.

Thank you!

I thought so! I'm wrong. The venn diagram I made when I match the intersection and my old "pH/Na effect" does not coincide well. I'll try to use an additive model. It pH and salt DEs there should correspond similarly with the genes that are common with [HpH_HNa - LpH_HNa] and cCvsA [HpH_LNa - LpH_LNa] in my old model, right?