DESeq2 - precisions about interaction and fold change calculation

0

Entering edit mode

samuel collombet ▴ 140

@samuel-collombet-6574

Last seen 6.9 years ago

France

Dear all, I am trying do performed some analysis with DESeq2, using a model with interaction therms, but I have a hard time understanding what DESeq2 is doing exactly (I am also not very familiar with R notation and have a hard time understanding it too, sorry). What I want to do correspond to the what is described in section "3.3 Interactions" of the vignette " Differential analysis of count data - the DESeq2 package", i.e. one model with 3groups-2treatments, and one with 2groups-2treatments (2 different analysis). - in the first example (p21, comparing "patient4.treatmentOHT" to "patient4.treatmentControl"), what will be exactly the numerator and denominators? How is the log2FoldChange calculated in this case? (I have read section 4.2 and the pre-submission on bioArxiv, but I am still not sure of what is calculated in this case ; I also do nto really understand the sentence " Note that the log2 fold change for treatment of OHT over control for patient 4 is the interaction effect above in addition to the main effect of treatment OHT over control", p21). - in the second example p22, with only 2groups and 2 treatment, to what contrast does "patient2.treatmentOHT" correspond to? The specific effect of treatment OHT on patient 2, taking into account the general effect of OHT? Many thanks in advance! Samuel [[alternative HTML version deleted]]

DESeq2 DESeq2 • 1.9k views

ADD COMMENT • link updated 9.9 years ago by Michael Love 41k • written 9.9 years ago by samuel collombet ▴ 140

0

Entering edit mode

Michael Love 41k

@mikelove

Last seen 11 minutes ago

United States

(Edit Oct 2015: note that this thread no longer applies to the current implementation in DESeq2 v1.10)

hi Samuel,

I respond below with direct answers to you questions. We have a slightly different implementation in DESeq2 using "expanded model matrices", where each group gets its own effect rather than having non-base levels compared to a base level. The log fold changes (LFC) end up very similar compared to the standard analysis with a base level, except that the shrinkage of LFC is symmetric with our implementation. An exception is the 2x2 model with interaction term, where we do not use expanded model matrices for simplicity and because the LFC shrinkage happens to be symmetric in this case.

> - in the first example (p21, comparing "patient4.treatmentOHT" to > "patient4.treatmentControl"), what will be exactly the numerator and > denominators? How is the log2FoldChange calculated in this case? The numerator is the OHT interaction term in patient 4 and the denominator is the Control interaction term in patient 4. Subtracting the later from the former gives the OHT vs Control interaction effect for patient 4. If this is significant, it means the OHT vs Control effect for patient 4 is not explained by the main OHT vs Control effect. > (I have > read section 4.2 and the pre-submission on bioArxiv, but I am still not > sure of what is calculated in this case ; I also do nto really > understand the sentence " Note that the log2 fold change for treatment > of OHT over control for patient 4 is the interaction effect above in > addition to the main effect of treatment OHT over control", p21). > This sentence just reminds the user that the interaction terms are added to the main effect terms: treatment effect for patient 4 = main treatment effect + patient4 treatment interaction term. If the interaction term is 0, then the main effect term is sufficient to explain the fold changes due to treatment for that group. If the interaction term is nonzero, the above formula gives the fold change due to treatment for patient 4. I will make a note to clear up this sentence in the vignette. > - in the second example p22, with only 2groups and 2 treatment, to what > contrast does "patient2.treatmentOHT" correspond to? The specific effect > of treatment OHT on patient 2, taking into account the general effect of > OHT? > Yes. Mike > Many thanks in advance! > Samuel > > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD COMMENT • link 9.9 years ago • updated 8.5 years ago Michael Love 41k

0

Entering edit mode

Hi Samuel To add to Mike's response: >> - in the first example (p21, comparing "patient4.treatmentOHT" to >> "patient4.treatmentControl"), what will be exactly the numerator and >> denominators? How is the log2FoldChange calculated in this case? > > The numerator is the OHT interaction term in patient 4 and the > denominator is the Control interaction term in patient 4. Subtracting > the later from the former gives the OHT vs Control interaction effect > for patient 4. If this is significant, it means the OHT vs Control > effect for patient 4 is not explained by the main OHT vs Control > effect. [...] > I also do not really understand the sentence " Note that the log2 fold > change for treatment of OHT over control for patient 4 is the > interaction effect above in addition to the main effect of treatment > OHT over control", p21). I add a complementary explanation to Mike's answer, which, however, is a bit lengthy -- but maybe helpful to understand how interactions really work. In linear models, things often become clearer when you look at the model matrix (aka design matrix). This is the matrix that spells out how the fitted response for each sample is made up from the coefficients that you find in the result object, and it is at the actual heat of the fitting process. In DESeq2, our model matrices are a bit different from those usually used in linear modelling (and as returned by R's standard function 'model.matrix'), so it pays even for users with experience in linear modelling to have a look (or see the section on "Extended design matrices" in our preprint.) In DESeq2, you can access the model matrix as follows (where ddsX is the object as found at the end of the vignette section on interactions): > mm <- attr( ddsX, "modelMatrix" ) > mm Intercept patient1 patient2 patient3 patient4 SRR479052 1 1 0 0 0 SRR479053 1 1 0 0 0 SRR479054 1 1 0 0 0 SRR479055 1 1 0 0 0 [...] SRR479077 0 0 SRR479078 0 0 patient3.treatmentOHT patient4.treatmentOHT SRR479052 0 0 SRR479053 0 0 [...] SRR479075 0 0 SRR479076 0 1 SRR479077 0 1 SRR479078 0 1 attr(,"assign") [1] 0 1 1 1 1 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 Each row in the model matrix corresponds to one sample. Here again is the meta data for ddsX: > as.data.frame( colData(ddsX)[,c("patient","treatment")] ) patient treatment SRR479052 1 Control SRR479053 1 Control SRR479054 1 DPN SRR479055 1 DPN SRR479056 1 OHT SRR479057 1 OHT SRR479058 2 Control SRR479059 2 Control SRR479060 2 DPN SRR479061 2 DPN SRR479062 2 DPN SRR479063 2 OHT SRR479064 2 OHT SRR479065 2 OHT SRR479066 3 Control SRR479067 3 Control SRR479068 3 DPN SRR479069 3 DPN SRR479070 3 OHT SRR479071 3 OHT SRR479072 4 Control SRR479073 4 DPN SRR479074 4 DPN SRR479075 4 DPN SRR479076 4 OHT SRR479077 4 OHT SRR479078 4 OHT Remember, by the way, that we had thrown away the time point information in this example to pretend that, e.g., the last three samples are replicates. Each column in the model matrix corresponds to one of the coefficients that is fitted in the model. Note how hence the output of 'resultsNames(ddsX)' is just a listing of the column names of the model matrix. Now look at the last line of the model matrix: > mm["SRR479078",] Intercept patient1 1 0 patient2 patient3 0 0 patient4 treatmentControl 1 0 treatmentDPN treatmentOHT 0 1 patient1.treatmentControl patient2.treatmentControl 0 0 patient3.treatmentControl patient4.treatmentControl 0 0 patient1.treatmentDPN patient2.treatmentDPN 0 0 patient3.treatmentDPN patient4.treatmentDPN 0 0 patient1.treatmentOHT patient2.treatmentOHT 0 0 patient3.treatmentOHT patient4.treatmentOHT 0 1 This shows which of the coefficients you would need to add up to see the model's prediction for a gene in this sample (which is from patient4, treated with OHT), namely the following: - Intercept: the overall log expression of the gene under consideration, essentially averaged over all level combinations, - patient4: the patient main effect for patient 4, i.e., the amount by which the expression in the samples from patient 4 differs from the average - treatmentOHT: the treatment main effect for OHT, the amount by which expression under OHT treatment differs from the average - patient4.treatmentOHT: the interaction term, i.e., the amount by which the OHT treatment effect differs in patient 4 from the average OHT treatment effect or -- and this is the same thing, phrased differently -- the amount by which the patient-4 effect differs in case of OHT treatment from the average over treatments Note that the other three samples for patient 4 under OHT have exactly the same values in their model matrix row. Hence, if we want to know the effect of OHT treatment on patient 4 as compared to control, we can subtract the model matrix row for the patient-4 control sample from that for one of the patient-4 OHT samples: > contr = mm["SRR479078",] - mm["SRR479072",] > contr Intercept patient1 0 0 patient2 patient3 0 0 patient4 treatmentControl 0 -1 treatmentDPN treatmentOHT 0 1 patient1.treatmentControl patient2.treatmentControl 0 0 patient3.treatmentControl patient4.treatmentControl 0 -1 patient1.treatmentDPN patient2.treatmentDPN 0 0 patient3.treatmentDPN patient4.treatmentDPN 0 0 patient1.treatmentOHT patient2.treatmentOHT 0 0 patient3.treatmentOHT patient4.treatmentOHT 0 1 Such a vector is called a "contrast vector". It shows how the desired contrast is formed by adding up / subtracting the "log2FoldChange" columns of the respective result objects. If we skip over all the zeroes, we see that this one does the following: treatmentOHT - treatmentControl + patient4.treatmentOHT - patient4.treatmentControl In other words, it is the sum of the overall effect of OHT treatment as compared to control plus the specific extra effect that it has on patient 4. To get this in DESeq2, you would write: results(ddsX, contrast = contr ) In the vignette, we discussed the contrast results(ddsX, contrast = list( "patient4.treatmentOHT", "patient4.treatmentControl" ) ) and as you can now see, this is only the extra effect specific to patient 4, i.e., the interaction effect. To get the full effect of OHT treatment on patient 4, we need to add the main effect of OHT treatment as well. That would be the one returned by results(ddsX, contrast = list( "treatmentOHT", "treatmentControl" ) ) or, with identical results results(ddsX, contrast = list( "treatment", "OHT", "Control" ) ) Now, if you sum up this main effect and the interaction effect from two paragraphs up, you get the same as what the call with the big contrast vector gave you. Might be a bit of effort to work through this text, but if you manage, you will hopefully have a much better understanding what an interaction is. Simon

ADD REPLY • link 9.9 years ago Simon Anders ★ 3.7k

Login before adding your answer.