Question: [DESeq2] time series strain-specific differences over time
4
4.1 years ago by
john60
Germany
john60 wrote:

Hello guys,

I am doing some RNA-seq analysis using DESeq2 and following the vignette from here, where they use time points 0,15,30,60 ... 180 min. and two groups (strains)

http://www.bioconductor.org/help/workflows/rnaseqGene/

In the time series chapter it says:

"Genes with small p values from this test are those which, at one or more time points after time 0 showed a strain-specific effect."

As an example the following is shown:

## log2 fold change (MLE): strainmut.minute180
## LRT p-value: '~ strain + minute + strain:minute' vs '~ strain + minute'
## DataFrame with 4 rows and 7 columns
##               baseMean log2FoldChange     lfcSE      stat       pvalue         padj      symbol
##              <numeric>      <numeric> <numeric> <numeric>    <numeric>    <numeric> <character>
## SPBC2F12.09c  174.6712    -2.65763737 0.7498270  99.23199 7.671942e-20 5.184698e-16       atf21

Question 1)

As I am understanding it- this gene has a strain specific expression profile over time. This effect occures over the other time points (from 0 to 180).

If I had picked the strainmut.minute60  then we would have seen a list of DE genes from 0 to 60 minutes, right?

Just for clarification- here are the options of which one can choose:

resultsNames(ddsTC)

##  [1] "Intercept"           "strain_mut_vs_wt"    "minute_15_vs_0"      "minute_30_vs_0"
##  [5] "minute_60_vs_0"      "minute_120_vs_0"     "minute_180_vs_0"     "strainmut.minute15"
##  [9] "strainmut.minute30"  "strainmut.minute60"  "strainmut.minute120" "strainmut.minute180"

Question 2)

Imagine a gene that behaves in the same way in all timepoints (between the two groups) but time point 0 to 15.

Will this be detected by deseq?

Cheers,

john

timecourse deseq2 • 6.4k views
modified 4.1 years ago • written 4.1 years ago by john60
Answer: [DESeq2] time series strain-specific differences over time
7
4.1 years ago by
Michael Love22k
United States
Michael Love22k wrote:

"If I had picked the strainmut.minute60  then we would have seen a list of DE genes from 0 to 60 minutes, right?"

This is a model with interaction terms, and this is an interaction term. .You can tell it is an interaction term because it contains the names of both variables strain and minute. So this term is a test for if the mut vs WT fold change is different at minute 60 than at minute 0.

To generate the tables of log fold change of 60 minutes vs 0 minutes for the WT strain would be:

results(dds, name="minute_60_vs_0")

To generate the tables of log fold change of 60 minutes vs 0 minutes for the mut strain would be the sum of the WT term above and the interaction term which is an additional effect beyond the effect for the reference level (WT):

results(dds, contrast=list(c("minute_60_vs_0","strainmut.minute60"))

"Imagine a gene that behaves in the same way in all timepoints (between the two groups) but time point 0 to 15. Will this be detected by deseq?"

This is the description of an interaction term in the model. You can find this by testing the interaction term strainmut.minute15:

results(dds, name="strainmut.minute15")
Answer: [DESeq2] time series strain-specific differences over time
0
4.1 years ago by
john60
Germany
john60 wrote:

Hello Michael,

You said "So this term is a test for if the mut vs WT fold change is different at minute 60 than at minute 0. "

Alright. A gene that has a low pvalue in strainmut.minute60 AND 30 AND 120 AND 180 is then to be considered DE in all time points (compared to 0) between WT and MUT.

----------------

You said:

"To generate the tables of log fold change of 60 minutes vs 0 minutes for the WT strain would be:

results(dds, name="minute_60_vs_0")"

So this shows me genes that are DE between two time points in one group (WT). Not considering the MUT at all.

---------------

You said: "reference level (WT):"

Why is WT our reference? Where did we specify that WT is the reference and not MUT? Why this way?

-------------

By calling

results(dds, name="strainmut.minute180")

Deseq finds this gene because it takes 0minutes and 180minutes into account. That's alright.

Is there a way to take all time points into account? In order to find a gene that is DE in all time points between two groups?

1

See the DESeq2 vignette for details on how to tell R which level is the reference level for a factor variable.

In this case the reference level was set during object construction which can be followed at the fission package vignette.

Is there a way to take all time points into account? In order to find a gene that is DE in all time points between two groups?

If you want to find genes which have a significant mut vs wt fold change for each and every time point, I don't think there is a simple way to form this results table. You would have to test the fold change for mut vs WT at minute 0:

results(dds, name="strain_mut_vs_wt")


...and then test the fold change for mut vs wt at every other time point:

results(dds, contrast=list(c("strain_mut_vs_wt","strainmut.minute15")))


And you would then look for genes which have a small adjusted p-value in all of the results tables.

To clarify: the first comparison takes into account the difference at t0, and this is recommended. The second doesn't take into account the difference at t0, and this is not usually recommended.

So if we do

results(dds, contrast=list(c("strain_mut_vs_wt","strainmut.minute15")))

and then

results(dds, contrast=list(c("strain_mut_vs_wt","strainmut.minute30")))

and on and on and then just select the genes that have an adjusted pvalue less than some threshold for all these tests, would this give the correct significance? I mean all of these tests share strainmut.minute0 as the control so they are correlated. How can we model this correlation? Instead, is there a way to test for a composite hypothesis such as  \beta_1 >0 AND \beta_2 >0 where \beta_1 and \beta_2 are two different fold change effects with the same baseline?

This isn't a concern in linear models, think about the coefficient table that lm() returns for example when there are 3 or more factor levels. The assumption is that the data is not correlated conditioned on the design matrix X.

No, we don't have a way to test such composite alternative hypotheses.

I am interested in genes that are DE at each time point (i.e. ALL time points) with respect to 0. I ran some simulations for the lm model at http://rpubs.com/ysdel/corpvals and it seems that the tests are correlated. I don't know how to think about this correctly to find the significance levels.

I don't really follow your question, because I don't see where we make the assumption that test statistics from different contrasts would be independent.

Nevertheless, we don't have in DESeq2 a way for you to generate a p-value where the alternate hypothesis is that all time points after t0 have consistent sign of LFC compared to t0.

I'm sorry, I wasn't clear in what the problem is. Suppose I have results res1, res2, res3 from correlated tests. And I am interested in genes that satisfy the alternative hypothesis for all of these tests. So I will have to do

hits = (res1$padj<0.05) & (res2$padj<0.05) & (res3$padj<0.05) [or, forgetting about multiple testing adjustments, hits = (res1$p.value<0.05) & (res2$p.value<0.05) & (res3$p.value<0.05) ]

What is the statistical significance (in terms of FDR or Type I errors) of these hits? If I had a simple hypothesis, I know for (res1$padj<0.05) I will make a type I error 5% of the time. I very much appreciate you taking the time to answer these questions. ADD REPLYlink written 2.1 years ago by ysdel30 You're correct that that statement doesn't have the same control as the tests we provide. We don't have functionality for this kind of combination in DESeq2. You could use pvalue combination methods. ADD REPLYlink written 2.1 years ago by Michael Love22k Hi Michael, I am analyzing RNA-Seq data from a time-course of a mutant strain and would like to ask you something related to this reply. Following what is described in the time-course vignette - In order to calculate the fold change for mut vs wt at an individual time-point shouldn't we use the Wald test like in: results(dds, name="strainmut.minute15", test="Wald")  What is the difference between using the one above and the following contrast, which as you describe, "(...) test the fold change for mut vs wt at every other time point" (time-point 15 in this particular case): results(dds, contrast=list(c("strain_mut_vs_wt","strainmut.minute15"))) Many thanks! Miguel ADD REPLYlink written 8 months ago by santos.migueldm0 1 "strainmut.minute15" is the difference between Mut vs WT at minute 15, controlling for baseline. If you add "strain_mut_vs_wt" to this, you get the LFC for Mutant vs WT at minute 15, not controlling for baseline. So the second one is the observed difference at minute 15 between the two groups (because you added in the change that was present at time=0). ADD REPLYlink written 8 months ago by Michael Love22k Hi Michael, Just a quick question on the answer you gave for "Is there a way to take all time points into account? In order to find a gene that is DE in all time points between two groups?" I followed your tutorial on the fission data for time series experiments, and i thought that taking a summary of results > summary(resTC) after running the steps below would give an indication of the total genes differentially expressed across all time-points under consideration: ddsTC <- DESeq(ddsTC, test="LRT", reduced = ~ strain + minute) resTC <- results(ddsTC) resTC$symbol <- mcols(ddsTC)$symbol head(resTC[order(resTC$pvalue),],4)

Kindly clarify this part for me, because these are the same differentially expressed genes you then went on to use for construction of the heatmap

topGenes <- head(order(resTC\$pvalue),40)

So which time-point are these representing??

Thanks,

Stan

1

The likelihood ratio test here is testing genes for condition-specific differences over time after time 0. Here's the relevant section from the workflow:

"The following chunk performs a likelihood ratio test, where we remove the strain-specific differences over time. Genes with small p values from this test are those which, at one or more time points after time 0 showed a strain-specific effect. Note therefore that this will not give small p values to genes which moved up or down over time in the same way in both strains."