Search
Question: [DESeq2] time series strain-specific differences over time
2
gravatar for john
2.7 years ago by
john40
Germany
john40 wrote:

Hello guys,

I am doing some RNA-seq analysis using DESeq2 and following the vignette from here, where they use time points 0,15,30,60 ... 180 min. and two groups (strains)

http://www.bioconductor.org/help/workflows/rnaseqGene/

In the time series chapter it says:

"Genes with small p values from this test are those which, at one or more time points after time 0 showed a strain-specific effect."

As an example the following is shown:

## log2 fold change (MLE): strainmut.minute180 
## LRT p-value: '~ strain + minute + strain:minute' vs '~ strain + minute' 
## DataFrame with 4 rows and 7 columns
##               baseMean log2FoldChange     lfcSE      stat       pvalue         padj      symbol
##              <numeric>      <numeric> <numeric> <numeric>    <numeric>    <numeric> <character>
## SPBC2F12.09c  174.6712    -2.65763737 0.7498270  99.23199 7.671942e-20 5.184698e-16       atf21

Question 1)

As I am understanding it- this gene has a strain specific expression profile over time. This effect occures over the other time points (from 0 to 180).

If I had picked the strainmut.minute60  then we would have seen a list of DE genes from 0 to 60 minutes, right?

 

 

Just for clarification- here are the options of which one can choose:

resultsNames(ddsTC)

##  [1] "Intercept"           "strain_mut_vs_wt"    "minute_15_vs_0"      "minute_30_vs_0"     
##  [5] "minute_60_vs_0"      "minute_120_vs_0"     "minute_180_vs_0"     "strainmut.minute15" 
##  [9] "strainmut.minute30"  "strainmut.minute60"  "strainmut.minute120" "strainmut.minute180"

Question 2)

Imagine a gene that behaves in the same way in all timepoints (between the two groups) but time point 0 to 15.

Will this be detected by deseq?

 

 

Cheers,

john

ADD COMMENTlink modified 2.7 years ago • written 2.7 years ago by john40
4
gravatar for Michael Love
2.7 years ago by
Michael Love14k
United States
Michael Love14k wrote:

"If I had picked the strainmut.minute60  then we would have seen a list of DE genes from 0 to 60 minutes, right?"

This is a model with interaction terms, and this is an interaction term. .You can tell it is an interaction term because it contains the names of both variables strain and minute. So this term is a test for if the mut vs WT fold change is different at minute 60 than at minute 0. 

To generate the tables of log fold change of 60 minutes vs 0 minutes for the WT strain would be:

results(dds, name="minute_60_vs_0")

To generate the tables of log fold change of 60 minutes vs 0 minutes for the mut strain would be the sum of the WT term above and the interaction term which is an additional effect beyond the effect for the reference level (WT):

results(dds, contrast=list(c("minute_60_vs_0","strainmut.minute60"))

"Imagine a gene that behaves in the same way in all timepoints (between the two groups) but time point 0 to 15. Will this be detected by deseq?"

This is the description of an interaction term in the model. You can find this by testing the interaction term strainmut.minute15:

results(dds, name="strainmut.minute15")
ADD COMMENTlink modified 2.7 years ago • written 2.7 years ago by Michael Love14k
0
gravatar for john
2.7 years ago by
john40
Germany
john40 wrote:

Hello Michael,

You said "So this term is a test for if the mut vs WT fold change is different at minute 60 than at minute 0. "

Alright. A gene that has a low pvalue in strainmut.minute60 AND 30 AND 120 AND 180 is then to be considered DE in all time points (compared to 0) between WT and MUT.

----------------

You said:

"To generate the tables of log fold change of 60 minutes vs 0 minutes for the WT strain would be:

results(dds, name="minute_60_vs_0")"

So this shows me genes that are DE between two time points in one group (WT). Not considering the MUT at all.

---------------

You said: "reference level (WT):"

Why is WT our reference? Where did we specify that WT is the reference and not MUT? Why this way?

 

-------------

 

By calling 

results(dds, name="strainmut.minute180")

Deseq finds this gene because it takes 0minutes and 180minutes into account. That's alright.

Is there a way to take all time points into account? In order to find a gene that is DE in all time points between two groups?

 

ADD COMMENTlink modified 2.7 years ago • written 2.7 years ago by john40
1

See the DESeq2 vignette for details on how to tell R which level is the reference level for a factor variable.

In this case the reference level was set during object construction which can be followed at the fission package vignette.

"Is there a way to take all time points into account? In order to find a gene that is DE in all time points between two groups?

If you want to find genes which have a significant mut vs wt fold change for each and every time point, I don't think there is a simple way to form this results table. You would have to test the fold change for mut vs WT at minute 0:

results(dds, name="strain_mut_vs_wt")

...and then test the fold change for mut vs wt at every other time point:

results(dds, contrast=list(c("strain_mut_vs_wt","strainmut.minute15")))

etc.

And you would then look for genes which have a small adjusted p-value in all of the results tables.

ADD REPLYlink written 2.7 years ago by Michael Love14k

So if we do

results(dds, contrast=list(c("strain_mut_vs_wt","strainmut.minute15")))

and then

results(dds, contrast=list(c("strain_mut_vs_wt","strainmut.minute30")))

and on and on and then just select the genes that have an adjusted pvalue less than some threshold for all these tests, would this give the correct significance? I mean all of these tests share strainmut.minute0 as the control so they are correlated. How can we model this correlation? Instead, is there a way to test for a composite hypothesis such as  \beta_1 >0 AND \beta_2 >0 where \beta_1 and \beta_2 are two different fold change effects with the same baseline?

ADD REPLYlink modified 8 months ago • written 8 months ago by ysdel30

This isn't a concern in linear models, think about the coefficient table that lm() returns for example when there are 3 or more factor levels. The assumption is that the data is not correlated conditioned on the design matrix X.

No, we don't have a way to test such composite alternative hypotheses.

ADD REPLYlink written 8 months ago by Michael Love14k

I am interested in genes that are DE at each time point (i.e. ALL time points) with respect to 0. I ran some simulations for the lm model at http://rpubs.com/ysdel/corpvals and it seems that the tests are correlated. I don't know how to think about this correctly to find the significance levels.

ADD REPLYlink modified 8 months ago • written 8 months ago by ysdel30

I don't really follow your question, because I don't see where we make the assumption that test statistics from different contrasts would be independent.

Nevertheless, we don't have in DESeq2 a way for you to generate a p-value where the alternate hypothesis is that all time points after t0 have consistent sign of LFC compared to t0.

ADD REPLYlink written 8 months ago by Michael Love14k

I'm sorry, I wasn't clear in what the problem is. Suppose I have results res1, res2, res3 from correlated tests. And I am interested in genes that satisfy the alternative hypothesis for all of these tests. So I will have to do

hits = (res1$padj<0.05) & (res2$padj<0.05) & (res3$padj<0.05)

[or, forgetting about multiple testing adjustments,

hits = (res1$p.value<0.05) & (res2$p.value<0.05) & (res3$p.value<0.05) ]

What is the statistical significance (in terms of FDR or Type I errors) of these hits? If I had a simple hypothesis, I know for (res1$padj<0.05) I will make a type I error 5% of the time.

I very much appreciate you taking the time to answer these questions.

ADD REPLYlink written 8 months ago by ysdel30

You're correct that that statement doesn't have the same control as the tests we provide. We don't have functionality for this kind of combination in DESeq2. You could use pvalue combination methods.

ADD REPLYlink written 8 months ago by Michael Love14k

Hi Michael,

Just a quick question on the answer you gave for "Is there a way to take all time points into account? In order to find a gene that is DE in all time points between two groups?" 

I followed your tutorial on the fission data for time series experiments, and i thought that taking a summary of results > summary(resTC) after running the steps below would give an indication of the total genes differentially expressed across all time-points under consideration:

ddsTC <- DESeq(ddsTC, test="LRT", reduced = ~ strain + minute)
resTC <- results(ddsTC)
resTC$symbol <- mcols(ddsTC)$symbol
head(resTC[order(resTC$pvalue),],4)

Kindly clarify this part for me, because these are the same differentially expressed genes you then went on to use for construction of the heatmap

topGenes <- head(order(resTC$pvalue),40)

So which time-point are these representing??

Thanks,

Stan

ADD REPLYlink modified 2.6 years ago • written 2.6 years ago by stan0
1

The likelihood ratio test here is testing genes for condition-specific differences over time after time 0. Here's the relevant section from the workflow:

"The following chunk performs a likelihood ratio test, where we remove the strain-specific differences over time. Genes with small p values from this test are those which, at one or more time points after time 0 showed a strain-specific effect. Note therefore that this will not give small p values to genes which moved up or down over time in the same way in both strains."

 

 

ADD REPLYlink modified 2.6 years ago • written 2.6 years ago by Michael Love14k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 144 users visited in the last hour