Question about using LRT test for time course experiments with DESeq2
1
0
Entering edit mode
Denise • 0
@860ad694
Last seen 10 hours ago
Spain

Hi,

I have an experimental design of treated versus control samples across 4 time points (A, B, C, and D) and I would like to use the LRT test to find compound-specific DE genes across these time points. However, I do not have a time point 0, meaning all 4 time points, consist of treated and control samples.

How can I use the LRT test to find DEGs for all time points including timepoint A?

Thank you, Denise

DESeq2 • 831 views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 2 hours ago
United States

If you use the time series example in the workflow (rnaseqGene), you will get an effect for time point A and all the others. You would then use a LRT of ~condition + time + condition:time compared to a reduced design of ~condition.

ADD COMMENT
0
Entering edit mode

Hi Michael,

Thank you for getting back to me!

I am unsure if my question is very clear as I am new to this. I would like to use an LRT of compound + time + compound:time versus compound + time to capture compound-specific genes across time. I get the following coefficients

Intercept, compound_vs_CONTROL, timepoint_B_vs_A, timepoint_C_vs_A, timepoint_D_vs_A, compound.timepointB, compound.timepointC, and compound.timepointD

So there is no result for compound:timepointA! How can I get a coefficient/genes for this combination as well?

Thank you.

ADD REPLY
0
Entering edit mode

I understood your question above (at least I think so) and my response above remains relevant.

How can I get a coefficient/genes for this combination as well?

I recommend to walk through the question and my response above with a local statistician or someone familiar with linear models in R.

ADD REPLY
0
Entering edit mode

Hi,

Unfortunately, I don't know anyone who could help me with this particular question.

Would calling the results() function with a contrast of contrast=list(c(compound_vs_CONTROL, compound.timepointB, compound.timepointC, compound.timepointD)) give me the compound-specific genes across all time points?

Thank you.

ADD REPLY
0
Entering edit mode

An LRT has nothing to do with any contrast, but instead is comparing the model fit with and without certain coefficients. So if you do

dds <- DESeq(<deseq object goes here>, "LRT", reduced = ~ compound + time)

Then you are comparing the likelihood for the full model (~ compound + time + compound:time) versus the reduced model (~ compound + time), and if the full model fits the data better, you will get a significant p-value for the likelihood ratio test. In effect what that means is that the time-resolved gene expression is different at one or more time points for at least one of the compounds. This is an omnibus test (similar to an F-test) that tells you things are different without saying what the differences are.

You could come back with Wald tests for each timepoint to nail that down, but that is pretty boring and complex all at the same time. You would probably be better off taking the set of significant genes, using K-means clustering to resolve into multiple patterns and then plot the data for each cluster. You will have to play with K to get reasonable clusters, but that allows you to say things about each cluster, and how the treatments affect the time-dependent resolution of gene expression within each cluster.

Anyway, to show further what I mean about LRT not being contrast-dependent, consider this:

> library(DESeq2)
> z <- makeExampleDESeqDataSet(m = 24)
> colData(z) <- DataFrame(Time = factor(rep(1:4, each = 6)), Treatment = factor(rep(c("A","B"), each = 3, times = 4))) 
> design(z) <- ~ Treatment + Time + Treatment:Time
> dds <- DESeq(z, "LRT", reduced = ~ Treatment + Time)
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing              
> resultsNames(dds)
[1] "Intercept"       
[2] "Treatment_B_vs_A"
[3] "Time_2_vs_1"     
[4] "Time_3_vs_1"     
[5] "Time_4_vs_1"     
[6] "TreatmentB.Time2"
[7] "TreatmentB.Time3"
[8] "TreatmentB.Time4"
## let's do a contrast comparing  Time_2_vs_1 vs Time_3_vs_1
## and compare that to the default       
> all.equal(results(dds), results(dds, contrast = c(0,0,1,-1,0,0,0,0)), check.attributes = FALSE)
[1] TRUE

The results with and without a contrast are identical!

Specifying the contrast shouldn't affect the results for an LRT test because it's not a Wald test, so a contrast doesn't make much sense in that context. Which is why I suspect Mike asked you to find somebody local to help you. An alternative, if you need to do the analysis yourself is to educate yourself about linear regression. In the end, you will have to explain what you have done and would presumably want your explanation to make sense, particularly if you are explaining to someone with a statistical background. There's only two ways that can happen - either you know what you are talking about, or you have somebody who does know what she's talking about to provide the explanation. If you can't do the latter, you should probably do the former.

0
Entering edit mode

Hi James,

Thank you very much for your very detailed explanation!

I guess my question is simpler than that, I just want to union, across all time points, of all compound-specific genes. The results() function gives me logfc for the last coefficient, TreatmentB.Time4. I can similarly get TreatmentB.Time3 and TreatmentB.Time2. For time point 1 that is probably Treatment_B_vs_A.

Would the union of all genes with significant LRT p-value and, say, |logfc|>1 for TreatmentB.Time4, TreatmentB.Time3, TreatmentB.Time2 and Treatment_B_vs_A, give me the list of compound-specific genes across time? Why would I want to do a Wald test on top of this?

Thank you.

ADD REPLY
0
Entering edit mode

You don't want to add a logFC on top like that, as it invalidates the p-value.

Maybe I wasn't clear before. The LRT gives you all the genes that change at any time. The fact that the results DataFrame only has one logFC listed is because there's only one column and you have to put something in there. But the p-values aren't in reference to that one comparison. They are in reference to the LRT, which compares models, not individual coefficients.

ADD REPLY

Login before adding your answer.

Traffic: 556 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6