Question

Question about using LRT test for time course experiments with DESeq2

0

Entering edit mode

Denise • 0

@860ad694

Last seen 5 weeks ago

Spain

Hi,

I have an experimental design of treated versus control samples across 4 time points (A, B, C, and D) and I would like to use the LRT test to find compound-specific DE genes across these time points. However, I do not have a time point 0, meaning all 4 time points, consist of treated and control samples.

How can I use the LRT test to find DEGs for all time points including timepoint A?

Thank you, Denise

DESeq2 • 2.6k views

ADD COMMENT • link updated 9 months ago by Rob • 0 • written 14 months ago by Denise • 0

score 0 · Answer 1 · 2024-10-29

0

Entering edit mode

Michael Love 43k

@mikelove

Last seen 9 days ago

United States

If you use the time series example in the workflow (rnaseqGene), you will get an effect for time point A and all the others. You would then use a LRT of ~condition + time + condition:time compared to a reduced design of ~condition.

ADD COMMENT • link 14 months ago Michael Love 43k

0

Entering edit mode

Hi Michael,

Thank you for getting back to me!

I am unsure if my question is very clear as I am new to this. I would like to use an LRT of compound + time + compound:time versus compound + time to capture compound-specific genes across time. I get the following coefficients

Intercept, compound_vs_CONTROL, timepoint_B_vs_A, timepoint_C_vs_A, timepoint_D_vs_A, compound.timepointB, compound.timepointC, and compound.timepointD

So there is no result for compound:timepointA! How can I get a coefficient/genes for this combination as well?

Thank you.

ADD REPLY • link 14 months ago Denise • 0

0

Entering edit mode

I understood your question above (at least I think so) and my response above remains relevant.

How can I get a coefficient/genes for this combination as well?

I recommend to walk through the question and my response above with a local statistician or someone familiar with linear models in R.

ADD REPLY • link 14 months ago Michael Love 43k

0

Entering edit mode

Hi,

Unfortunately, I don't know anyone who could help me with this particular question.

Would calling the results() function with a contrast of contrast=list(c(compound_vs_CONTROL, compound.timepointB, compound.timepointC, compound.timepointD)) give me the compound-specific genes across all time points?

Thank you.

ADD REPLY • link 14 months ago Denise • 0

0

Entering edit mode

An LRT has nothing to do with any contrast, but instead is comparing the model fit with and without certain coefficients. So if you do

dds <- DESeq(<deseq object goes here>, "LRT", reduced = ~ compound + time)

Then you are comparing the likelihood for the full model (~ compound + time + compound:time) versus the reduced model (~ compound + time), and if the full model fits the data better, you will get a significant p-value for the likelihood ratio test. In effect what that means is that the time-resolved gene expression is different at one or more time points for at least one of the compounds. This is an omnibus test (similar to an F-test) that tells you things are different without saying what the differences are.

You could come back with Wald tests for each timepoint to nail that down, but that is pretty boring and complex all at the same time. You would probably be better off taking the set of significant genes, using K-means clustering to resolve into multiple patterns and then plot the data for each cluster. You will have to play with K to get reasonable clusters, but that allows you to say things about each cluster, and how the treatments affect the time-dependent resolution of gene expression within each cluster.

Anyway, to show further what I mean about LRT not being contrast-dependent, consider this:

> library(DESeq2)
> z <- makeExampleDESeqDataSet(m = 24)
> colData(z) <- DataFrame(Time = factor(rep(1:4, each = 6)), Treatment = factor(rep(c("A","B"), each = 3, times = 4))) 
> design(z) <- ~ Treatment + Time + Treatment:Time
> dds <- DESeq(z, "LRT", reduced = ~ Treatment + Time)
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing              
> resultsNames(dds)
[1] "Intercept"       
[2] "Treatment_B_vs_A"
[3] "Time_2_vs_1"     
[4] "Time_3_vs_1"     
[5] "Time_4_vs_1"     
[6] "TreatmentB.Time2"
[7] "TreatmentB.Time3"
[8] "TreatmentB.Time4"
## let's do a contrast comparing  Time_2_vs_1 vs Time_3_vs_1
## and compare that to the default       
> all.equal(results(dds), results(dds, contrast = c(0,0,1,-1,0,0,0,0)), check.attributes = FALSE)
[1] TRUE

The results with and without a contrast are identical!

Specifying the contrast shouldn't affect the results for an LRT test because it's not a Wald test, so a contrast doesn't make much sense in that context. Which is why I suspect Mike asked you to find somebody local to help you. An alternative, if you need to do the analysis yourself is to educate yourself about linear regression. In the end, you will have to explain what you have done and would presumably want your explanation to make sense, particularly if you are explaining to someone with a statistical background. There's only two ways that can happen - either you know what you are talking about, or you have somebody who does know what she's talking about to provide the explanation. If you can't do the latter, you should probably do the former.

ADD REPLY • link 14 months ago James W. MacDonald 68k

0

Entering edit mode

Hi James,

Thank you very much for your very detailed explanation!

I guess my question is simpler than that, I just want to union, across all time points, of all compound-specific genes. The results() function gives me logfc for the last coefficient, TreatmentB.Time4. I can similarly get TreatmentB.Time3 and TreatmentB.Time2. For time point 1 that is probably Treatment_B_vs_A.

Would the union of all genes with significant LRT p-value and, say, |logfc|>1 for TreatmentB.Time4, TreatmentB.Time3, TreatmentB.Time2 and Treatment_B_vs_A, give me the list of compound-specific genes across time? Why would I want to do a Wald test on top of this?

Thank you.

ADD REPLY • link 14 months ago Denise • 0

0

Entering edit mode

You don't want to add a logFC on top like that, as it invalidates the p-value.

Maybe I wasn't clear before. The LRT gives you all the genes that change at any time. The fact that the results DataFrame only has one logFC listed is because there's only one column and you have to put something in there. But the p-values aren't in reference to that one comparison. They are in reference to the LRT, which compares models, not individual coefficients.

ADD REPLY • link 14 months ago James W. MacDonald 68k

0

Entering edit mode

Hi James,

Yes, I understand that the p-values are with respect to the full versus reduced model comparison.

However, the "genes that change at any time " could be genes that change in the control samples and have nothing to do with the treatment. I do not want to keep those genes, this is why I thought I could use the logfc of the coefficients above to filter them out. Is this the wrong way to do it?

Thank you very much.

ADD REPLY • link 14 months ago Denise • 0

0

Entering edit mode

You are testing the interaction, so you are asking for those genes that change at any time differently, depending on the treatment. If controls change, but treated don't, that's interesting, no?

ADD REPLY • link 14 months ago James W. MacDonald 68k

0

Entering edit mode

Not really, we are testing some toxic compounds to identify genes that are activated in response to the toxic exposure. Genes that change only in control samples aren't part of this response, so including them would only add unnecessary confusion.

ADD REPLY • link 14 months ago Denise • 0

0

Entering edit mode

Just to pick up on this thread as its really helpful for understanding the LRT test. The LRT test is given any genes that change within the particular arm of the experiment over time? i.e if one was to have an experiment with a Control arm and a Treatment arm that is measured at 4 timepoints - an LRT would be able to tell you any genes that changes over time in the control group or the treatment group. Then K means clustering would be able to give some appreciation of these changes were (groups of genes that were downregulated over time or upregulated over timepoints?). However if i understood it correctly the LRT cannot compare the genes that change over time between the two conditions (treatment and control) and as alluded to - that would require a pairwise wald test to do this on a timepoint per timepoint bases.

If this is my correct understanding - Is a reduced model essentially one where you are only factoring by Time? So full model would be time and ondition (control/treatment) and reduce would be time only?

Finally -- could you link to the rnaseqGene workflow or is it just this: https://master.bioconductor.org/packages/release/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html

Thanks in advance. Apologies for frustrating questions

ADD REPLY • link 9 months ago Rob • 0