Hi,
I have an experimental design of treated versus control samples across 4 time points (A, B, C, and D) and I would like to use the LRT test to find compound-specific DE genes across these time points. However, I do not have a time point 0, meaning all 4 time points, consist of treated and control samples.
How can I use the LRT test to find DEGs for all time points including timepoint A?
Thank you, Denise
Hi Michael,
Thank you for getting back to me!
I am unsure if my question is very clear as I am new to this. I would like to use an LRT of compound + time + compound:time versus compound + time to capture compound-specific genes across time. I get the following coefficients
Intercept, compound_vs_CONTROL, timepoint_B_vs_A, timepoint_C_vs_A, timepoint_D_vs_A, compound.timepointB, compound.timepointC, and compound.timepointD
So there is no result for compound:timepointA! How can I get a coefficient/genes for this combination as well?
Thank you.
I understood your question above (at least I think so) and my response above remains relevant.
I recommend to walk through the question and my response above with a local statistician or someone familiar with linear models in R.
Hi,
Unfortunately, I don't know anyone who could help me with this particular question.
Would calling the results() function with a contrast of contrast=list(c(compound_vs_CONTROL, compound.timepointB, compound.timepointC, compound.timepointD)) give me the compound-specific genes across all time points?
Thank you.
An LRT has nothing to do with any contrast, but instead is comparing the model fit with and without certain coefficients. So if you do
Then you are comparing the likelihood for the full model (~ compound + time + compound:time) versus the reduced model (~ compound + time), and if the full model fits the data better, you will get a significant p-value for the likelihood ratio test. In effect what that means is that the time-resolved gene expression is different at one or more time points for at least one of the compounds. This is an omnibus test (similar to an F-test) that tells you things are different without saying what the differences are.
You could come back with Wald tests for each timepoint to nail that down, but that is pretty boring and complex all at the same time. You would probably be better off taking the set of significant genes, using K-means clustering to resolve into multiple patterns and then plot the data for each cluster. You will have to play with K to get reasonable clusters, but that allows you to say things about each cluster, and how the treatments affect the time-dependent resolution of gene expression within each cluster.
Anyway, to show further what I mean about LRT not being contrast-dependent, consider this:
The results with and without a contrast are identical!
Specifying the contrast shouldn't affect the results for an LRT test because it's not a Wald test, so a contrast doesn't make much sense in that context. Which is why I suspect Mike asked you to find somebody local to help you. An alternative, if you need to do the analysis yourself is to educate yourself about linear regression. In the end, you will have to explain what you have done and would presumably want your explanation to make sense, particularly if you are explaining to someone with a statistical background. There's only two ways that can happen - either you know what you are talking about, or you have somebody who does know what she's talking about to provide the explanation. If you can't do the latter, you should probably do the former.
Hi James,
Thank you very much for your very detailed explanation!
I guess my question is simpler than that, I just want to union, across all time points, of all compound-specific genes. The results() function gives me logfc for the last coefficient, TreatmentB.Time4. I can similarly get TreatmentB.Time3 and TreatmentB.Time2. For time point 1 that is probably Treatment_B_vs_A.
Would the union of all genes with significant LRT p-value and, say, |logfc|>1 for TreatmentB.Time4, TreatmentB.Time3, TreatmentB.Time2 and Treatment_B_vs_A, give me the list of compound-specific genes across time? Why would I want to do a Wald test on top of this?
Thank you.
You don't want to add a logFC on top like that, as it invalidates the p-value.
Maybe I wasn't clear before. The LRT gives you all the genes that change at any time. The fact that the
results
DataFrame
only has one logFC listed is because there's only one column and you have to put something in there. But the p-values aren't in reference to that one comparison. They are in reference to the LRT, which compares models, not individual coefficients.Hi James,
Yes, I understand that the p-values are with respect to the full versus reduced model comparison.
However, the "genes that change at any time " could be genes that change in the control samples and have nothing to do with the treatment. I do not want to keep those genes, this is why I thought I could use the logfc of the coefficients above to filter them out. Is this the wrong way to do it?
Thank you very much.
You are testing the interaction, so you are asking for those genes that change at any time differently, depending on the treatment. If controls change, but treated don't, that's interesting, no?