Hi everyone,
I am working on an RNA-seq dataset with four genotypes (WT, mutant1, mutant2 and mutant3) and 5 developmental stages (T1, T2, T3, T4, T5). For each genotype-stage combination, I have at least two biological replicates.
I want to get the differentially expressed genes between genotypes (each mutant compared to WT) at each developmental stage, for example: mutant3 compared to WT at the T2 stage. I also want to get the differentially expressed genes between two developmental stages within each genotype, for example: T3 compared to T2 in mutant2.
I am trying to use the LRT in DESeq2, but I am still very confused about the model design, although I have read many manuals, papers and posts here. I believe the correct design for my experiment is ~genotype+time+genotype:time for the full formula and ~genotype+time for the reduced formula. So my question is: whether I want DE genes between two specific developmental stages within a genotype, or DE genes between two genotypes (mutant compared to WT) at a single developmental stage, do I just run the LRT with the full and reduced models above and then use the name or contrast argument to extract what I need from the LRT results table? Am I right?
If I am interested in the effects of genotype, should the reduced model be “~time”?
Also, what does the LRT test when the reduced formula is “~genotype+time” versus “~genotype”: the effect of the “genotype:time” interaction and the effect of “time”, respectively?
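One way to see what each reduced formula tests: in a nested likelihood ratio test, the null hypothesis is that every coefficient present in the full design but absent from the reduced design is zero, and the degrees of freedom equal the number of such coefficients. The base-R sketch below, assuming a hypothetical balanced layout of 4 genotypes x 5 stages x 2 replicates with the default treatment coding (WT and T1 as reference levels), lists those dropped coefficients for each reduced formula mentioned above. It does not run DESeq2; it only inspects the design matrices.

```r
# Hypothetical balanced sample table: 4 genotypes x 5 stages x 2 replicates
samples <- expand.grid(
  rep      = 1:2,
  time     = factor(paste0("T", 1:5)),
  genotype = factor(c("WT", "mutant1", "mutant2", "mutant3"),
                    levels = c("WT", "mutant1", "mutant2", "mutant3"))
)

# Column names of the design matrix implied by a formula
coef_names <- function(formula) colnames(model.matrix(formula, data = samples))

full <- coef_names(~ genotype + time + genotype:time)

# Coefficients of the full model that the reduced model removes;
# the LRT asks whether all of these are zero (df = number of them)
dropped <- function(reduced) setdiff(full, coef_names(reduced))

dropped(~ genotype + time)  # the 12 genotype:time interaction terms
dropped(~ genotype)         # 4 time terms plus the 12 interaction terms
dropped(~ time)             # 3 genotype terms plus the 12 interaction terms
```

Read this way, reduced ~genotype+time asks whether the stage effect differs between genotypes; reduced ~genotype asks whether stage matters in any genotype; and reduced ~time asks whether genotype matters at any stage.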
Thank you in advance for your help.
I appreciate your reply.
Bob
Hi Michael,
Thanks for your reply. I am not a statistician; could you please give me a more detailed explanation if you have time?
Are you suggesting using the Wald test with DESeq(dds) rather than the LRT?
Is what I get from results(dds, contrast=c("group","mutant3T2","WTT2")) the set of differentially expressed genes between mutant3 and WT at the T2 stage? Is it the same as a pairwise comparison in DESeq2 using the Wald test?
The reason I want to use the LRT is that I want to get the genes that are only affected by genotype (genotype-specific) or the genes that are only affected by developmental stage (development-specific). I am not sure if I understand the LRT correctly.
My second purpose is to get DE genes between two different genotypes (for example, mutant1 vs WT) at a single developmental stage.
My third purpose is to get DE genes between two specific developmental stages within a genotype.
I am really struggling with the interpretation of the reduced formula in the LRT. I appreciate your patience and help.
Thank you very much.
"What I get from results(dds, contrast=c("group","mutant3T2","WTT2")) is the differentially expressed genes between mutant3 and WT at T2 stage?"
Yes, this is how you can construct Wald tests for the specific comparisons you requested, as I described above.
You could also use the same style to make a results table comparing time points within mutant3, for example.
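A minimal sketch of the combined-group Wald approach discussed here, assuming DESeq2 is installed. It uses simulated counts from makeExampleDESeqDataSet as a stand-in for the real data, and the colData columns genotype, time and group are assumed names chosen to match the thread. The two contrasts are the examples from the original question: mutant3 vs WT at T2, and T3 vs T2 within mutant2.

```r
library(DESeq2)

# Simulated stand-in data: 4 genotypes x 5 stages x 2 replicates = 40 samples
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 500, m = 40)
dds$genotype <- factor(rep(c("WT", "mutant1", "mutant2", "mutant3"), each = 10),
                       levels = c("WT", "mutant1", "mutant2", "mutant3"))
dds$time <- factor(rep(rep(paste0("T", 1:5), each = 2), times = 4))

# One level per genotype-stage combination, e.g. "mutant3T2", "WTT2"
dds$group <- factor(paste0(dds$genotype, dds$time))
design(dds) <- ~ group
dds <- DESeq(dds)  # Wald test by default

# Genotype effect at a single stage: mutant3 vs WT at T2
res_geno <- results(dds, contrast = c("group", "mutant3T2", "WTT2"))

# Stage effect within a single genotype: T3 vs T2 in mutant2
res_time <- results(dds, contrast = c("group", "mutant2T3", "mutant2T2"))

summary(res_geno)
summary(res_time)
```

Because every genotype-stage cell has its own level, any pairwise comparison of cells is a single contrast on one fitted model.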
"The reason why I want to use LRT is I want to get the genes which are only affected by genotype (genotype specific) or the genes which are only affected by developmental stages (development specific)."
These kinds of results tables (genes only affected by genotype, genes only affected by stage) are constructed using the code I described above.
When you test mutant3 vs WT within a stage, you are isolating the genotype effect at that stage. Similarly, when you compare stages within a genotype, you are isolating the stage effect.
The difference in DESeq2 between the Wald test and the LRT is that we recommend the Wald test for contrasts or for testing individual coefficients, while the LRT can be used to test many coefficients at once. Read the section in the DESeq2 vignette on "Likelihood ratio tests".
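For comparison, a minimal sketch of the LRT setup from the original question (full ~genotype + time + genotype:time, reduced ~genotype + time), again on simulated data with assumed column names. As the DESeq2 vignette explains, LRT p-values depend only on the difference in deviance between the full and reduced models, so passing a name or contrast to results() changes the reported log2 fold change but not the p-value. That is why the LRT answers "does any of the dropped coefficients matter?" rather than a specific pairwise question.

```r
library(DESeq2)

# Simulated stand-in data: 4 genotypes x 5 stages x 2 replicates = 40 samples
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 500, m = 40)
dds$genotype <- factor(rep(c("WT", "mutant1", "mutant2", "mutant3"), each = 10),
                       levels = c("WT", "mutant1", "mutant2", "mutant3"))
dds$time <- factor(rep(rep(paste0("T", 1:5), each = 2), times = 4))

design(dds) <- ~ genotype + time + genotype:time

# LRT: tests all 12 genotype:time interaction coefficients jointly
dds <- DESeq(dds, test = "LRT", reduced = ~ genotype + time)

resultsNames(dds)

# Default table reports the LFC of the last coefficient in the design
res_lrt <- results(dds)

# Choosing a coefficient changes the LFC column only; p-values are the same LRT
res_m3T2 <- results(dds, name = "genotypemutant3.timeT2")
```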
You can also see this example of DESeq2 for time series using LRT.
If the statistical concepts are still confusing, I recommend consulting a local statistician who can explain the differences in person. While the forum is useful for exchanging code, it is not the best technology for teaching statistics in my opinion (a whiteboard and a face-to-face conversation are much better).
Thank you very much Michael for your comprehensive answer and all the good suggestions.