I'm trying to analyze time-series RNA-seq data using DESeq2. I have 15 time points, 2 conditions, and 2 biological replicates.
I first performed an LRT in which the full model is ~batch + condition + time + condition:time and the reduced model is ~batch + condition + time. All variables are coded as factors. Using a threshold of FDR < 0.01, I get ~5000 differentially expressed genes.
Then I wanted to assign these differentially expressed genes to individual time points, so I performed a Wald test on them at each time point. With the thresholds FDR < 0.01 and |log2FoldChange| > 1, only ~2000 genes could be assigned to a time point. When I relaxed the criteria and used only FDR < 0.01, ~3000 genes could be assigned to specific time points.
I want to check whether my approach to processing time-series data is right. If so, what is the biological meaning of the genes that could not be assigned to any specific time point?
The LRT is looking for any differences at any time point and so has higher power compared to each individual time point comparison.
Below is a small example just using R's lm() function.
Notice that, while in fact all of the 10 groups have a different mean, none have a p-value < 0.0004. Meanwhile, the ANOVA has a p-value < 0.0004. So this difference in power means that you can't necessarily find which particular time point is different despite detecting it with the LRT.
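The comparison described above can be sketched roughly as follows. This is a minimal reconstruction in base R, not the code from the original answer: the 10 groups match the text, but the replicate count, effect sizes, noise level, and seed are assumptions chosen for illustration, so the p-values it prints will differ from the ones quoted above.

```r
# Minimal sketch: one omnibus test vs. many per-group tests.
# Assumed (not from the thread): 4 replicates per group, means spread
# evenly between 0 and 1, unit standard deviation, seed 1.
set.seed(1)
n_groups    <- 10
n_per_group <- 4
group <- factor(rep(seq_len(n_groups), each = n_per_group))

# Every group has a different true mean.
true_means <- seq(0, 1, length.out = n_groups)
y <- rnorm(n_groups * n_per_group,
           mean = true_means[as.integer(group)], sd = 1)

fit_full <- lm(y ~ group)  # one mean per group
fit_null <- lm(y ~ 1)      # a single common mean

# Per-group tests: each non-reference group vs. the reference group
# (analogous to one Wald test per time point).
coef_p <- summary(fit_full)$coefficients[-1, "Pr(>|t|)"]

# Omnibus F test: is there any difference among the groups at all?
# (analogous to the LRT of full vs. reduced model).
anova_p <- anova(fit_null, fit_full)[2, "Pr(>F)"]

print(round(coef_p, 4))
print(anova_p)
```

The omnibus test pools evidence across all groups into a single test, while each per-group test only uses the evidence for one contrast and then has to survive multiple-testing correction on top.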
Hi Michael,
Thanks for your quick and kind reply; it helps me a lot.
Am I right to understand that genes detected by the LRT but not by the Wald test at any particular time point do differ between conditions, but the difference is not significant at any specific time point, so they could be regarded as some kind of 'background' response?
Another question I've run into while analyzing my time-course RNA-seq data with DESeq2: the set of DEGs is completely different if time is coded as numeric rather than as a factor (LRT: ~1500 genes with numeric time, ~5000 genes with time as a factor). The example in your tutorial treats time as a factor, but I'm a little confused about what DESeq2 does differently with these two types of variables.
The link to the tutorial: http://master.bioconductor.org/packages/release/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html#time-course-experiments
Many thanks,
Fei
The way to understand how this happens is that there is more statistical power for the LRT.
Switching between a factor and a numeric variable in an R design formula is a pretty big change in modeling decision. I'd recommend speaking with a local statistician to understand the different assumptions behind these.
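To make the factor-versus-numeric difference concrete, here is a small base R sketch that only builds design matrices and counts the parameters the LRT would test. The sample sheet is hypothetical: it mirrors the 15 time points, 2 conditions, and 2 replicates from the question, uses the replicate as the batch, and assumes time values 1 to 15. No DESeq2 call is involved.

```r
# Hypothetical sample sheet: 15 time points x 2 conditions x 2 replicates.
# Column names, level names and the 1:15 time values are assumptions.
samples <- expand.grid(
  time_num  = 1:15,
  condition = factor(c("ctrl", "treat")),
  batch     = factor(c("rep1", "rep2"))
)
samples$time_fac <- factor(samples$time_num)

# Time as a factor: one coefficient per non-reference time point.
full_fac    <- model.matrix(~ batch + condition + time_fac + condition:time_fac,
                            data = samples)
reduced_fac <- model.matrix(~ batch + condition + time_fac, data = samples)

# Time as numeric: a single slope, i.e. a straight line over time.
full_num    <- model.matrix(~ batch + condition + time_num + condition:time_num,
                            data = samples)
reduced_num <- model.matrix(~ batch + condition + time_num, data = samples)

# Number of parameters dropped from full to reduced model.
df_fac <- ncol(full_fac) - ncol(reduced_fac)  # 14 interaction terms
df_num <- ncol(full_num) - ncol(reduced_num)  # 1 interaction term
print(c(factor = df_fac, numeric = df_num))
```

With time as a factor, the test asks whether the condition effect differs at any time point, with no constraint on the shape of the profile. With numeric time, the model assumes a linear trend on the model's scale, and the test only asks whether the slope differs between conditions, so genes with non-linear condition-specific profiles can be missed.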
Many thanks for your advice, Michael.