Hi list,
My question is simple: what questions are each of these two tests responding to? (why do i get different results when testing for the same coefficient in the Wald vs LRT test?) The FC is the same, the p-values change because the LRT provides a p-value testing for the null hypothesis that the coefficients that are not in the reduced formula are equal to zero, the Wald test provide p-values testing for the null hypothesis that the coefficients of the full model are equal to zero. But how do you interpret this?
What are the genes that are being shown as significant for each test?
> design(ddsK)=~A*B*C > resRed = DESeq(ddsK, test = "LRT", reduced = formula(~ A*B)) > resRed = results(resRed, name = "C_C2_vs_C1", alpha = 0.05) > summary(resRed) out of 28842 with nonzero total read count adjusted p-value < 0.05 LFC > 0 (up) : 5936, 21% LFC < 0 (down) : 5894, 20% outliers [1] : 81, 0.28% low counts [2] : 3354, 12% (mean count < 1) [1] see 'cooksCutoff' argument of ?results [2] see 'independentFiltering' argument of ?results > design(ddsK)=~A*B*C > ddsK=DESeq(ddsK) > resB <- results(ddsK, name = "C_C2_vs_C1", alpha = 0.05) > summary(resB) out of 28842 with nonzero total read count adjusted p-value < 0.05 LFC > 0 (up) : 1142, 4% LFC < 0 (down) : 1088, 3.8% outliers [1] : 81, 0.28% low counts [2] : 5015, 17% (mean count < 3) [1] see 'cooksCutoff' argument of ?results [2] see 'independentFiltering' argument of ?results > resRed log2 fold change (MLE): C C2 vs C1 LRT p-value: '~ A * B * C' vs '~ A * B' DataFrame with 29085 rows and 6 columns baseMean log2FoldChange lfcSE stat pvalue padj <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> comp100000_c0 5.394950 0.88129957 0.7296606 25.557624 3.885476e-05 1.842103e-04 comp100000_c1 15.236690 0.83653169 0.6352388 48.670559 6.839876e-10 8.012020e-09 comp100000_c3 7.684048 -0.02531354 0.9073333 23.174532 1.168546e-04 4.968079e-04 comp100002_c0 241.680116 1.27044149 0.3719448 52.028340 1.360758e-10 1.797857e-09 comp100003_c0 11.010333 1.44552180 0.6997581 8.617796 7.139597e-02 1.266642e-01 ... ... ... ... ... ... ... comp99991_c0 1643.4060 0.06606909 0.08068866 39.811182 4.735591e-08 4.018609e-07 comp99992_c0 651.1968 -0.08546246 0.19208340 2.133650 7.111930e-01 7.639641e-01 comp99993_c0 1957.2871 0.08239842 0.13402428 5.188685 2.684794e-01 3.605124e-01 comp99994_c0 1562.8709 -0.12381266 0.14475235 1.837816 7.655570e-01 8.095270e-01 comp99999_c0 170.9605 0.15237421 0.20194552 4.054691 3.986549e-01 4.888804e-01 > resB log2 fold change (MLE): C C2 vs C1 Wald test p-value: C C2 vs C1 DataFrame with 29085 rows and 6 columns baseMean log2FoldChange lfcSE stat pvalue padj <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> comp100000_c0 5.394950 0.88129957 0.7296606 1.20782123 0.2271160328 0.47320324 comp100000_c1 15.236690 0.83653169 0.6352388 1.31687747 0.1878796988 0.42510643 comp100000_c3 7.684048 -0.02531354 0.9073333 -0.02789883 0.9777428379 0.98894584 comp100002_c0 241.680116 1.27044149 0.3719448 3.41567233 0.0006362476 0.01283631 comp100003_c0 11.010333 1.44552180 0.6997581 2.06574505 0.0388525593 0.18057827 ... ... ... ... ... ... ... comp99991_c0 1643.4060 0.06606909 0.08068866 0.8188151 0.4128919 0.6519404 comp99992_c0 651.1968 -0.08546246 0.19208340 -0.4449237 0.6563749 0.8259819 comp99993_c0 1957.2871 0.08239842 0.13402428 0.6148022 0.5386854 0.7499239 comp99994_c0 1562.8709 -0.12381266 0.14475235 -0.8553413 0.3923622 0.6347520 comp99999_c0 170.9605 0.15237421 0.20194552 0.7545313 0.4505303 0.6829395
Hi, thank you for your reply Michael. The explanation you gave me is in statistical terms and I gather that, I have the same problem when I speak with statisticians here, I was hoping for someone to orient me more on a biological interpretation.
I was expecting that the genes with sig pvalue obtained with the LRT test would be the genes that are affected by factor C and that with results I would extract the DEG between C2_vs_C1 (in only those genes affected by factor C).
Equally I thought that fitting the data with the full model (the Wald option) and extracting the DEG between C2_vs_C1 would give the same number of DEGs as in above.
Actually the fold changes are the same which means it is testing for C2_vs_C1, but I don't understand why the number of significant genes is different, or in other words, biologically, what are the genes obtained from the LRT comparison vs the ones obtained with the Wald test?
In any case, Which of the two options is best to test for differences between C2 and C1?
Thank you
You could say translating from biological hypotheses to statistical analyses is a whole profession and an academic discipline. It seems like you have a fairly complex experimental design, and I can't easily tell you what to do on the support forum. I would want to sit down and understand what are A, B, and C what are the relationship between these, what are the assumptions of the experiment, etc. But I don't have much time for this, which is why I recommend finding a local statistician who you would include as a collaborator on the project.