I am hoping to get some guidance on whether I should be choosing LRT vs Wald. I have done both, and do not see too much of a difference in the expression of genes/gene sets but I have also doubted if I am using Wald correctly because of setting the contrasts. Background, I have 5 sample types sorted by their fluorescent color (Red, Green, Red+Green, Red+Green+Blue, Control). I want to see differences in stress pathway expression between all of these colors , so my design = ~color.
dds_no = DESeqDataSetFromMatrix(countData=countData_no, colData=colData_no, design=~color)
The way I have always seen it, this is an all vs all comparison. I have struggled to grasp the all vs all comparison using Wald. From what I have seen on numerous other posts, you do this by setting contrasts?
1] "Intercept" "color_green_vs_control" "color_greenred_vs_control"  "color_red_vs_control" "color_rgb_vs_control"
So does this mean I need to set additional contrasts with the results function, i.e "color_red_vs_green", etc. And if I had previously not done this and went forward with selecting significant genes and creating heatmaps without setting additional contrasts, that I was not truly looking at all vs all situations? My code without other contrasts:
res_no <- results(dds_no) rld_no <- rlogTransformation(dds_no) paths<-assay(rld_no) df_path3 <- cbind(rownames(res_no), data.frame(res_no, row.names=NULL)) topTable3 <- as.data.frame(df_path3) sigGeneList3 <- subset(topTable3, abs(log2FoldChange)>=1 & pvalue<=0.05)[,1] topMatrix3 <- paths3[which(rownames(paths3) %in% sigGeneList3),] topMatrixPATHS3 <- gsva(data.matrix(topMatrix3), stress_Cao_etal_2017, method="gsva", min.sz=1, max.sz=Inf, kcdf="Gaussian", mx.diff=TRUE, verbose=TRUE)
Now I have read LRT can be used to analyze all levels of a factor at once, which sounds great and easy for me. So I tried:
dds_LRT = DESeq(dds_no,test="LRT", reduced=~1) res_LRT <- results(dds_LRT) rld_LRT<- rlogTransformation(dds_LRT) pathsLRT<-assay(rld_LRT) df_pathLRT <- cbind(rownames(res_LRT), data.frame(res_LRT, row.names=NULL)) topTableLRT <- as.data.frame(df_pathLRT) sigGeneListLRT <- subset(topTableLRT, padj<=0.05)[,1] topMatrixLRT <- pathsLRT[which(rownames(pathsLRT) %in% sigGeneListLRT),] topMatrixPATHSLRT <- gsva(data.matrix(topMatrixLRT), stress_Cao_etal_2017, method="gsva", min.sz=1, max.sz=Inf, kcdf="Gaussian", mx.diff=TRUE, verbose=TRUE)
But now I am uncertain if LRT is right for me based on this quote from another post. "If you want to test comparisons of pairs of levels of disease, you should be using the Wald test (the standard DESeq() workflow), instead of the LRT. The LRT is for testing all levels of disease at once."
Long story short, should I use Wald or LRT?
Thank you very much in advance! Morgan