Dear friends, I have 6 groups of animals (A, B, C, D, E, F), each containing 7 samples. I ran DESeq and compared the DEGs between every pair of groups.
Later, I found that group F is biologically far away from the other groups (all 6 groups were in the same batch). So I ran DESeq again after dropping the samples in group F. I recalculated the DEGs and found that the new DEG table differs from the previous one. For example, for A vs B, I found 373 DEGs (padj < 0.1) when all 6 groups were included. After removing group F, only 239 DEGs (padj < 0.1) were identified.
The question is: which method should I use in this case?
My code:
Get the DEGs between group A and B, where dds_6groups contains all samples
    res_AB_6groups <- results(dds_6groups, contrast = c("groups", "A", "B"))
    res_1_rmNA <- res_AB_6groups[!is.na(res_AB_6groups$padj), ]
    res_1_p10 <- res_1_rmNA[res_1_rmNA$padj < 0.1, ]
    rownames(res_1_p10)  # 373 DEGs identified
Recalculate the dds object after removing group F (note the `!`, so that every group except F is kept)
    dds_5groups <- dds_6groups[, !(dds_6groups$groups %in% c("F"))]
    dds_5groups$groups <- droplevels(dds_5groups$groups)
    dds_5groups <- DESeq(dds_5groups)
Get the DEGs between group A and B from new dds object
    res_AB_5groups <- results(dds_5groups, contrast = c("groups", "A", "B"))
    res_2_rmNA <- res_AB_5groups[!is.na(res_AB_5groups$padj), ]
    res_2_p10 <- res_2_rmNA[res_2_rmNA$padj < 0.1, ]
    rownames(res_2_p10)  # 239 DEGs identified
Yes. To add to Ryan’s answer, this question is asked often and is covered in the FAQ section of the DESeq2 vignette. There we show how to detect this situation with a PCA plot, and we recommend excluding a group whose within-group spread in the PCA plot is clearly different from that of the other groups. Please check the FAQ section of the DESeq2 vignette.
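As a rough illustration of the PCA check mentioned above, here is a minimal sketch. It runs on simulated counts from `makeExampleDESeqDataSet()` as a stand-in for the real `dds_6groups` object (that substitution, the object name `dds`, and the `spread` summary are assumptions for the example, not part of the vignette). The per-group spread is just one informal way to put a number on what you would otherwise judge by eye in the plot.

```r
library(DESeq2)

# Simulated stand-in for the real data (assumption): 6 groups x 7 samples.
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 5000, m = 42)
dds$groups <- factor(rep(LETTERS[1:6], each = 7))
design(dds) <- ~ groups

# Variance-stabilize the counts before running PCA.
vsd <- vst(dds, blind = FALSE)

# Per-sample PCA coordinates; omit returnData = TRUE to draw the plot instead.
pca_df <- plotPCA(vsd, intgroup = "groups", returnData = TRUE)

# Informal spread measure: mean distance of each sample from its group
# centroid in the PC1/PC2 plane.
spread <- sapply(split(pca_df[, c("PC1", "PC2")], pca_df$groups), function(g) {
  centered <- sweep(as.matrix(g), 2, colMeans(g))
  mean(sqrt(rowSums(centered^2)))
})
print(round(spread, 2))
```

With real data you would replace the simulated `dds` with your own object and look at the plot itself first. A group whose samples are much more scattered than the others is the case where dropping it before running `DESeq()` may be reasonable.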