Analyzing a subgroup of samples: which result should I trust?
Raymond

Dear friends, I have 6 groups of animals (A, B, C, D, E, F), each containing 7 samples. I ran DESeq and compared the DEGs between every pair of groups.

Later, I found that group F is biologically far away from the other groups (all 6 groups were in the same batch). So I ran DESeq again after dropping the samples in group F. When I recalculated the DEGs, the new DEG table was different from the previous one. For example, for A vs B I found 373 DEGs (padj < 0.1) when all 6 groups were included, but only 239 DEGs (padj < 0.1) after removing group F.

My question is: which approach should I use in this case?

My code:

Get the DEGs between groups A and B, where dds_6groups contains all samples:

res_AB_6groups <- results(dds_6groups, contrast = c("groups", "A", "B"))
res_1_rmNA <- res_AB_6groups[!is.na(res_AB_6groups$padj), ]
res_1_p10 <- res_1_rmNA[res_1_rmNA$padj < 0.1, ]
rownames(res_1_p10)
# 373 DEGs identified

Recalculate the dds object, removing group F:

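# keep every sample that is NOT in group F
dds_5groups <- dds_6groups[, !(dds_6groups$groups %in% c("F"))]
# drop the now-empty factor level "F"
dds_5groups$groups <- droplevels(dds_5groups$groups)
# refit size factors, dispersions and the GLM on the 5 remaining groups
dds_5groups <- DESeq(dds_5groups)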

Get the DEGs between groups A and B from the new dds object:

res_AB_5groups <- results(dds_5groups, contrast = c("groups", "A", "B"))
res_2_rmNA <- res_AB_5groups[!is.na(res_AB_5groups$padj), ]
res_2_p10 <- res_2_rmNA[res_2_rmNA$padj < 0.1, ]
rownames(res_2_p10)
# 239 DEGs identified

@ryan-c-thompson-5618
Icahn School of Medicine at Mount Sinai…

Remember that gene dispersions are estimated using all samples in the model, so removing a group from the analysis changes the dispersion estimates. It appears that when group F was included, the estimated gene dispersions were smaller on average, which led to smaller p-values and more genes passing the padj threshold. As for which set of results to trust, you have to decide whether it makes sense to include group F in the dispersion estimation step. I don't see any reason to exclude it: large differences between group F and the other groups do not, by themselves, mean that its within-group variance is different enough to justify excluding it. So unless you have a specific reason not to, I would include the samples from all groups in your analysis.
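To make the pooling argument concrete, here is a simplified analogy in base R. It is not DESeq2's method (DESeq2 fits negative binomial dispersions and shrinks them across genes); it uses a plain Gaussian pooled within-group variance for a single gene, with hypothetical values and 3 samples per group instead of 7. The point it illustrates: a group whose mean is far from the others but whose within-group spread is small still contributes residual degrees of freedom, and here it lowers the pooled estimate. A group with a larger within-group spread would push the estimate up instead.

```r
# Simplified analogy only: Gaussian pooled within-group variance for one gene.
# DESeq2 estimates negative binomial dispersions instead, but in both cases
# every sample in the model contributes to the estimate.
pooled_within_var <- function(x, groups) {
  groups <- factor(groups)
  resid <- x - ave(x, groups)          # deviation from each sample's group mean
  df <- length(x) - nlevels(groups)    # residual degrees of freedom
  list(var = sum(resid^2) / df, df = df)
}

# Hypothetical log-expression values, 3 samples per group.
# Group F sits far from the others but has a small within-group spread.
groups <- rep(c("A", "B", "C", "D", "E", "F"), each = 3)
x <- c(5, 6, 7,   6, 7, 8,   4, 5, 6,
       5, 6, 7,   6, 7, 8,   12, 12.5, 13)

all6 <- pooled_within_var(x, groups)
keep <- groups != "F"
five <- pooled_within_var(x[keep], groups[keep])

cat("with F:   ", all6$var, "on", all6$df, "df\n")   # 0.875 on 12 df
cat("without F:", five$var, "on", five$df, "df\n")   # 1 on 10 df
```

In this toy case, dropping F removes 2 residual degrees of freedom and raises the pooled variance from 0.875 to 1, which is the same direction as the drop from 373 to 239 DEGs in the question.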

If you are concerned that different groups have different dispersions, you could experiment with limma's voomWithQualityWeights function to see whether the estimated sample weights depend on group membership.
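A minimal sketch of that check, assuming the limma package is installed (the block is guarded so it does nothing otherwise). The counts are simulated toy data with no real group differences, so the weights should come out similar across groups; with real data you would pass your own count matrix and design. It reads the per-sample weights from `v$targets$sample.weights`, where current limma versions store them, and summarizes them by group.

```r
# Toy data (hypothetical): 6 groups (A-F) x 7 samples, simulated NB counts.
# Guarded so the sketch still runs when limma is not installed.
if (requireNamespace("limma", quietly = TRUE)) {
  set.seed(42)
  groups <- factor(rep(c("A", "B", "C", "D", "E", "F"), each = 7))
  n_genes <- 1000
  counts <- matrix(rnbinom(n_genes * length(groups), mu = 100, size = 10),
                   nrow = n_genes,
                   dimnames = list(paste0("gene", seq_len(n_genes)),
                                   paste0(groups, seq_along(groups))))
  design <- model.matrix(~ groups)

  # voom transformation plus one estimated quality weight per sample
  v <- limma::voomWithQualityWeights(counts, design = design)
  weights <- v$targets$sample.weights

  # A group with systematically lower weights is noisier than the rest
  print(tapply(weights, groups, median))
}
```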


Yes. To add to Ryan’s answer, this question is asked often and is covered in the FAQ of our vignette. We show how to detect groups with different within-group variability using PCA, and we recommend excluding a group whose spread in the PCA plot is clearly different from that of the other groups. Check the FAQ section of the DESeq2 vignette.
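The vignette's check is visual, using `plotPCA()` on variance-stabilized data from `vst()`. As a rough numeric companion, the sketch below computes each group's mean distance to its own centroid on the first two principal components. `group_spread` is a hypothetical helper, and the data are simulated (group F is given much larger within-group noise). On real data you could pass something like `assay(vst(dds_6groups, blind = FALSE))` as the matrix, but the plot remains the primary diagnostic.

```r
# Toy simulation (hypothetical data): 6 groups x 3 samples, 50 genes on a
# log-like scale. Group F gets much larger within-group noise.
set.seed(1)
groups <- rep(c("A", "B", "C", "D", "E", "F"), each = 3)
n_genes <- 50
group_means <- matrix(rnorm(n_genes * 6, sd = 2), nrow = n_genes,
                      dimnames = list(NULL, c("A", "B", "C", "D", "E", "F")))
noise_sd <- ifelse(groups == "F", 3, 0.1)
mat <- group_means[, groups] +
  matrix(rnorm(n_genes * length(groups), sd = rep(noise_sd, each = n_genes)),
         nrow = n_genes)

# Hypothetical helper: mean distance of each group's samples to the group
# centroid in the first `npc` principal components (samples as rows).
group_spread <- function(mat, groups, npc = 2) {
  scores <- prcomp(t(mat))$x[, seq_len(npc), drop = FALSE]
  sapply(split(seq_len(nrow(scores)), groups), function(idx) {
    s <- scores[idx, , drop = FALSE]
    mean(sqrt(rowSums(sweep(s, 2, colMeans(s))^2)))
  })
}

spread <- group_spread(mat, groups)
print(round(spread, 2))   # group F stands out with a much larger spread
```

Note that a large distance between group centroids (like F in the original question) is not what this measures; only the within-group spread matters for the dispersion estimates.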
