DESeq2, what to do when controls are 7 times more abundant than treatments
spollenw • 0
United States

I am using DESeq2 to test for differential expression of miRNA reads. The 10 control samples have about 5 to 60 million reads each; the 5 treatment samples have about 1 to 5 million reads each. DESeq2's normalization adjusts for this by estimating size factors to scale the counts. The overabundance of miRNAs found to be down-regulated (LFC < 0, padj < 0.05) suggests that there simply are not enough reads in the treatment samples to detect the rarer miRNAs. To address this problem I have done two things: (1) I restricted the control samples to those with 10 million reads or fewer, leaving 7 control samples, but the ratio of the median read depths for the two groups is still about 4 to 1; and (2) I am filtering out the miRNAs that are not adequately expressed in the treatments, although this could preclude finding miRNAs that were severely down-regulated by treatment. An alternative is subsampling the controls to create read counts comparable to those of the treatments, but I have seen posts indicating this is not a good approach (I am not sure I understand why). Can anyone suggest a better approach? It may well be that the rarer miRNAs are not that important, but I like to be thorough.

Thanks, Bill

DESeq2 • 116 views
United States

I generally recommend your option 2 here: keep only the genes that are at least minimally expressed in the lowly sequenced condition. This is pretty much all you can do when sequencing depth is perfectly confounded with the condition of interest.
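
For illustration, here is a minimal sketch of that kind of filter in base R. The helper name `keep_expressed_in_group`, the thresholds (at least 10 raw reads in at least 3 treatment samples), and the toy count matrix are all assumptions made up for this example, not values recommended in this thread; sensible cutoffs depend on the data.

```r
# Hypothetical helper (not part of DESeq2): flag features that have at least
# `min_count` raw reads in at least `min_samples` samples of one group,
# here the shallowly sequenced group.
keep_expressed_in_group <- function(counts, group, low_depth_level,
                                    min_count = 10, min_samples = 3) {
  stopifnot(ncol(counts) == length(group))
  in_group <- group == low_depth_level
  rowSums(counts[, in_group, drop = FALSE] >= min_count) >= min_samples
}

# Toy data: 4 miRNAs x 6 samples (3 controls, 3 treatments). Values are invented.
counts <- matrix(
  c(500L, 620L, 580L, 40L, 55L, 35L,   # well covered in both groups
    120L, 150L,  90L,  0L,  2L,  1L,   # barely seen in the treatments
     30L,  25L,  40L, 12L, 15L,  9L,   # reaches 10 reads in only two treatments
     80L,  95L,  70L, 20L, 11L, 14L),  # reaches 10 reads in all three treatments
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("miR_", 1:4),
                  c(paste0("ctrl", 1:3), paste0("trt", 1:3)))
)
condition <- factor(rep(c("control", "treated"), each = 3))

# Logical vector: TRUE for miRNAs that pass the filter in the treated group.
keep <- keep_expressed_in_group(counts, condition, "treated",
                                min_count = 10, min_samples = 3)

# Filtered matrix; this is what would then be handed to DESeq2.
filtered <- counts[keep, , drop = FALSE]
print(rownames(filtered))
```

With a `DESeqDataSet` called `dds` (an assumed object name), the same logical vector could be computed from `counts(dds)` and applied as `dds[keep, ]` before running `DESeq()`. Note the trade-off raised in the question still applies: miRNAs that treatment pushes below the threshold are removed along with those that were simply undersampled.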

