I am performing a differential expression analysis on insects. The insects are treated with two doses of an insecticide (let's call them "low" and "high").
The experimental design is the following:
- 5 x Control (biological replicates, several individuals pooled per sample)
- 5 x exposed to low dosage (biological replicates, several individuals pooled per sample)
- 5 x exposed to high dosage (biological replicates, several individuals pooled per sample)
I run two comparisons: control vs. "low" and control vs. "high".
The analysis is reference-based, since a high-quality genome and annotation are available. I mapped the reads with HISAT2, estimated abundances with StringTie, and imported them into DESeq2 with tximport.
For the low concentration I see barely any effect (5 DE genes at padj < 0.05). For the high concentration I get 71 DE genes at padj < 0.05.
BUT, when I look at the summaries of the two tests, I see zero "low counts" for the low concentration and 4079 "low counts" for the high concentration (see below). I do not really understand this. Why is the difference so extreme? Is there a problem with my analysis, or is this common?
Summary for "low":
out of 11072 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up) : 3, 0.027%
LFC < 0 (down) : 2, 0.018%
outliers [1] : 12, 0.11%
low counts [2] : 0, 0%
(mean count < 0)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
Summary for "high":
out of 11147 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up) : 39, 0.35%
LFC < 0 (down) : 32, 0.29%
outliers [1] : 11, 0.099%
low counts [2] : 4079, 37%
(mean count < 150)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
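To make clear what the "outliers" and "low counts" lines count, here is a minimal Python sketch of how, as far as I understand DESeq2's source, summary() classifies each row of a results table: genes with a nonzero mean form the denominator, a gene with a p-value of NA (but nonzero mean) is an outlier flagged via cooksCutoff, and a gene that kept its p-value but got an adjusted p-value of NA was removed by independent filtering. The names GeneResult, summarize and the toy rows are hypothetical, and NaN stands in for R's NA.

```python
import math
from dataclasses import dataclass

NA = math.nan  # stand-in for R's NA


@dataclass
class GeneResult:
    base_mean: float  # mean of normalized counts ("baseMean")
    log2_fc: float    # "log2FoldChange"
    pvalue: float     # NA if flagged as a Cook's distance outlier (or all-zero gene)
    padj: float       # NA if removed by independent filtering (or pvalue is NA)


def summarize(results, alpha=0.05):
    """Count genes per category, mirroring the lines of the summary() output."""
    nonzero = sum(1 for r in results if r.base_mean > 0)
    # Comparisons with NaN are False in Python, so an NA padj never counts as significant.
    up = sum(1 for r in results if r.padj < alpha and r.log2_fc > 0)
    down = sum(1 for r in results if r.padj < alpha and r.log2_fc < 0)
    outliers = sum(1 for r in results if r.base_mean > 0 and math.isnan(r.pvalue))
    low_counts = sum(1 for r in results
                     if not math.isnan(r.pvalue) and math.isnan(r.padj))
    return {"nonzero": nonzero, "up": up, "down": down,
            "outliers": outliers, "low_counts": low_counts}


example = [
    GeneResult(500.0, 1.2, 1e-6, 1e-4),   # significant, up
    GeneResult(800.0, -0.9, 1e-5, 5e-4),  # significant, down
    GeneResult(300.0, 0.1, 0.4, 0.8),     # tested, not significant
    GeneResult(2.0, 0.5, 0.3, NA),        # p-value kept, padj NA -> "low counts"
    GeneResult(900.0, 2.0, NA, NA),       # p-value NA -> "outliers"
    GeneResult(0.0, NA, NA, NA),          # all-zero gene, excluded from the total
]

s = summarize(example)
for key in ("up", "down", "outliers", "low_counts"):
    # Percentages are relative to the genes with nonzero total counts.
    print(f"{key}: {s[key]}, {100 * s[key] / s['nonzero']:.2g}%")
```

Because the percentages use the number of genes with nonzero total read count as the denominator, 4079 out of 11147 is reported as 37%.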
EDIT:
The two summaries above are the output of DESeq2's summary() function. The "low counts" are the genes removed by independent filtering (the independentFiltering argument of results(); see https://www.bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#indfilttheory).
Thank you for the answer. I will take a more detailed look at the vignette and the paper. I had already read them, but obviously did not fully understand them.
Please allow me one short question: are you surprised that independent filtering removes no genes for the "low dosage", where there seems to be little to no effect, but more than 4000 genes for the "high dosage", where we see many DE genes?
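No, not surprised or concerned. The filtering depends on how the small p-values are distributed over the filter statistic (by default, the mean of normalized counts). If there are hardly any small p-values, as with the "low" dose, removing low-count genes does not increase the number of rejections, so no genes are filtered. This is explained in the two papers (independent filtering and IHW).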
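The point about small p-values and the filter statistic can be illustrated with a simplified Python sketch of independent filtering: genes are ranked by mean count, increasing fractions of low-mean genes are dropped, the remaining p-values are Benjamini-Hochberg adjusted, and the threshold giving the most rejections at alpha = 0.05 is kept. This is a deliberate simplification: according to ?results, DESeq2 fits a curve to the number of rejections over the filter quantiles and uses the first threshold within one residual standard deviation of the curve's peak. The function names, the 20-point grid, the 95% cap and the toy data are all hypothetical.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [1.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity (capped at 1).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def independent_filter(mean_counts, pvals, alpha=0.05, n_grid=20, max_drop_pct=95):
    """Return (genes filtered, rejections) for the best threshold on the grid.

    Simplification: takes the grid point with the most rejections (smallest
    threshold on ties) instead of DESeq2's curve-fitting rule.
    """
    m = len(pvals)
    by_mean = sorted(range(m), key=lambda i: mean_counts[i])  # lowest mean first
    best_drop, best_rej = 0, -1
    for k in range(n_grid):
        n_drop = m * max_drop_pct * k // (100 * (n_grid - 1))
        kept = by_mean[n_drop:]
        adjusted = bh_adjust([pvals[i] for i in kept])
        rejections = sum(1 for q in adjusted if q < alpha)
        if rejections > best_rej:  # strict '>' keeps the smallest threshold on ties
            best_drop, best_rej = n_drop, rejections
    return best_drop, best_rej


# Hypothetical toy data: 1000 genes with mean counts 1..1000.
mean_counts = [i + 1 for i in range(1000)]
noise = [0.2 + 0.8 * ((i * 37) % 100) / 100 for i in range(1000)]  # all >= 0.2

# "low"-like scenario: no small p-values anywhere.
no_signal = noise
# "high"-like scenario: the 100 highest-count genes have small p-values.
with_signal = [0.01 if i >= 900 else noise[i] for i in range(1000)]

print(independent_filter(mean_counts, no_signal))    # (0, 0)
print(independent_filter(mean_counts, with_signal))  # (550, 100)
```

With no small p-values, no threshold increases the number of rejections, so nothing is filtered, much like the "low" summary. When the small p-values sit among the high-count genes, dropping low-count genes lightens the multiple-testing correction enough to reject them, so a nonzero threshold is chosen, much like the "high" summary.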
Great, thanks a lot!