Question

DESeq2 independent filtering fails to filter low counts


nschaum • 0


I ran DESeq2 version 1.12.2 with the following code:

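library(DESeq2)

# Sample metadata and raw counts; the first column of each file holds the row names
Meta <- read.csv("BAT_Meta.csv", header = TRUE, row.names = 1)
Raw <- read.csv("BAT_Raw.csv", header = TRUE, row.names = 1)

dds <- DESeqDataSetFromMatrix(countData = Raw, colData = Meta, design = ~condition)
# "old" is the first level, so it is the reference for the log fold changes
dds$condition <- factor(dds$condition, levels = c("old", "young"))
dds <- DESeq(dds)

res <- results(dds)
# Sort by adjusted p-value (rows with NA padj go last)
resOrdered <- res[order(res$padj), ]
summary(res)
write.csv(as.data.frame(resOrdered), file = "BAT_DGE.csv")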

summary(res) gives the following:

out of 21981 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)     : 0, 0% 
LFC < 0 (down)   : 0, 0% 
outliers [1]     : 2247, 10% 
low counts [2]   : 0, 0% 
(mean count < 0)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

Note that I expect a large number of outliers because this was only sequenced to 3M reads/sample. When I view the baseMean column of res, it appears genes with low mean counts are not assigned NA as they should be. I've run the same code on 16 datasets, and 3 of them have this problem. I got about 19.5k genes assigned padj in this dataset, whereas in a dataset that works, only about 12k genes were assigned a padj. Here is the summary(res) of a working dataset:

out of 22097 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)     : 352, 1.6% 
LFC < 0 (down)   : 127, 0.57% 
outliers [1]     : 2102, 9.5% 
low counts [2]   : 7990, 36% 
(mean count < 4)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

I've even copied the counts from the dataset that doesn't work over the counts of a dataset that does work, keeping everything else constant, and I still get this problem. It appears that there is something about the counts data itself which prevents independent filtering. If I pre-filter with the code:

dds <- dds[ rowSums(counts(dds)) > 1, ]

summary(res) gives:

out of 21094 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)     : 0, 0% 
LFC < 0 (down)   : 0, 0% 
outliers [1]     : 2246, 11% 
low counts [2]   : 0, 0% 
(mean count < 0)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

I've read through the DESeq2 vignette and have searched for others with this problem, but no luck. Any ideas?

deseq2 • 1.8k views

updated 7.9 years ago by Michael Love 41k • written 7.9 years ago by nschaum • 0

score 1 · Answer 1 · 2016-05-18

hi,

A comment on this: "genes with low mean counts are not assigned NA as they should be". Genes with low mean counts are filtered only *IF* such filtering would increase the number of rejections. It could be that in the datasets where you don't see automatic filtering kicking in, there are no significant genes at any mean-count threshold, so filtering or not filtering makes no difference. My guess is that it comes down to the inherent amount of DE in each dataset, and the power to detect it given your sample size. You could make PCA plots to see whether the conditions separate more clearly in some datasets than in others, and whether that accords with my guess.
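To make the point about rejections concrete, here is a minimal base-R sketch of the idea. It is not DESeq2's actual code: the p-values, the mean counts, and the helper `rejections_by_cutoff` are made up for illustration, and it simply picks the cutoff with the most rejections, whereas DESeq2's own threshold choice is more refined. When no gene can pass the FDR cutoff at any threshold, the best threshold is "no filtering", which is what "low counts: 0" reports.

```r
# Toy illustration of independent filtering (base R only; hypothetical data).
set.seed(1)
n <- 5000
base_mean <- rexp(n, rate = 1 / 50)   # made-up mean counts per gene

# Number of BH rejections at each candidate mean-count cutoff.
# Hypothetical helper, not part of DESeq2.
rejections_by_cutoff <- function(pvalue, base_mean, cutoffs, alpha = 0.1) {
  sapply(cutoffs, function(cut) {
    keep <- base_mean >= cut
    sum(p.adjust(pvalue[keep], method = "BH") < alpha)
  })
}

# Candidate cutoffs: the 0%, 10%, ..., 90% quantiles of the mean counts.
cutoffs <- quantile(base_mean, probs = seq(0, 0.9, by = 0.1))

# Case 1 (contrived): no gene has a small raw p-value, so nothing can be
# rejected at any cutoff, because BH-adjusted p-values are never smaller
# than the raw ones.
p_null <- runif(n, min = 0.2, max = 1)
rej_null <- rejections_by_cutoff(p_null, base_mean, cutoffs)

# Case 2 (contrived): the 400 most highly expressed genes have a modest
# p-value that only survives BH once enough low-count genes are removed.
p_signal <- p_null
is_de <- rank(-base_mean) <= 400
p_signal[is_de] <- 0.012
rej_signal <- rejections_by_cutoff(p_signal, base_mean, cutoffs)

# Pick the first cutoff that gives the most rejections.
best_null <- which.max(rej_null)      # all zeros, so this is the 0% quantile
best_signal <- which.max(rej_signal)

print(rej_null)
print(rej_signal)
print(names(cutoffs)[c(best_null, best_signal)])
```

In the first case every cutoff gives zero rejections, so the sketch stays at the lowest cutoff and filters nothing. In the second case, raising the cutoff shrinks the multiple-testing burden enough for the modest p-values to pass, so a nonzero cutoff is chosen.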