DESeq2 independent filtering fails to filter low counts
1
0
Entering edit mode
nschaum • 0
@nschaum-10717
Last seen 7.9 years ago

I ran DESeq2 version 2.12.2 with the following code:

Meta<-read.csv("BAT_Meta.csv",header=TRUE,row.names = 1)
Raw<-read.csv("BAT_Raw.csv",header=TRUE,row.names = 1)
dds<-DESeqDataSetFromMatrix(countData = Raw,colData = Meta,design = ~condition)
dds$condition<-factor(dds$condition, levels=c("old","young"))
dds<-DESeq(dds)
res<-results(dds)
resOrdered<-res[order(res$padj),]
summary(res)
write.csv(as.data.frame(resOrdered),file="BAT_DGE.csv")

summary(res) gives the following: 

out of 21981 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)     : 0, 0% 
LFC < 0 (down)   : 0, 0% 
outliers [1]     : 2247, 10% 
low counts [2]   : 0, 0% 
(mean count < 0)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

Note that I expect a large number of outliers because this was only sequenced to 3M reads/sample. When I view the baseMean column of res, it appears genes with low mean counts are not assigned NA as they should be. I've run the same code on 16 datasets, and 3 of them have this problem. I got about 19.5k genes assigned padj in this dataset, whereas in a dataset that works, only about 12k genes were assigned a padj. Here is the summary(res) of a working dataset:

out of 22097 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)     : 352, 1.6% 
LFC < 0 (down)   : 127, 0.57% 
outliers [1]     : 2102, 9.5% 
low counts [2]   : 7990, 36% 
(mean count < 4)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

I've even copied the counts from the dataset that doesn't work over the counts of a dataset that does work, keeping everything else constant, and I still get this problem. It appears that there is something about the counts data itself which prevents independent filtering. If I pre-filter with the code:

dds <- dds[ rowSums(counts(dds)) > 1, ] 

summary(res) gives:

out of 21094 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)     : 0, 0% 
LFC < 0 (down)   : 0, 0% 
outliers [1]     : 2246, 11% 
low counts [2]   : 0, 0% 
(mean count < 0)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

I've read through the DESeq2 vignette and have searched for others with this problem, but no luck. Any ideas?

deseq2 • 1.8k views
ADD COMMENT
1
Entering edit mode
@mikelove
Last seen 4 hours ago
United States

hi,

A comment to this "genes with low mean counts are not assigned NA as they should be": genes with low mean counts are only filtered *IF* such filtering would increase the number of rejections. It could be that in your datasets (the ones in which you don't see automatic filtering kicking in) there are no significant genes regardless of mean filtering -- filtering or not filtering doesn't make a difference. So my guess would be it's a matter of the inherent amount of DE (and power to detect given you sample size) in the different datasets. You could make PCA plots to see whether there are clear differences in the separation across the different datasets and whether that accords with my guess.

ADD COMMENT

Login before adding your answer.

Traffic: 551 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6