Question: DESeq2 independent filtering fails to filter low counts
nschaum wrote (2.9 years ago):

I ran DESeq2 version 2.12.2 with the following code:

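library(DESeq2)

# Sample metadata and raw counts; the first column of each file holds the row names
Meta <- read.csv("BAT_Meta.csv", header = TRUE, row.names = 1)
Raw <- read.csv("BAT_Raw.csv", header = TRUE, row.names = 1)
dds <- DESeqDataSetFromMatrix(countData = Raw, colData = Meta, design = ~condition)
# Make "old" the reference level
dds$condition <- factor(dds$condition, levels = c("old", "young"))
dds <- DESeq(dds)
res <- results(dds)
# Sort by adjusted p-value before writing out
resOrdered <- res[order(res$padj), ]
summary(res)
write.csv(as.data.frame(resOrdered), file = "BAT_DGE.csv")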

summary(res) gives the following: 

out of 21981 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)     : 0, 0% 
LFC < 0 (down)   : 0, 0% 
outliers [1]     : 2247, 10% 
low counts [2]   : 0, 0% 
(mean count < 0)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

Note that I expect a large number of outliers because these samples were sequenced to only 3M reads/sample. When I view the baseMean column of res, it appears genes with low mean counts are not assigned NA as they should be. I've run the same code on 16 datasets, and 3 of them have this problem. In this dataset about 19.5k genes were assigned a padj, whereas in a dataset that works only about 12k genes were. Here is the summary(res) of a working dataset:

out of 22097 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)     : 352, 1.6% 
LFC < 0 (down)   : 127, 0.57% 
outliers [1]     : 2102, 9.5% 
low counts [2]   : 7990, 36% 
(mean count < 4)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

I've even copied the counts from the dataset that doesn't work over the counts of a dataset that does work, keeping everything else constant, and I still get this problem. It appears that there is something about the counts data itself which prevents independent filtering. If I pre-filter with the code:

dds <- dds[ rowSums(counts(dds)) > 1, ] 

summary(res) gives:

out of 21094 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)     : 0, 0% 
LFC < 0 (down)   : 0, 0% 
outliers [1]     : 2246, 11% 
low counts [2]   : 0, 0% 
(mean count < 0)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

I've read through the DESeq2 vignette and have searched for others with this problem, but no luck. Any ideas?

Answer: DESeq2 independent filtering fails to filter low counts
Michael Love wrote (2.9 years ago):

hi,

A comment on this: "genes with low mean counts are not assigned NA as they should be". Genes with low mean counts are only filtered *IF* such filtering would increase the number of rejections. It could be that in your datasets (the ones in which you don't see automatic filtering kicking in) there are no significant genes regardless of mean filtering -- filtering or not filtering doesn't make a difference. So my guess would be that it's a matter of the inherent amount of DE (and the power to detect it given your sample size) in the different datasets. You could make PCA plots to see whether there are clear differences in the separation across the different datasets and whether that accords with my guess.
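
To make the "only filtered if it increases rejections" point concrete, here is a minimal base-R sketch. It is a toy illustration under stated assumptions, not DESeq2's actual implementation: the p-values and mean counts are made up, and the helper `count_rejections` is hypothetical. It drops genes below a mean-count quantile, applies Benjamini-Hochberg adjustment to the rest, and counts how many genes pass the 0.1 cutoff.

```r
# Toy sketch of independent filtering (base R only; NOT DESeq2's actual code).
# For each filter quantile theta, drop genes whose mean count falls below that
# quantile, BH-adjust the remaining p-values, and count rejections.
count_rejections <- function(pvalue, base_mean, thetas, alpha = 0.1) {
  vapply(thetas, function(theta) {
    cutoff <- quantile(base_mean, theta, names = FALSE)
    keep <- base_mean >= cutoff
    sum(p.adjust(pvalue[keep], method = "BH") < alpha)
  }, integer(1))
}

thetas <- c(0, 0.2, 0.4, 0.6, 0.8)
base_mean <- 1:1000  # hypothetical mean counts, one per gene

# Dataset A: no gene has a small p-value, so no threshold yields rejections.
p_null <- seq(0.2, 1, length.out = 1000)
rej_null <- count_rejections(p_null, base_mean, thetas)

# Dataset B: the 50 highest-count genes have p = 0.01; the rest are clearly null.
p_signal <- c(seq(0.5, 1, length.out = 950), rep(0.01, 50))
rej_signal <- count_rejections(p_signal, base_mean, thetas)

print(rej_null)    # rejections per theta, dataset A
print(rej_signal)  # rejections per theta, dataset B
```

In dataset A the rejection count is zero at every threshold, so there is nothing to gain from filtering and the sensible choice is no filter at all, which matches a summary reporting "low counts: 0" with "mean count < 0". In dataset B the 50 signal genes only pass the cutoff once enough low-count genes are removed, so filtering pays off. DESeq2 chooses its threshold from a fitted curve of rejections against filter quantile, so the details differ from this sketch; in DESeq2 versions that store these values in the results metadata, `metadata(res)$filterThreshold` and `metadata(res)$filterNumRej` show what was chosen on real data.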
