Influence of pre-filtering on downstream analysis results
@andrebolerbarros-16788
Portugal

Hello everyone,

I am currently working on RNA-Seq data using DESeq2. As described in the manual, you can pre-filter low-count genes, e.g.:

keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep,]

However, the manual also says: "While it is not necessary to pre-filter low count genes before running the DESeq2 functions...". So, from what I gather, using this threshold (10) or just removing genes with zero counts should yield the same result.

I ran the analysis with both criteria and, although the summary output is the same, I get different p-values (both raw and BH-adjusted).

dm <- DESeqDataSetFromMatrix(countData = tab, colData = design, design = ~ group)
dm<-dm[rowSums(counts(dm)) > 0 , ]
dm<-DESeq(dm)

ashr_zero<-lfcShrink(dm,contrast=c("group","trt","untrt"),type="ashr")

dm <- DESeqDataSetFromMatrix(countData = tab, colData = design, design = ~ group)
dm<-dm[rowSums(counts(dm)) > 10 , ]
dm<-DESeq(dm)

ashr_ten<-lfcShrink(dm,contrast=c("group","trt","untrt"),type="ashr")

ashr_zero<-ashr_zero[rownames(ashr_zero) %in% rownames(ashr_ten),]


all(rownames(ashr_zero)==rownames(ashr_ten)) #to check if I'm comparing the same genes
[1] TRUE
check1<-vector()

for (i in seq_len(ncol(ashr_zero))) { # compare the two result tables column by column
  check1[i]<-all(ashr_zero[,i] == ashr_ten[,i],na.rm=TRUE)
}
check1
[1]  TRUE FALSE FALSE FALSE FALSE

Looking at the summary, the independent filtering criterion is the same and the number of genes is different (which is expected, since the threshold of 10 removes more genes than the threshold of zero), but I really don't understand what is causing the difference in p-values.

Thanks!

deseq2 rnaseq • 754 views
@mikelove
United States

It should not yield an identical result. The low count genes will have some influence on the parameters of the dispersion function.
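One way to see this effect is to fit the same counts under both filtering rules and compare the coefficients of the fitted dispersion trend. The following is only a minimal sketch on simulated data from `makeExampleDESeqDataSet` (not the data from the question); `fit_with_filter` is a hypothetical helper introduced for illustration, and it assumes the default parametric fit succeeds, otherwise the "coefficients" attribute may be absent.

```r
library(DESeq2)

# Simulated counts, purely for illustration (not the poster's data).
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 2000, m = 6)

# Hypothetical helper: keep genes whose total count exceeds min_total,
# then run the standard DESeq2 pipeline on the remaining genes.
fit_with_filter <- function(dds, min_total) {
  keep <- rowSums(counts(dds)) > min_total
  DESeq(dds[keep, ], quiet = TRUE)
}

dds_zero <- fit_with_filter(dds, 0)
dds_ten  <- fit_with_filter(dds, 10)

# Coefficients of the fitted dispersion trend under each rule
# (assumes the parametric fit was used in both cases).
coef_zero <- attr(dispersionFunction(dds_zero), "coefficients")
coef_ten  <- attr(dispersionFunction(dds_ten),  "coefficients")
rbind(zero = coef_zero, ten = coef_ten)
```

If the two rows differ, the trend was estimated from a different set of genes, and the dispersion estimates (and therefore the p-values) of the genes shared by both fits can change as well.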

It's what I suspected, thanks! Then, which pre-filtering criterion should I use?

It doesn’t really matter, except that once you pick one filtering rule you should note it down and stick with it for computational reproducibility.
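As a rough sketch of what noting the rule down could look like, the threshold can be kept in a single variable and stored alongside the object. The example uses simulated data; the `prefilter` entry name and its fields are hypothetical, chosen here for illustration and not a DESeq2 convention.

```r
library(DESeq2)

# Simulated counts, purely for illustration.
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 2000, m = 6)

min_total <- 10          # the chosen threshold; the exact value is a free choice
n_before  <- nrow(dds)

keep <- rowSums(counts(dds)) >= min_total
dds  <- dds[keep, ]

# Record the rule with the object so it travels with saved results.
# 'prefilter' is a hypothetical entry name, not something DESeq2 defines.
metadata(dds)$prefilter <- list(
  rule         = "rowSums(counts(dds)) >= min_total",
  min_total    = min_total,
  genes_before = n_before,
  genes_after  = nrow(dds)
)

metadata(dds)$prefilter
```

Saving the object (for example with `saveRDS`) then keeps the filtering rule next to the results it produced.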
