Does manual pre-filtering of the data on read counts violate the assumptions of the dispersion estimation in DESeq2?
Johannes Rainer ★ 2.0k
@johannes-rainer-6987
Italy

Dear all!

Sorry for yet another question on pre-filtering and DESeq2, but I couldn't find anything related on the support pages. Now to my question:

I know that DESeq2 does a wonderful job of automatic pre-filtering, but in my case it removed a miRNA that, at 260 counts on average, is not that lowly expressed, and whose differential expression I really believe in. So basically I would like to do the pre-filtering myself, while accepting that I lose quite some power due to the stronger multiple-testing adjustment.

My question, however, is whether this pre-filtering, i.e. removing low-count features, interferes with or violates the assumptions of the dispersion estimation (or any other assumption) in the DESeq2 model, also considering that the pre-filtering in DESeq2 takes place after the calculation of the raw p-values. My concern comes from the (ancient) field of microarrays, where variance-based pre-filtering was thought to violate the assumptions of the moderated t-test in limma.

Is the situation similar for DESeq2 and manual pre-filtering?


Thanks in advance!

cheers, jo

@mikelove
United States

hi Johannes,

Removing low-count features at the beginning, before DESeq(), is fine for doing your own filtering. Filtering on any kind of location statistic (e.g. the mean) of the normalized counts across all samples, i.e. without regard to which condition each sample belongs to, is valid for the method. I wouldn't recommend variance filtering, because the estimation steps use the whole distribution of dispersion estimates across genes, and filtering on variance would distort that distribution.
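A minimal Python/NumPy sketch of such a location-based pre-filter, assuming median-of-ratios size factors (the normalization DESeq2 uses by default) and a hypothetical cutoff of 10 on the mean normalized count. The function names and the cutoff are illustrative only and not part of DESeq2; in R you would apply the resulting row mask to the dataset before calling DESeq(). The point it shows is that the statistic is a location (a mean) computed over all samples, never a variance and never using the condition labels.

```python
import numpy as np


def size_factors_median_of_ratios(counts):
    """Median-of-ratios size factors, one per sample (rows = genes, columns = samples)."""
    counts = np.asarray(counts, dtype=float)
    with np.errstate(divide="ignore"):
        log_counts = np.log(counts)  # log(0) -> -inf
    log_geo_means = log_counts.mean(axis=1)
    usable = np.isfinite(log_geo_means)  # skip genes with a zero in any sample
    if not usable.any():
        raise ValueError("every gene has a zero count in at least one sample")
    log_ratios = log_counts[usable] - log_geo_means[usable][:, None]
    return np.exp(np.median(log_ratios, axis=0))


def location_prefilter(counts, min_mean=10.0):
    """Hypothetical pre-filter: keep genes whose mean normalized count across
    ALL samples is >= min_mean. Condition labels are never used."""
    size_factors = size_factors_median_of_ratios(counts)
    normalized = np.asarray(counts, dtype=float) / size_factors[None, :]
    base_mean = normalized.mean(axis=1)  # a location statistic, not a variance
    return base_mean >= min_mean, base_mean
```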

The automatic independent filtering is generally useful, except in cases like yours, where a specific gene of interest falls below the threshold. The genefilter mechanism will always raise the threshold above a potentially significant gene with baseMean x if doing so adds more than one significant gene with baseMean y, where x < y.
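To see why a single gene can end up below the cutoff, here is a simplified Python/NumPy sketch of the idea: scan a grid of baseMean quantiles, apply Benjamini-Hochberg to the genes that pass, and keep the cutoff that gives the most rejections at level alpha. The function names, the quantile grid and the plain "most rejections" rule are assumptions for illustration; as far as I know, DESeq2's results() smooths the rejection curve before choosing the cutoff, so this is not its exact implementation.

```python
import numpy as np


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # enforce monotonicity, working down from the largest p-value
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted


def choose_filter_threshold(base_mean, pvalues, alpha=0.1,
                            quantiles=np.linspace(0.0, 0.95, 20)):
    """Hypothetical helper: pick the baseMean cutoff (from a quantile grid)
    that maximizes the number of BH rejections among the genes that pass."""
    base_mean = np.asarray(base_mean, dtype=float)
    pvalues = np.asarray(pvalues, dtype=float)
    best_cutoff, best_rejections, best_mask = None, -1, None
    for q in quantiles:
        cutoff = np.quantile(base_mean, q)
        passed = base_mean >= cutoff
        n_rej = int(np.sum(bh_adjust(pvalues[passed]) < alpha))
        if n_rej > best_rejections:  # strict '>' keeps the lowest cutoff on ties
            best_cutoff, best_rejections, best_mask = cutoff, n_rej, passed
    return best_cutoff, best_rejections, best_mask
```

On a toy dataset where one gene with a very small p-value sits among many low-count null genes, the maximizing cutoff can land above it: dropping that gene together with its null neighbors lets several higher-count genes pass the BH threshold, which is exactly the trade-off described above. In DESeq2 itself, independent filtering can be switched off with `results(dds, independentFiltering=FALSE)` if you prefer to pre-filter manually.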
