Question: Remove genes with very low counts for all samples or let DESeq2 perform independent filtering?
colaneri (United States) wrote, 3.0 years ago:

I’m following the DESeq2 tutorial to perform DGE analysis. I noticed that before running

dds <- DESeq(dds)

it is recommended to remove genes whose counts are 0 in all samples. I have two questions about this step:

 

  1. Why only genes with 0 counts in all samples? What about genes whose counts total 5 across all samples? Or 10? What would be a reasonable threshold? I assume that people experienced in DGE analysis have a number they consider a correct and safe threshold. I would appreciate some advice on this.
  2. I understand that DESeq2 performs independent filtering: it identifies a count-based threshold and removes genes whose counts are too low to produce a trustworthy result. My question is: why bother with the step above if these genes are going to be filtered out anyway?
Answer: Remove genes with very low counts for all samples or let DESeq2 perform independ
Michael Love (United States) wrote, 3.0 years ago:

from the vignette:

vignette("DESeq2")

"1.3.5 Pre-filtering
While it is not necessary to pre-filter low count genes before running the DESeq2 functions, there are two reasons which make pre-filtering useful: by removing rows in which there are no reads or nearly no reads, we reduce the memory size of the dds data object and we increase the speed of the transformation and testing functions within DESeq2. "

You don't have to filter at all, though. The safest choice is not to remove any row whose sum is above 0, and just let the data-driven software (which lives in the genefilter package, outside of DESeq2) choose the threshold that maximizes power. For more details, you can read the citation for the genefilter package, which is also referenced in the independent filtering section of the DESeq2 paper.
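
To make the pre-filtering step concrete, here is a minimal sketch in base R on a made-up count matrix. The matrix, the gene and sample names, and the stricter cutoff of 10 are illustrative assumptions, not recommendations. With a real DESeqDataSet, the same logical vector would presumably be computed from counts(dds) and applied as dds[keep, ].

```r
# Toy count matrix: 5 genes (rows) x 4 samples (columns); all values are invented.
counts <- matrix(
  c(  0,  0,   0,   0,
      1,  0,   2,   0,
      3,  4,   2,   5,
    120, 98, 143, 110,
      0,  1,   0,   0),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("gene", 1:5), paste0("sample", 1:4))
)

# Minimal pre-filter: drop only the rows that are zero in every sample.
keep_nonzero <- rowSums(counts) > 0
filtered <- counts[keep_nonzero, , drop = FALSE]

# A stricter cutoff (row sum of at least 10), shown for comparison only;
# the value 10 is arbitrary here.
keep_strict <- rowSums(counts) >= 10
filtered_strict <- counts[keep_strict, , drop = FALSE]
```

Either version only shrinks the object and speeds things up; the choice of which low-count genes to exclude from multiple-testing adjustment is still left to independent filtering.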


Interesting.

On our dataset of 15 samples, we were removing the rows in which the average count was 1 or less.  We even thought about removing the rows in which the average count was 2 or less.

Are you saying it is better not to remove these rows at all? Even the rows where the counts are zero for all samples?

Written 3.0 years ago by Marcelo Pereira

I didn't say it was better. It should make no difference. You could raise the threshold even higher and it will still make no difference, up to the point at which you are filtering too much (and that point will be different for each experiment).

The question was: what is a safe / reasonable threshold that will work for all experiments? Our recommendation (and the default in DESeq2) is to let the genefilter software optimize the threshold, such that sensitivity (statistical power) is maximized.

There is a separate reference for genefilter if you want to read about this. Also there is a new approach from Wolfgang's group: https://www.bioconductor.org/packages/IHW
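
As a rough illustration of what "optimizing the threshold" means, here is a simplified sketch in base R. It is not the genefilter or DESeq2 implementation, which differs in its details. The simulated base_mean and pvalue vectors, the grid theta, and alpha = 0.1 are all assumptions made up for the example. The idea: for each candidate cutoff on the mean count, drop the genes below it, adjust the remaining p-values with Benjamini-Hochberg, and count the rejections.

```r
set.seed(1)
n <- 2000

# Hypothetical inputs: a mean count and a raw p-value per gene.
base_mean <- rexp(n, rate = 1 / 50)
# Simulated truth: signal exists only among reasonably expressed genes.
is_de <- base_mean > 20 & runif(n) < 0.2
pvalue <- ifelse(is_de, rbeta(n, 0.1, 10), runif(n))

alpha <- 0.1
theta <- seq(0, 0.9, by = 0.05)            # fraction of lowest-mean genes removed
cutoffs <- quantile(base_mean, probs = theta)

# Number of BH-adjusted rejections at each candidate cutoff.
rejections <- sapply(cutoffs, function(cut) {
  keep <- base_mean >= cut
  padj <- p.adjust(pvalue[keep], method = "BH")
  sum(padj < alpha)
})

# Pick the cutoff with the most rejections (theta = 0 means no filtering).
best <- which.max(rejections)
chosen_cutoff <- unname(cutoffs[best])
```

Because theta = 0 (no filtering) is one of the candidates, the chosen cutoff can never give fewer rejections than leaving every gene in. In DESeq2 itself this behaviour is controlled through the independentFiltering and alpha arguments of results().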

Modified 3.0 years ago, written 3.0 years ago by Michael Love