Question

DESeq2 pre-filtering for RNA-seq

badribio • 0

I see more and more papers these days that apply a read-count cutoff before DESeq2 analysis. What is the rationale, and how do the authors decide how many reads is the right number? For example, in one paper DESeq2 was applied to genes having more than 20 reads in half of the samples; this study had n=8 (4 WT and 4 KO). I could not work out how they arrived at the number 20. Even if the number is justified, what if the differences are genotype-driven? Moreover, DESeq2 has independent filtering, which takes care of low counts, no?

Is there a way to plot counts and determine the right threshold for pre-filtering?

Thanks

deseq2 rnaseq • 2.5k views

updated 2.3 years ago by Michael Love 43k • written 6.5 years ago by badribio • 0

You'll likely get better answers if you employ some punctuation.

6.5 years ago swbarnes2 ★ 1.4k

Excuse me, I hope I did OK this time.

6.5 years ago badribio • 0

score 0 · Answer 1 · 2019-07-09

zefrieira • 0

DESeq2 filters genes based on their counts. Read the independent filtering section of the documentation.

6.5 years ago zefrieira • 0

Correct. We do sometimes perform minimal pre-filtering. Mostly this is to reduce unnecessary computation, whereas the independent filtering (or better yet, IHW) is for increasing power. There is no need to run DESeq2 for bulk RNA-seq on genes where only one or two samples have a single-digit count and the rest are all 0's.

6.5 years ago Michael Love 43k
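
To illustrate the kind of minimal pre-filter described in the reply above, here is a small sketch in plain Python (not R, and not DESeq2 code). The function name `prefilter`, the thresholds, and the toy counts are all assumptions made up for illustration. The rule keeps a gene only if at least `min_samples` samples have at least `min_count` reads.

```python
def prefilter(counts, min_count=10, min_samples=3):
    """Return gene IDs with >= min_count reads in >= min_samples samples.

    `counts` maps gene ID -> list of raw read counts, one per sample.
    This is an illustrative helper, not part of any library.
    """
    kept = []
    for gene, row in counts.items():
        # Count how many samples reach the read threshold for this gene.
        n_passing = sum(1 for c in row if c >= min_count)
        if n_passing >= min_samples:
            kept.append(gene)
    return kept


# Toy data: 8 samples (4 WT followed by 4 KO); values are invented.
toy_counts = {
    "geneA": [0, 0, 0, 1, 0, 0, 2, 0],                  # near-empty
    "geneB": [0, 1, 0, 0, 35, 42, 28, 31],              # expressed in KO only
    "geneC": [120, 98, 110, 105, 240, 260, 231, 250],   # well expressed
}

kept_genes = prefilter(toy_counts, min_count=10, min_samples=3)
print(kept_genes)  # ['geneB', 'geneC']
```

In this toy example, `geneB` has counts in only one group yet survives, because `min_samples` (3) does not exceed the group size (4). With `min_samples=5` it would be dropped, which is the genotype-driven concern raised in the question.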

Hi Michael, Thank you for the reply. I ran an experiment to check what would happen if I do and do not perform pre-filtering. I realized that I am obtaining lists of significant genes of different sizes.

In filtered case 1 (10 counts in at least 3 samples) I obtained 1933 genes, and in filtered case 2 (10 counts in 5 samples) I obtained 1462 genes (there are 8 samples in total: 4 Experiment and 4 WT), while the unfiltered case gave me 1901 significant genes (pAdj < 0.05).

So it appears that this step is affecting my significant gene list. I am also working with pseudo-bulk aggregated single cell data. Could you please tell me what I can learn from this pattern of DEGs?

2.3 years ago Dennis • 0

I read this as follows: there are ~400 genes that are called differential due to a shift in mean but are lowly expressed or only have counts in a few samples. Many users filter these out to focus on genes that are more stably expressed. This decision is up to you and your collaborators.

2.3 years ago Michael Love 43k
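
One way to look at the genes in question is to compare which genes each filter retains. The sketch below is plain Python with a hypothetical helper `passes` and invented counts; it is not DESeq2 code. It lists genes that pass the looser rule from the thread (10 counts in at least 3 samples) but fail the stricter one (10 counts in at least 5 samples).

```python
def passes(row, min_count, min_samples):
    """True if at least `min_samples` entries of `row` are >= `min_count`."""
    return sum(c >= min_count for c in row) >= min_samples


# Invented counts for 8 samples (4 experiment followed by 4 WT).
toy_counts = {
    "gene1": [15, 22, 18, 0, 0, 1, 0, 2],      # counts in only 3 samples
    "gene2": [30, 28, 35, 31, 12, 9, 14, 11],  # counts in most samples
    "gene3": [0, 0, 3, 0, 1, 0, 0, 0],         # essentially unexpressed
}

# Gene sets retained under the looser and the stricter rule.
loose = {g for g, row in toy_counts.items() if passes(row, 10, 3)}
strict = {g for g, row in toy_counts.items() if passes(row, 10, 5)}

# Genes kept by the looser filter but removed by the stricter one.
only_loose = sorted(loose - strict)
print(only_loose)  # ['gene1']
```

Inspecting the raw counts of the genes in `only_loose` is one way to judge whether they are worth keeping for a given experiment.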