DESeq2 pre-filtering for RNA-seq
badribio • 0
@badribio-12724

I see more and more papers that apply a read-count cutoff before DESeq2 analysis. What is the rationale, and how do the authors decide how many reads is the right number? For example, in one paper DESeq2 was applied to genes having more than 20 reads in half of the samples; that study had n = 8 (4 WT and 4 KO). I could not work out how they arrived at the number 20. Even if they had a reason, what if the differences are genotype-driven? Moreover, doesn't DESeq2's independent filtering already take care of low counts?

Is there a way to plot the counts and determine a suitable threshold for pre-filtering?

Thanks

deseq2 rnaseq • 1.9k views

You'll likely get better answers if you employ some punctuation.


Excuse me, I hope I did OK this time.

zefrieira • 0
@zefrieira-20933

DESeq2 filters genes based on their counts. Read the independent filtering section of the documentation.
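
To make the idea behind independent filtering concrete, here is a minimal sketch in plain Python. It is not DESeq2's implementation, which differs in its details; the function names `bh_adjust` and `rejections_by_cutoff` and the toy numbers are assumptions made up for illustration. It drops genes below a mean-count cutoff, re-applies Benjamini-Hochberg adjustment to the remaining p-values, and counts how many genes pass at each cutoff.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def rejections_by_cutoff(mean_counts, pvalues, cutoffs, alpha=0.05):
    """For each cutoff, count genes with adjusted p < alpha among the genes
    whose mean count is at or above that cutoff."""
    result = {}
    for cutoff in cutoffs:
        kept = [p for mu, p in zip(mean_counts, pvalues) if mu >= cutoff]
        result[cutoff] = sum(q < alpha for q in bh_adjust(kept))
    return result


# Hypothetical toy data: four low-count genes with uninformative p-values,
# four well-expressed genes of which three have small p-values.
mean_counts = [0.5, 1.0, 2.0, 3.0, 50.0, 80.0, 120.0, 200.0]
pvalues = [0.90, 0.70, 0.80, 0.60, 0.010, 0.020, 0.030, 0.50]

print(rejections_by_cutoff(mean_counts, pvalues, cutoffs=[0, 5]))
```

On this toy input no gene passes at cutoff 0, while three pass at cutoff 5: removing low-count genes that had little chance of being significant reduces the multiple-testing burden for the rest. The filter statistic (mean count) has to be independent of the test statistic under the null for this to be valid.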


Correct. We do sometimes perform minimal pre-filtering. Mostly this is to reduce unnecessary computation, whereas the independent filtering (or better yet, IHW) is for increasing power. There is no need to run DESeq2 for bulk RNA-seq on genes where only one or two samples have a single-digit count and the rest are all 0's.
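
As an illustration of the kind of minimal pre-filter discussed in this thread ("at least N counts in at least K samples"), here is a small sketch in plain Python. The function name `prefilter`, the gene names, and the counts are hypothetical; in practice the same rule would be applied to the count matrix before running DESeq2.

```python
def prefilter(counts, min_count=10, min_samples=3):
    """Keep genes with at least `min_count` reads in at least `min_samples` samples."""
    return {
        gene: row
        for gene, row in counts.items()
        if sum(c >= min_count for c in row) >= min_samples
    }


# Hypothetical toy matrix: 8 samples, first 4 WT and last 4 KO.
counts = {
    "geneA": [0, 0, 1, 0, 0, 0, 2, 0],                # near-empty, safe to drop
    "geneB": [0, 0, 0, 0, 25, 31, 18, 40],            # expressed in KO only
    "geneC": [120, 98, 143, 110, 95, 130, 101, 117],  # expressed everywhere
    "geneD": [12, 3, 0, 11, 2, 0, 1, 4],              # sporadic low counts
}

print(sorted(prefilter(counts, min_count=10, min_samples=3)))
print(sorted(prefilter(counts, min_count=10, min_samples=5)))

# Sweeping the count threshold shows how many genes each choice would retain.
for n in (1, 5, 10, 20):
    print(n, len(prefilter(counts, min_count=n, min_samples=4)))
```

Note that `geneB` survives when `min_samples` is no larger than the group size (4 here) but is lost with `min_samples=5`. This is why a common recommendation is to set the sample requirement to the smallest group size, so that genes expressed in only one genotype are not removed before testing.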


Hi Michael, Thank you for the reply. I ran an experiment to check what would happen if I do and do not perform pre-filtering. I realized that I am obtaining lists of significant genes of different sizes.

In filtered case 1 (10 counts in at least 3 samples) I obtained 1933 genes, and in filtered case 2 (10 counts in 5 samples) I obtained 1462 genes. There are 8 samples in total (4 Experiment and 4 WT). The unfiltered case gave me 1901 significant genes (pAdj < 0.05).

So it appears that this step is affecting my significant gene list. I am also working with pseudo-bulk aggregated single cell data. Could you please tell me what I can learn from this pattern of DEGs?


I read this as follows: there are ~400 genes that are called differential due to a shift in mean but are lowly expressed or only have counts in a few samples. Many users filter these out to focus on genes that are more stably expressed. This decision is up to you and your collaborators.
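
One way to see which genes are responsible for the difference between two runs is to compare the significant gene sets directly. The following is a minimal sketch in plain Python; the function name `compare_calls` and the gene identifiers are hypothetical, and the inputs are assumed to be the lists of significant gene IDs exported from each analysis.

```python
def compare_calls(unfiltered, filtered):
    """Split significant genes into shared, lost, and gained after pre-filtering."""
    unfiltered, filtered = set(unfiltered), set(filtered)
    return {
        "shared": unfiltered & filtered,
        "lost_after_filtering": unfiltered - filtered,
        "gained_after_filtering": filtered - unfiltered,
    }


# Hypothetical significant gene IDs from two runs of the same contrast.
sig_unfiltered = ["g1", "g2", "g3", "g4", "g5"]
sig_filtered = ["g1", "g2", "g3", "g6"]

comparison = compare_calls(sig_unfiltered, sig_filtered)
for label, genes in comparison.items():
    print(label, len(genes), sorted(genes))
```

Looking at the raw counts of the genes in `lost_after_filtering` shows whether they are lowly expressed or have counts in only a few samples, which can inform the choice of filter.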
