Filtering step in differential analysis with RSEM values
Biologist (@biologist-9801)

Dear Aaron,

In this post (C: Possible ways of performing differential gene expression and analysis of RNA-Seq), Gordon gave code for differential analysis with RSEM values. In it, he used a filtering step that keeps genes with about 10 counts or more in at least 14 samples, 14 being the number of samples in the smallest "experimental group".

In my case:

table(targets$Sample.type)

MB1   MB2 
286     80 

So what minimum number of samples should I filter on now?

Should I filter like this?

keep <- rowSums(y > log2(11)) >= 80
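A minimal R sketch of what this filter does, on a tiny made-up count matrix. It assumes, as the `log2(11)` cutoff suggests, that `y` holds log2(count + 1) values, so `y > log2(11)` is the same as a count above 10. The helper `filter_genes()` and its arguments are hypothetical names used only for illustration.

```r
# Made-up RSEM-style expected counts: 3 genes x 6 samples
counts <- matrix(c(
   0,  2, 15, 30, 12, 40,   # geneA: above 10 in 4 samples
  20, 25, 18, 22, 30, 19,   # geneB: above 10 in all 6 samples
   1,  0,  3, 11,  0,  2    # geneC: above 10 in 1 sample
), nrow = 3, byrow = TRUE,
dimnames = list(c("geneA", "geneB", "geneC"), paste0("S", 1:6)))

# Assumption: y is log2(count + 1), so y > log2(11) <=> count > 10
y <- log2(counts + 1)

# Hypothetical helper: keep genes whose count exceeds min.count
# in at least min.samples samples
filter_genes <- function(y, min.samples, min.count = 10) {
  rowSums(y > log2(min.count + 1)) >= min.samples
}

keep <- filter_genes(y, min.samples = 4)
print(keep)   # named logical vector, one entry per gene
```

With `min.samples = 4`, geneA and geneB are kept and geneC is dropped; the same call with `min.samples = 80` reproduces the filter proposed above.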

@gordon-smyth
WEHI, Melbourne, Australia

The reasons we don't give prescriptive rules on how to filter are:

  1. A range of sensible filtering cutoffs will give good results. You don't need to worry about what exact threshold you use, as long as it's in the sensible range.
  2. Good filtering depends on the nature of your data and what biological questions you're trying to answer.

In your case, you need to decide how many samples a gene would need to be expressed in before it becomes biologically interesting.

Suppose a gene was expressed in 79 of the MB2 samples but none of the MB1. Would you want to call that gene DE? Probably yes.

Suppose a gene was expressed in 60 of the MB2 samples but none of the MB1. Would you want to call that gene DE? Again, probably yes.

Suppose a gene was expressed in 20 of the MB2 samples but none of the MB1. Would you want to call that gene DE? Probably not. It's only expressed in a minority of samples in either group.

You need to decide the minimum number of samples that a gene would have to be expressed in for it to be biologically interesting to you. That shouldn't be higher than 80, but it might be as low as 50 or 60. You decide. This question gets back to why you're doing the DE analysis in the first place. If you want a suggestion, I'd probably go with around 60.
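To see how the choice of minimum sample number plays out for the three hypothetical genes above, the following R sketch builds simulated data with the same group sizes (286 MB1, 80 MB2). Each gene gets 50 made-up counts in 79, 60 or 20 of the MB2 samples and zero counts everywhere else, and the `rowSums` filter is then applied with cutoffs of 80 and 60. The counts and gene names are invented, and `y` is again assumed to be log2(count + 1).

```r
# Group labels matching the table above: 286 MB1 and 80 MB2 samples
group <- factor(rep(c("MB1", "MB2"), times = c(286, 80)))

# Hypothetical genes: number of MB2 samples each one is expressed in
n.expr <- c(gene79 = 79, gene60 = 60, gene20 = 20)

# Simulated counts: 0 everywhere, 50 in the first k MB2 samples
counts <- matrix(0, nrow = length(n.expr), ncol = length(group),
                 dimnames = list(names(n.expr), NULL))
mb2 <- which(group == "MB2")
for (i in seq_along(n.expr)) {
  counts[i, mb2[seq_len(n.expr[i])]] <- 50
}

# Assumption: y is log2(count + 1), as in the filter from the question
y <- log2(counts + 1)

# Apply the same filter with two candidate minimum sample numbers
keep.by.cutoff <- sapply(c(min80 = 80, min60 = 60),
                         function(k) rowSums(y > log2(11)) >= k)
print(keep.by.cutoff)   # genes in rows, cutoffs in columns
```

With a cutoff of 80, even the gene expressed in 79 MB2 samples is removed, although it is one you would probably want to call DE. A cutoff of 60 keeps it and the 60-sample gene while dropping the 20-sample one.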
