Hi Mahnaz,
Why don't you follow the advice of the edgeR User's Guide (as Mark has
suggested)? All the case studies in the User's Guide describe how the
filtering was done in a principled way.
Total count filtering is not so bad, but it is susceptible to being driven
by one library, especially by one library with a large sequence depth.
The procedure described by Mark and used in the guide is a compromise of
several considerations.
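
To make the difference concrete, here is a minimal base R sketch comparing the two filters on a toy matrix. Everything in it is an assumption made for illustration: the counts, the library sizes (library 6 is taken to be ten times deeper than the rest), the total count cutoff of 30, and the hand-rolled CPM calculation, which only stands in for edgeR's cpm(). It is not the code from the User's Guide.

```r
# Toy data (assumed): 4 genes x 6 libraries, two groups of three.
counts <- matrix(
  c( 0,  0,  0,  0,  0,  60,   # geneA: reads in the deep library only
    12, 10, 11,  0,  1,   5,   # geneB: expressed in group 1
     1,  0,  2,  1,  0,   3,   # geneC: low everywhere
    50, 40, 60, 45, 55, 500),  # geneD: expressed everywhere
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", LETTERS[1:4]), paste0("lib", 1:6))
)
group <- factor(c(1, 1, 1, 2, 2, 2))

# Assumed library sizes; lib6 is sequenced ten times deeper.
lib.size <- c(1e6, 1e6, 1e6, 1e6, 1e6, 1e7)

# Counts per million, computed by hand as a stand-in for edgeR's cpm().
cpm.toy <- t(t(counts) / lib.size) * 1e6

# Filter 1: total count across all libraries (cutoff of 30 is arbitrary).
keep.total <- rowSums(counts) >= 30

# Filter 2: CPM above X = 1 in at least Y samples,
# with Y the size of the smallest replicate group.
min.group <- min(table(group))
keep.cpm <- rowSums(cpm.toy > 1) >= min.group

print(data.frame(total = rowSums(counts), keep.total, keep.cpm))
```

In this toy example geneA passes the total count filter purely because of the reads in the deep library, whereas the CPM rule drops it, since it exceeds 1 CPM in only one sample.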
BTW, there are newer versions of R and edgeR available than what you are
using.
Best wishes
Gordon
> Date: Wed, 30 Apr 2014 21:34:50 +0200
> From: Mark Robinson <mark.robinson at imls.uzh.ch>
> To: "Ryan C. Thompson" <rct at thompsonclan.org>
> Cc: bioconductor at r-project.org, Mahnaz Kiani <mahnazkiani at gmail.com>
> Subject: Re: [BioC] total count filter cutoff
>
>
> In my lab, we typically follow a "CPM of at least X in at least Y
> samples" rule, where X=1 (arbitrary but reasonable, can be changed) and
> Y=size of smallest replicate group, according to one of the case studies
> in the user's guide, for example:
>
> ------
> 4.3.6 Filtering
> We filter out very lowly expressed tags, keeping genes that are
> expressed at a reasonable level in at least one treatment condition.
> Since the smallest group size is three, we keep genes that achieve at
> least one count per million (cpm) in at least three samples:
>
>> keep <- rowSums(cpm(y)>1) >= 3
>> y <- y[keep,]
> ------
>
> (http://www.bioconductor.org/packages/release/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf)
>
> Cheers, Mark
>
>
> ----------
> Prof. Dr. Mark Robinson
> Statistical Bioinformatics, Institute of Molecular Life Sciences
> University of Zurich
> http://ow.ly/riRea
> Date: Wed, 30 Apr 2014 11:29:28 -0700 (PDT)
> From: "mahnaz Kiani [guest]" <guest at bioconductor.org>
> To: bioconductor at r-project.org, mahnazkiani at gmail.com
> Subject: [BioC] total count filter cutoff
>
>
> I'm using edgeR for analysis of my data and I'm not sure what total
> count filter cutoff value I should use. My reads are paired 50 bp
> reads and the total number of reads per sample is about 80,000,000. I
> tried cutoff values of 5, 10, 15, 30, 50 and 100 and I only saw
> differences between 50 and 100, but I am still looking for a logical
> reason to choose the cutoff value.
>
> Appreciate your help,
> Mahnaz
>
> -- output of sessionInfo():
>
> R 3.0.2