Dear Gowthaman,
Personally, I would remove the rRNA genes, recompute the library sizes
based on the remaining counts, then scale normalize with
calcNormFactors().
However, edgeR will be fairly robust to whether or not the rRNA genes are
kept in, and to how much of the library they consume, provided that
calcNormFactors() is used.
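As a rough illustration of why the library sizes matter, here is a toy sketch in Python (not edgeR code). The gene names and counts are made up, and the helpers drop_genes, library_sizes and cpm are hypothetical. Two libraries have identical non-rRNA composition, but rRNA takes 20% of one and 40% of the other. The sketch only recomputes total-count library sizes; it does not reproduce the TMM scale normalization that calcNormFactors() performs.

```python
# Toy counts (hypothetical): two libraries of 10,000 reads each; rRNA takes
# 20% of library 1 and 40% of library 2, while geneA:geneB is 1:3 in both.
counts = {
    "rRNA_1": [2000, 4000],
    "geneA": [2000, 1500],
    "geneB": [6000, 4500],
}
rrna_ids = {"rRNA_1"}  # hypothetical rRNA gene IDs


def drop_genes(counts, exclude):
    """Return a copy of the count table without the excluded genes."""
    return {g: row for g, row in counts.items() if g not in exclude}


def library_sizes(counts):
    """Library size = column sum of the counts remaining in the table."""
    n_libs = len(next(iter(counts.values())))
    return [sum(row[j] for row in counts.values()) for j in range(n_libs)]


def cpm(counts, lib_sizes):
    """Counts per million relative to the given library sizes."""
    return {
        g: [row[j] * 1e6 / lib_sizes[j] for j in range(len(lib_sizes))]
        for g, row in counts.items()
    }


# rRNA kept in the library sizes: geneA appears to drop ~1.33-fold.
print(cpm(counts, library_sizes(counts))["geneA"])  # [200000.0, 150000.0]

# rRNA removed first, library sizes recomputed: the spurious change disappears.
kept = drop_genes(counts, rrna_ids)
print(library_sizes(kept))                          # [8000, 6000]
print(cpm(kept, library_sizes(kept))["geneA"])      # [250000.0, 250000.0]
```

In edgeR itself, the analogous step would be to drop the rRNA rows from the count matrix before calling DGEList(), whose default library sizes are the column sums of the counts supplied, and then to run calcNormFactors() on the result.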
Best wishes
Gordon
> Date: Fri, 12 Apr 2013 08:15:35 -0700
> From: gowtham <ragowthaman at gmail.com>
> To: "bioconductor at r-project.org" <bioconductor at r-project.org>
> Subject: [BioC] edgeR : Include or exclude structural/noncoding RNA reads in the analysis
>
> Hi Everyone,
> I have been using edgeR for quite some time. Most of our RNAseq data comes
> from infectious organisms like Malaria and Tryps. Our libraries generally
> have 10 to 20% of the reads coming from rRNA genes (not sure if this is the
> typical value for other organisms/protocols). Until now, I have been
> ignoring them while doing the DE analysis using edgeR.
>
> I am NOT interested in differential expression of rRNA genes, but I am
> worried that excluding them from edgeR might bias the library size
> calculations. On the other hand, including them might introduce a bunch of
> outliers (these rRNA genes have very high read counts). I could not
> intuitively decide one over the other, so I am asking the experts for help.
>
> Does this change if libraries have varying amounts of rRNA contamination?
> Say, one set of libraries has 20% rRNA and another has 40%.
>
> Thanks a bunch in advance,
> Gowthaman
>
>
> --
> Gowthaman
>
> Bioinformatics Systems Programmer.
> SBRI, 307 West lake Ave N Suite 500
> Seattle, WA. 98109-5219
> Phone : LAB 206-256-7188 (direct).
>