Question

Normalization of RNAseq data using ERCC?

Agnes Paquet ▴ 30

@agnes-paquet-6315

Dear List,

We have just started using ERCC spike-in controls in our RNAseq experiments. I have looked for recommended approaches on how to use the controls for normalization, but I couldn't find much information. From what I read, I am planning to use the spike-ins to estimate the sizeFactors in our differential analysis pipeline.

Is there a better approach that we could use to normalize our data based on the spike-ins? Can anyone recommend any paper covering that topic?

Thank you for your help,
Agnes
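For readers unfamiliar with the idea, estimating size factors from the spike-ins alone could be sketched roughly as follows. This is only an illustration in plain Python, assuming a median-of-ratios estimator restricted to the spike-in rows; the function name `spike_in_size_factors`, the `counts` layout, and the example IDs are hypothetical and not taken from any particular pipeline.

```python
import math
from statistics import median


def spike_in_size_factors(counts, spike_ids):
    """Median-of-ratios size factors computed from spike-in rows only.

    counts    : dict mapping feature ID -> list of raw counts, one per sample
    spike_ids : IDs of the spike-in controls (hypothetical naming)
    """
    # Keep spike-ins observed in every sample; log(0) is undefined.
    rows = [counts[g] for g in spike_ids
            if g in counts and all(c > 0 for c in counts[g])]
    if not rows:
        raise ValueError("no spike-in has nonzero counts in all samples")

    n_samples = len(rows[0])
    # Log of the geometric mean of each spike-in across samples.
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in rows]

    factors = []
    for j in range(n_samples):
        # Ratio of this sample's count to the spike-in's geometric mean.
        log_ratios = [math.log(row[j]) - lgm
                      for row, lgm in zip(rows, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Toy data: sample 2 captured twice as many spike-in reads as sample 1.
toy_counts = {
    "ERCC-A": [10, 20],
    "ERCC-B": [100, 200],
    "ERCC-C": [0, 5],      # dropped: zero count in one sample
    "GENE-1": [50, 50],    # endogenous gene, ignored by the estimator
}
toy_factors = spike_in_size_factors(toy_counts, ["ERCC-A", "ERCC-B", "ERCC-C"])
```

Dividing each sample's counts by its factor would then put the spike-ins on a common scale. Note that low-count spike-ins drop out of the estimate entirely whenever any sample has a zero, so the estimate rests on the more abundant controls.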

Normalization • 4.4k views

updated 12.0 years ago by Davis, Wade ▴ 350 • written 12.0 years ago by Agnes Paquet ▴ 30

score 0 · Answer 1 · 2014-02-07

Hi Agnes,

I have used ERCC spike-ins in a large RNA-Seq study (600+ samples), and I would temper expectations for any approach based on them. The dynamic range of the spike-ins is large (I recall 18 orders of magnitude on a base-2 scale), so unless you are sequencing quite deeply, you won't get high read counts for at least the bottom third of that range. For reference, our depth was ~20M reads per sample.

I tried a number of different strategies for using that information to estimate the size factors, but was never comfortable with the results. The spike-ins themselves are subject to a great deal of sample-to-sample variability (due to pipetting variance, differences in library diversity, etc.), which makes them a less appealing basis for normalization once you see the results: in some cases the samples differed by several fold.

My experience agrees with that reported in the following paper, which uses some data from the SEQC study and does consider the spike-ins in a complex background (i.e., spiked into a human sample at the suggested concentrations). They also looked at large data sets. http://life.scichina.com:8082/sciCe/EN/abstract/abstract510013.shtml

This paper (http://www.ncbi.nlm.nih.gov/pubmed/21816910) is more optimistic, and may seem somewhat contradictory to my comments and the paper above; however, a key difference is the sampling depth in that study. A glance at supplemental table S2 shows the average number of reads was 230M PER (human) SAMPLE! They also used paired-end reads.

I did find the spike-ins useful for computing an "empirical" false discovery rate (using the ERCC Set B) between groups. With reasonable sample sizes per group (n=8), the group mean fold changes were extremely close to 1 for those probes, even though they were not used in the normalization procedure per se.

I'd be happy to discuss more off the list, and point you to publications where I used them as a measure of false discovery.
Regards, Wade
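The check on spike-ins that should not change between groups could be sketched roughly as below. This is a minimal illustration in plain Python, not the procedure actually used in the study above. It assumes you already have normalized counts and a list of spike-in IDs whose true fold change between groups is 1. The function names, the data layout, and the 1.5-fold cutoff are all hypothetical choices made for the example.

```python
import math
from statistics import mean


def group_fold_changes(norm_counts, group_a, group_b, null_ids):
    """Fold change of group means (B over A) for spike-ins expected at 1:1.

    norm_counts : dict mapping feature ID -> list of normalized counts
    group_a/b   : lists of sample indices belonging to each group
    null_ids    : spike-in IDs assumed to have a true fold change of 1
    """
    fold_changes = {}
    for g in null_ids:
        mean_a = mean(norm_counts[g][j] for j in group_a)
        mean_b = mean(norm_counts[g][j] for j in group_b)
        if mean_a > 0 and mean_b > 0:  # skip controls with no signal
            fold_changes[g] = mean_b / mean_a
    return fold_changes


def fraction_called(fold_changes, min_fold=1.5):
    """Fraction of null spike-ins whose fold change exceeds the cutoff
    in either direction. The cutoff is an arbitrary illustrative value."""
    if not fold_changes:
        return 0.0
    cutoff = math.log2(min_fold)
    called = sum(1 for fc in fold_changes.values()
                 if abs(math.log2(fc)) > cutoff)
    return called / len(fold_changes)


# Toy data: four samples, two per group; both controls are truly unchanged.
toy_norm = {
    "ERCC-X": [100, 110, 90, 100],  # behaves as expected
    "ERCC-Y": [10, 10, 40, 40],     # would be a false call
}
toy_fc = group_fold_changes(toy_norm, [0, 1], [2, 3], ["ERCC-X", "ERCC-Y"])
toy_rate = fraction_called(toy_fc)
```

Because these controls are known not to differ, any of them flagged as changed is a false call, so the fraction gives a rough empirical error estimate that is independent of whether the spike-ins were used for normalization.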