Normalization of RNAseq data using ERCC?
Agnes Paquet ▴ 30
@agnes-paquet-6315
Last seen 10.2 years ago
Dear List, We have just started using ERCC spike-in controls in our RNAseq experiments. I have looked for recommended approaches on how to use the controls for normalization, but I couldn't find much information. From what I have read, I am planning to use the spike-ins to estimate the sizeFactors in our differential analysis pipeline. Is there a better approach that we could use to normalize our data based on the spike-ins? Can anyone recommend a paper covering that topic? Thank you for your help, Agnes
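For concreteness, the plan described in the question (estimating size factors from the spike-ins only) could be sketched roughly as follows. This is a minimal illustration of a median-of-ratios estimator, in the style used by DESeq-type pipelines, restricted to the spike-in rows. The function name `spike_in_size_factors`, the dict-of-lists input layout, and the toy counts are all assumptions made for the example, not part of any package.

```python
import math
from statistics import median


def spike_in_size_factors(counts, spike_ids):
    """Median-of-ratios size factors computed from spike-in rows only.

    counts    : dict mapping feature id -> list of raw counts, one per sample
                (hypothetical layout chosen for this sketch)
    spike_ids : ids of the spike-in features to use as controls
    """
    n_samples = len(next(iter(counts.values())))
    # Keep only spike-ins detected in every sample, so every log is defined.
    usable = [f for f in spike_ids
              if f in counts and all(c > 0 for c in counts[f])]
    if not usable:
        raise ValueError("no spike-in has non-zero counts in every sample")
    # Log of the geometric mean across samples, per spike-in.
    log_geo_mean = {f: sum(math.log(c) for c in counts[f]) / n_samples
                    for f in usable}
    factors = []
    for j in range(n_samples):
        # Ratio of this sample's count to the per-feature reference, on log scale.
        log_ratios = [math.log(counts[f][j]) - log_geo_mean[f] for f in usable]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Toy data: sample 2 has exactly twice the spike-in signal of sample 1.
toy = {
    "ERCC-a": [10, 20],
    "ERCC-b": [100, 200],
    "ERCC-c": [1000, 2000],
    "geneX": [50, 500],  # endogenous gene, ignored by the estimator
}
factors = spike_in_size_factors(toy, ["ERCC-a", "ERCC-b", "ERCC-c"])
```

Note that spike-ins with a zero count in any sample are dropped, so at modest sequencing depth the low-abundance end of the spike-in range contributes nothing to the estimate.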
Normalization Normalization • 4.0k views
Davis, Wade ▴ 350
@davis-wade-2803
Last seen 10.2 years ago
Hi Agnes,

I have used ERCC spike-ins in a large RNA-Seq study (600+ samples), and I would temper expectations for any approach based on them. The dynamic range of the spike-ins is large (I recall 18 orders of magnitude on a base-2 scale), so unless you are sequencing quite deeply, you won't get high read counts for at least the bottom third of that range.

I tried a number of different strategies for using that information to estimate the size factors, but was never comfortable with the results. The spike-ins themselves are subject to a great deal of sample-to-sample variability (due to pipetting variance, differences in library diversity, etc.), which makes using them as a basis for normalization less appealing once you see the results. The result was sample-to-sample differences of several fold in some cases. By the way, our depth was ~20M reads per sample.

My experience agrees with that reported in the following paper, which uses some data from the SEQC study and does consider the spike-ins in a complex background (i.e., spiked into a human sample at the suggested concentrations). They also looked at large data sets. http://life.scichina.com:8082/sciCe/EN/abstract/abstract510013.shtml

This paper (http://www.ncbi.nlm.nih.gov/pubmed/21816910) is more optimistic and may seem somewhat contradictory to my comments and the paper above; however, a key difference is the sampling depth in the latter. A glance at supplemental table S2 shows the average number of reads was 230M PER (human) SAMPLE! They also used paired-end reads.

I did find the spike-ins useful for computing an "empirical" false discovery rate (using the ERCC Set B) between groups. With reasonable sample sizes per group (n=8), the group mean fold changes were extremely close to 1 for those probes, even though they were not used in the normalization procedure per se.

I'd be happy to discuss more off the list and to point you to publications where I used them as a measure of false discovery.
Regards, Wade
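The sanity check mentioned in the answer (group mean fold changes of spike-ins that should not differ between groups) could be sketched along these lines. This is only a rough illustration, not the procedure actually used in the study: the function names, the pseudocount of 0.5, and the 2-fold threshold are assumptions made for the example, and the input is assumed to be counts that were already normalized by some method that did not use these spike-ins.

```python
import math


def null_spike_log2fc(norm_counts, group_a, group_b, pseudo=0.5):
    """log2 ratio of group means for spike-ins expected to be unchanged.

    norm_counts      : dict mapping spike-in id -> list of normalized counts
    group_a, group_b : lists of sample indices for the two groups
    pseudo           : pseudocount (assumed value) to avoid division by zero
    """
    result = {}
    for spike, values in norm_counts.items():
        mean_a = sum(values[i] for i in group_a) / len(group_a)
        mean_b = sum(values[i] for i in group_b) / len(group_b)
        result[spike] = math.log2((mean_b + pseudo) / (mean_a + pseudo))
    return result


def false_call_fraction(log2fc, threshold=1.0):
    """Fraction of null spike-ins whose |log2 fold change| reaches the threshold.

    The threshold of 1.0 (a 2-fold change) is an arbitrary choice for this sketch.
    """
    calls = [abs(v) >= threshold for v in log2fc.values()]
    return sum(calls) / len(calls)


# Toy data: one well-behaved spike-in and one that drifts between groups.
toy = {
    "spike-1": [100, 100, 100, 100],
    "spike-2": [10, 10, 40, 40],
}
lfc = null_spike_log2fc(toy, group_a=[0, 1], group_b=[2, 3])
fraction_off = false_call_fraction(lfc)
```

If the normalization is behaving, the log2 fold changes for such spike-ins should sit near 0 (fold change near 1), and the fraction exceeding the threshold gives a crude empirical indication of how often an unchanged feature would be called different.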