Question: edgeR and Normalization of RNAseq data using ERCC controls
John (United Kingdom) wrote, 3.9 years ago:

Dear List,

I have read a useful segment from a BioStar post on using DESeq with ERCC controls to normalize RNAseq counts. 

That page contains the statement:

"Read in the count data, subset the resulting matrix such that it includes only the spike-ins, create a DESeqDataSet from that and then just estimateSizeFactors() on the results. The size factors can then be placed in the appropriate slot on the DESeqDataSet for the full count matrix."

However, with edgeR the process is possibly not as straightforward: DESeq has a sizeFactor slot in the CountDataSet object, whilst edgeR has lib.size and norm.factors columns in the samples table of a DGEList object. Library size and size factor are different things. I could adjust the lib.size values using the size factors calculated by estimateSizeFactors(), but is that valid (I am assuming that norm.factors is produced by the TMM normalization step)?

I understand edgeR does a TMM normalization step, so if the library sizes are changed manually, will the TMM normalization still be right?

So the code I was thinking of could look something like:


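library(DESeq)   # newCountDataSet(), estimateSizeFactors(), sizeFactors()
library(edgeR)   # DGEList(), calcNormFactors()

# Size factors estimated from the ERCC spike-in counts only
cds <- newCountDataSet( Just_ERCC_Bclass, group )
cds <- estimateSizeFactors( cds )

# Endogenous genes, with library sizes rescaled by the spike-in size factors
my <- DGEList(counts=Not_ERCC, group=group)
my$samples$lib.size <- my$samples$lib.size/sizeFactors( cds )
my <- calcNormFactors(my)
... and so on, as in the manual.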

What would be the right way to do this? 

Thank you. 

modified 3.9 years ago • written 3.9 years ago by John

Before you go down this path, you might consider what the SEQC/MAQC-III consortium has to say about using the ERCC controls for, like, anything.

The short story is that they determined that the apparent amounts of the ERCC spike-ins in the samples varied widely, most likely (IMO) because you have to aliquot microliter amounts of the spike-in solution, and most people use vacuum aspirating pipettes for this step, which is almost impossible to do accurately.

In other words, the manual for the ERCC spike-ins says you should aliquot 1 µl of a 1:10 dilution. So most people will do something laughably inaccurate like putting 1 µl of the concentrated solution into 9 µl of RNase-free water, vortexing, and then aliquoting 1 µl out of that, using their trusty Rainin pipette, and you can see how that turns out by taking a look at the SEQC/MAQC-III paper I reference above.

written 3.9 years ago by James W. MacDonald
Ryan C. Thompson (The Scripps Research Institute, La Jolla, CA) wrote, 3.9 years ago:

edgeR will not compute normalization factors unless you use the "calcNormFactors" function. If you have computed your own normalized library sizes, you can pass them as the "lib.size" argument to the "DGEList" constructor and then skip the "calcNormFactors" step and proceed as usual.
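A minimal sketch of that route, on made-up data. The toy matrices and the names `counts`, `spikes` and `spike_lib` are hypothetical, and using the per-sample spike-in total as the library size is only one possible choice, not something the answer above prescribes:

```r
library(edgeR)

# Toy data (hypothetical): 6 endogenous genes and 4 spike-ins across 4 samples
set.seed(1)
counts <- matrix(rpois(24, lambda = 50), nrow = 6,
                 dimnames = list(paste0("gene", 1:6), paste0("s", 1:4)))
spikes <- matrix(rpois(16, lambda = 100), nrow = 4,
                 dimnames = list(paste0("ERCC", 1:4), paste0("s", 1:4)))
group <- factor(c("A", "A", "B", "B"))

# Assumption: the total spike-in count per sample stands in for the library size
spike_lib <- colSums(spikes)

# Supply the library sizes directly; calcNormFactors() is deliberately not called,
# so norm.factors stay at their default of 1
y <- DGEList(counts = counts, group = group, lib.size = spike_lib)
y$samples
```

From here the usual dispersion estimation and testing steps would use `spike_lib` as the library sizes.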


But keep in mind James MacDonald's comment above, and consider using TMM or some other method instead. At the very least, compare your spike-in-based normalization to a TMM-based one using MDS plots and the like.
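One way such a comparison might look, sketched on simulated counts. The data, the object names and the use of spike-in totals as library sizes are all assumptions for illustration:

```r
library(edgeR)

# Simulated data (hypothetical): 200 genes and 20 spike-ins across 6 samples
set.seed(42)
n_genes <- 200
n_samples <- 6
counts <- matrix(rnbinom(n_genes * n_samples, mu = 100, size = 10), nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)),
                                 paste0("s", seq_len(n_samples))))
spikes <- matrix(rnbinom(20 * n_samples, mu = 200, size = 10), nrow = 20)
group <- factor(rep(c("A", "B"), each = 3))

# Scheme 1: spike-in totals as library sizes, no calcNormFactors()
y_spike <- DGEList(counts = counts, group = group, lib.size = colSums(spikes))
# Scheme 2: default library sizes plus TMM (the default method of calcNormFactors)
y_tmm <- calcNormFactors(DGEList(counts = counts, group = group))

# Effective library sizes under each scheme, rescaled to mean 1 for comparison
eff_spike <- y_spike$samples$lib.size
eff_tmm <- y_tmm$samples$lib.size * y_tmm$samples$norm.factors
round(cbind(spike = eff_spike / mean(eff_spike), tmm = eff_tmm / mean(eff_tmm)), 3)

# Side-by-side MDS plots of the same counts under the two normalizations
par(mfrow = c(1, 2))
plotMDS(y_spike, col = as.integer(group))
title(main = "Spike-in library sizes")
plotMDS(y_tmm, col = as.integer(group))
title(main = "TMM")
```

If the two sets of relative library sizes disagree strongly, or replicates separate in one plot but not the other, that is a reason to look harder at the spike-ins before trusting them.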

modified 3.9 years ago • written 3.9 years ago by Ryan C. Thompson

How do I compute ERCC-normalized library sizes for my own samples? Can edgeR do this? Thank you.

written 17 months ago by mousheng xu

You are asking how to do something as a comment to a post that tells you exactly how you would do it! You compute your own normalized library sizes and pass them as the lib.size argument.

written 17 months ago by James W. MacDonald
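For readers wondering what "compute your own" might involve, here is a rough base-R sketch of median-of-ratios size factors (the approach behind DESeq's estimateSizeFactors()) applied to a spike-in matrix. The helper `spike_size_factors`, the toy data and the final scaling step are hypothetical illustrations, not a method stated anywhere in this thread:

```r
# Median-of-ratios size factors from a spike-in count matrix (rows = spike-ins)
spike_size_factors <- function(m) {
  log_geo_mean <- rowMeans(log(m))       # log geometric mean of each spike-in
  keep <- is.finite(log_geo_mean)        # drop spike-ins with a zero in any sample
  apply(m[keep, , drop = FALSE], 2, function(col) {
    exp(median(log(col) - log_geo_mean[keep]))
  })
}

# Toy spike-in counts (hypothetical): s2 has twice, s3 half, the spike-in signal of s1
base <- c(10, 50, 200, 1000)
spikes <- cbind(s1 = base, s2 = 2 * base, s3 = 0.5 * base)
sf <- spike_size_factors(spikes)
sf   # s1 = 1, s2 = 2, s3 = 0.5 (up to floating-point rounding)

# Toy endogenous counts (hypothetical), only used to pick an overall scale
counts <- cbind(s1 = c(120, 30, 400), s2 = c(260, 55, 790), s3 = c(70, 12, 210))

# Assumption: size factors are treated as proportional to the library size,
# and the multiplicative constant (here the mean column total) is arbitrary
lib_size <- sf * mean(colSums(counts))
lib_size   # candidate values for the lib.size argument of DGEList()
```

Whether size factors should multiply or divide the raw library sizes depends on how they are interpreted, so it is worth checking any such scheme against TMM before relying on it.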
John (United Kingdom) wrote, 3.9 years ago:

Dear James and Ryan,

That information is interesting and very helpful, thank you. 


written 3.9 years ago by John