DESeq and Limma+Voom Normalization for Rna-Seq Data Using Ercc Spike-In
Entering edit mode
komal.rathi ▴ 120
Last seen 9 months ago
United States

Hello everyone,

I have some RNASeq data, that has been spiked-in with exogenous ERCC controls. I have the counts from htseq-count and now I want to normalize + test for differential expression. For this, I am using two methods: DESeq and limma+voom. Below is my code for both the tests. Can someone tell me if what I am doing is correct or not? I do think that I am slightly mistaken in the voom code. 

##### Using DESeq
# get the size factors from the spiked-ins
# is count data of only spike ins
cds <- newCountDataSet(countData =, conditions = as.character(condition))
cds <- estimateSizeFactors(cds)
nf <- sizeFactors(cds) 

# add the size factors to data without spike-ins
# is count data where spike-ins are removed
cds <- newCountDataSet(countData =, conditions = as.character(condition))
sizeFactors(cds) <- nf 

# differential expression test
cds <- estimateDispersions( cds, method= 'pooled', sharingMode='maximum', fitType='local') 
deseq.res <- nbinomTest( cds, "A", "B")

##### Using Limma + Voom
groups = as.factor(condition) # conditions 
design <- model.matrix(~0+groups)
colnames(design) = levels(groups)
nf <- calcNormFactors( # calculation norm factors using spike ins only
voom.norm <- voom(, design, lib.size = colSums( * nf) # add that to voom

Awaiting your input and suggestions!


P.S: Crossposted on Biostars to get inputs from both communities.


limma voom deseq normalization ercc • 7.5k views
Entering edit mode
Last seen 1 hour ago
WEHI, Melbourne, Australia

I suspect that you are doing what others have done, but I don't recommend this method of normalizing by spike-ins.

Somewhat better would be:

N <- colSums(
nf <- calcNormFactors(, lib.size=N)
voom.norm <- voom(, design, lib.size = N * nf)

but I don't really recommend that either.

I prefer to normalize on all genes in the usual way rather than trying to normalize on a small number of spike-ins. The problem is that spike-ins are inherently variable, as is remarked on in this paper:

Have a look also at this paper:

They show that "spike-ins are not reliable enough to be used in standard global-scaling normalization procedures" -- which is what you are doing when you call calcNormFactors() or estimateSizeFactors() in your code.

If you really insist on using spike-ins for normalization, then the best bet would be the RUVseq package described in the above paper. The RUVseq vignette shows I think how it can be used with edgeR or limma.

Entering edit mode

Thank you for your detailed response. Not to sound naive but if spike-ins are not reliable for normalization, what is their purpose? What do people generally do with them?

Entering edit mode

Well, spike-ins can be used to assess accuracy and technical performance, see


A number of companies have been advocating spike-ins for normalization, for example:

and no doubt a lot of biologists have followed this advice. We went through the same process with microarrays 15 years ago, whereby spike-ins were proposed for normalization but the genomic methods proposed by statistical bioinformaticians gradually took over. There may be special situations, e.g., where all or most genes are DE, in which normalization by spike-ins is a good idea.

Entering edit mode

Thank you, this really helps.

Entering edit mode
Aaron Lun ★ 28k
Last seen 2 hours ago
The city by the bay

Yep, looks good for voom. In fact, you could probably just use the count sums for the spike-ins directly as the library sizes, without multiplying by nf. I would be surprised if the normalization factors differ substantially from unity, as there shouldn't be any DE in the spike-in counts between samples. Of course, the implicit assumption of this normalization strategy is that you've added the same amount of spike-in (per cell, or per quantity of cellular RNA) to each sample. Whether that's true or not is probably dependent on how good your operator is.


Login before adding your answer.

Traffic: 502 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6