Question

DESeq and Limma+Voom Normalization for Rna-Seq Data Using Ercc Spike-In

3

Entering edit mode

komal.rathi ▴ 120

@komalrathi-9163

Last seen 2.3 years ago

United States

Hello everyone,

I have some RNASeq data, that has been spiked-in with exogenous ERCC controls. I have the counts from htseq-count and now I want to normalize + test for differential expression. For this, I am using two methods: DESeq and limma+voom. Below is my code for both the tests. Can someone tell me if what I am doing is correct or not? I do think that I am slightly mistaken in the voom code.

##### Using DESeq
library(DESeq)
# get the size factors from the spiked-ins
# dat.spike.in is count data of only spike ins
cds <- newCountDataSet(countData = dat.spike.in, conditions = as.character(condition))
cds <- estimateSizeFactors(cds)
nf <- sizeFactors(cds) 

# add the size factors to data without spike-ins
# dat.without.spike.in is count data where spike-ins are removed
cds <- newCountDataSet(countData = dat.without.spike.in, conditions = as.character(condition))
sizeFactors(cds) <- nf 

# differential expression test
cds <- estimateDispersions( cds, method= 'pooled', sharingMode='maximum', fitType='local') 
deseq.res <- nbinomTest( cds, "A", "B")

##### Using Limma + Voom
library(limma)
library(edgeR)
groups = as.factor(condition) # conditions 
design <- model.matrix(~0+groups)
colnames(design) = levels(groups)
nf <- calcNormFactors(dat.spike.in) # calculation norm factors using spike ins only
voom.norm <- voom(dat.without.spike.in, design, lib.size = colSums(dat.spike.in) * nf) # add that to voom

Awaiting your input and suggestions!

Thanks!

P.S: Crossposted on Biostars to get inputs from both communities.

limma voom deseq normalization ercc • 8.6k views

ADD COMMENT • link updated 10.1 years ago by Gordon Smyth 53k • written 10.1 years ago by komal.rathi ▴ 120

3

Entering edit mode

Aaron Lun ★ 29k

@alun

Last seen 21 minutes ago

The city by the bay

Yep, looks good for voom. In fact, you could probably just use the count sums for the spike-ins directly as the library sizes, without multiplying by nf. I would be surprised if the normalization factors differ substantially from unity, as there shouldn't be any DE in the spike-in counts between samples. Of course, the implicit assumption of this normalization strategy is that you've added the same amount of spike-in (per cell, or per quantity of cellular RNA) to each sample. Whether that's true or not is probably dependent on how good your operator is.

ADD COMMENT • link 10.1 years ago Aaron Lun ★ 29k

Ryan C. Thompson · Accepted Answer · 2015-11-20

8

Entering edit mode

Gordon Smyth 53k

@gordon-smyth

Last seen 6 hours ago

WEHI, Melbourne, Australia

I suspect that you are doing what others have done, but I don't recommend this method of normalizing by spike-ins.

Somewhat better would be:

N <- colSums(dat.without.spike.in)
nf <- calcNormFactors(dat.spike.in, lib.size=N)
voom.norm <- voom(dat.without.spike.in, design, lib.size = N * nf)

but I don't really recommend that either.

I prefer to normalize on all genes in the usual way rather than trying to normalize on a small number of spike-ins. The problem is that spike-ins are inherently variable, as is remarked on in this paper:

http://www.nature.com/nbt/journal/v32/n9/full/nbt.2957.html

Have a look also at this paper:

http://www.nature.com/nbt/journal/v32/n9/full/nbt.2931.html

They show that "spike-ins are not reliable enough to be used in standard global-scaling normalization procedures" -- which is what you are doing when you call calcNormFactors() or estimateSizeFactors() in your code.

If you really insist on using spike-ins for normalization, then the best bet would be the RUVseq package described in the above paper. The RUVseq vignette shows I think how it can be used with edgeR or limma.

ADD COMMENT • link 10.1 years ago Gordon Smyth 53k

0

Entering edit mode

Thank you for your detailed response. Not to sound naive but if spike-ins are not reliable for normalization, what is their purpose? What do people generally do with them?

ADD REPLY • link updated 10.1 years ago by Ryan C. Thompson ★ 7.9k • written 10.1 years ago by komal.rathi ▴ 120

2

Entering edit mode

Well, spike-ins can be used to assess accuracy and technical performance, see

http://www.ncbi.nlm.nih.gov/pubmed/25254650

and

http://rd.springer.com/chapter/10.1007%2F978-3-319-07212-8_9

A number of companies have been advocating spike-ins for normalization, for example:

https://cofactorgenomics.com/6-changes-thatll-make-big-difference-rna-seq-part-3/

and no doubt a lot of biologists have followed this advice. We went through the same process with microarrays 15 years ago, whereby spike-ins were proposed for normalization but the genomic methods proposed by statistical bioinformaticians gradually took over. There may be special situations, e.g., where all or most genes are DE, in which normalization by spike-ins is a good idea.