DESeq and Limma+Voom Normalization for Rna-Seq Data Using Ercc Spike-In
2
3
Entering edit mode
komal.rathi ▴ 120
@komalrathi-9163
Last seen 7 months ago
United States

Hello everyone,

I have some RNASeq data, that has been spiked-in with exogenous ERCC controls. I have the counts from htseq-count and now I want to normalize + test for differential expression. For this, I am using two methods: DESeq and limma+voom. Below is my code for both the tests. Can someone tell me if what I am doing is correct or not? I do think that I am slightly mistaken in the voom code. 

##### Using DESeq
library(DESeq)
# get the size factors from the spiked-ins
# dat.spike.in is count data of only spike ins
cds <- newCountDataSet(countData = dat.spike.in, conditions = as.character(condition))
cds <- estimateSizeFactors(cds)
nf <- sizeFactors(cds) 

# add the size factors to data without spike-ins
# dat.without.spike.in is count data where spike-ins are removed
cds <- newCountDataSet(countData = dat.without.spike.in, conditions = as.character(condition))
sizeFactors(cds) <- nf 

# differential expression test
cds <- estimateDispersions( cds, method= 'pooled', sharingMode='maximum', fitType='local') 
deseq.res <- nbinomTest( cds, "A", "B")

##### Using Limma + Voom
library(limma)
library(edgeR)
groups = as.factor(condition) # conditions 
design <- model.matrix(~0+groups)
colnames(design) = levels(groups)
nf <- calcNormFactors(dat.spike.in) # calculation norm factors using spike ins only
voom.norm <- voom(dat.without.spike.in, design, lib.size = colSums(dat.spike.in) * nf) # add that to voom

Awaiting your input and suggestions!

Thanks!

P.S: Crossposted on Biostars to get inputs from both communities.

 

limma voom deseq normalization ercc • 7.4k views
ADD COMMENT
8
Entering edit mode
@gordon-smyth
Last seen 19 minutes ago
WEHI, Melbourne, Australia

I suspect that you are doing what others have done, but I don't recommend this method of normalizing by spike-ins.

Somewhat better would be:

N <- colSums(dat.without.spike.in)
nf <- calcNormFactors(dat.spike.in, lib.size=N)
voom.norm <- voom(dat.without.spike.in, design, lib.size = N * nf)

but I don't really recommend that either.

I prefer to normalize on all genes in the usual way rather than trying to normalize on a small number of spike-ins. The problem is that spike-ins are inherently variable, as is remarked on in this paper:

  http://www.nature.com/nbt/journal/v32/n9/full/nbt.2957.html

Have a look also at this paper:

  http://www.nature.com/nbt/journal/v32/n9/full/nbt.2931.html

They show that "spike-ins are not reliable enough to be used in standard global-scaling normalization procedures" -- which is what you are doing when you call calcNormFactors() or estimateSizeFactors() in your code.

If you really insist on using spike-ins for normalization, then the best bet would be the RUVseq package described in the above paper. The RUVseq vignette shows I think how it can be used with edgeR or limma.

ADD COMMENT
0
Entering edit mode

Thank you for your detailed response. Not to sound naive but if spike-ins are not reliable for normalization, what is their purpose? What do people generally do with them?

ADD REPLY
2
Entering edit mode

Well, spike-ins can be used to assess accuracy and technical performance, see

  http://www.ncbi.nlm.nih.gov/pubmed/25254650

and

  http://rd.springer.com/chapter/10.1007%2F978-3-319-07212-8_9

A number of companies have been advocating spike-ins for normalization, for example:

  https://cofactorgenomics.com/6-changes-thatll-make-big-difference-rna-seq-part-3/

and no doubt a lot of biologists have followed this advice. We went through the same process with microarrays 15 years ago, whereby spike-ins were proposed for normalization but the genomic methods proposed by statistical bioinformaticians gradually took over. There may be special situations, e.g., where all or most genes are DE, in which normalization by spike-ins is a good idea.

ADD REPLY
0
Entering edit mode

Thank you, this really helps.

ADD REPLY
3
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 3 hours ago
The city by the bay

Yep, looks good for voom. In fact, you could probably just use the count sums for the spike-ins directly as the library sizes, without multiplying by nf. I would be surprised if the normalization factors differ substantially from unity, as there shouldn't be any DE in the spike-in counts between samples. Of course, the implicit assumption of this normalization strategy is that you've added the same amount of spike-in (per cell, or per quantity of cellular RNA) to each sample. Whether that's true or not is probably dependent on how good your operator is.

ADD COMMENT

Login before adding your answer.

Traffic: 1024 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6