Question: DESeq and Limma+Voom Normalization for Rna-Seq Data Using Ercc Spike-In
1
gravatar for komal.rathi
4.1 years ago by
komal.rathi80
United States
komal.rathi80 wrote:

Hello everyone,

I have some RNASeq data, that has been spiked-in with exogenous ERCC controls. I have the counts from htseq-count and now I want to normalize + test for differential expression. For this, I am using two methods: DESeq and limma+voom. Below is my code for both the tests. Can someone tell me if what I am doing is correct or not? I do think that I am slightly mistaken in the voom code. 

##### Using DESeq
library(DESeq)
# get the size factors from the spiked-ins
# dat.spike.in is count data of only spike ins
cds <- newCountDataSet(countData = dat.spike.in, conditions = as.character(condition))
cds <- estimateSizeFactors(cds)
nf <- sizeFactors(cds) 

# add the size factors to data without spike-ins
# dat.without.spike.in is count data where spike-ins are removed
cds <- newCountDataSet(countData = dat.without.spike.in, conditions = as.character(condition))
sizeFactors(cds) <- nf 

# differential expression test
cds <- estimateDispersions( cds, method= 'pooled', sharingMode='maximum', fitType='local') 
deseq.res <- nbinomTest( cds, "A", "B")

##### Using Limma + Voom
library(limma)
library(edgeR)
groups = as.factor(condition) # conditions 
design <- model.matrix(~0+groups)
colnames(design) = levels(groups)
nf <- calcNormFactors(dat.spike.in) # calculation norm factors using spike ins only
voom.norm <- voom(dat.without.spike.in, design, lib.size = colSums(dat.spike.in) * nf) # add that to voom

Awaiting your input and suggestions!

Thanks!

P.S: Crossposted on Biostars to get inputs from both communities.

 

normalization limma deseq voom ercc • 4.0k views
ADD COMMENTlink modified 4.1 years ago by Gordon Smyth39k • written 4.1 years ago by komal.rathi80
Answer: DESeq and Limma+Voom Normalization for Rna-Seq Data Using Ercc Spike-In
7
gravatar for Gordon Smyth
4.1 years ago by
Gordon Smyth39k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth39k wrote:

I suspect that you are doing what others have done, but I don't recommend this method of normalizing by spike-ins.

Somewhat better would be:

N <- colSums(dat.without.spike.in)
nf <- calcNormFactors(dat.spike.in, lib.size=N)
voom.norm <- voom(dat.without.spike.in, design, lib.size = N * nf)

but I don't really recommend that either.

I prefer to normalize on all genes in the usual way rather than trying to normalize on a small number of spike-ins. The problem is that spike-ins are inherently variable, as is remarked on in this paper:

  http://www.nature.com/nbt/journal/v32/n9/full/nbt.2957.html

Have a look also at this paper:

  http://www.nature.com/nbt/journal/v32/n9/full/nbt.2931.html

They show that "spike-ins are not reliable enough to be used in standard global-scaling normalization procedures" -- which is what you are doing when you call calcNormFactors() or estimateSizeFactors() in your code.

If you really insist on using spike-ins for normalization, then the best bet would be the RUVseq package described in the above paper. The RUVseq vignette shows I think how it can be used with edgeR or limma.

ADD COMMENTlink modified 4.0 years ago • written 4.1 years ago by Gordon Smyth39k

Thank you for your detailed response. Not to sound naive but if spike-ins are not reliable for normalization, what is their purpose? What do people generally do with them?

ADD REPLYlink modified 4.0 years ago by Ryan C. Thompson7.4k • written 4.0 years ago by komal.rathi80
2

Well, spike-ins can be used to assess accuracy and technical performance, see

  http://www.ncbi.nlm.nih.gov/pubmed/25254650

and

  http://rd.springer.com/chapter/10.1007%2F978-3-319-07212-8_9

A number of companies have been advocating spike-ins for normalization, for example:

  https://cofactorgenomics.com/6-changes-thatll-make-big-difference-rna-seq-part-3/

and no doubt a lot of biologists have followed this advice. We went through the same process with microarrays 15 years ago, whereby spike-ins were proposed for normalization but the genomic methods proposed by statistical bioinformaticians gradually took over. There may be special situations, e.g., where all or most genes are DE, in which normalization by spike-ins is a good idea.

ADD REPLYlink modified 4.0 years ago • written 4.0 years ago by Gordon Smyth39k

Thank you, this really helps.

ADD REPLYlink written 4.0 years ago by komal.rathi80
Answer: DESeq and Limma+Voom Normalization for Rna-Seq Data Using Ercc Spike-In
2
gravatar for Aaron Lun
4.1 years ago by
Aaron Lun25k
Cambridge, United Kingdom
Aaron Lun25k wrote:

Yep, looks good for voom. In fact, you could probably just use the count sums for the spike-ins directly as the library sizes, without multiplying by nf. I would be surprised if the normalization factors differ substantially from unity, as there shouldn't be any DE in the spike-in counts between samples. Of course, the implicit assumption of this normalization strategy is that you've added the same amount of spike-in (per cell, or per quantity of cellular RNA) to each sample. Whether that's true or not is probably dependent on how good your operator is.

ADD COMMENTlink written 4.1 years ago by Aaron Lun25k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 296 users visited in the last hour