Question

normalization affects drastically limma-voom + sva results

0

Entering edit mode

aec ▴ 90

@aec-9409

Last seen 5.5 years ago

Dear all,

Depending on the normalization approach (none, quantile, TMM or DESeq2) applied to the limma-voom function, the number of surrogate variables found by SVA and number of differentially expressed genes changes a lot. My question is, does SVA replace the normalization step? For example, if the RNA-seq samples have different sequencing depths, is this a technical factor that SVA corrects for?

Thanks,

normalization sva limma-voom • 2.9k views

ADD COMMENT • link updated 8.4 years ago by Gordon Smyth 53k • written 8.4 years ago by aec ▴ 90

score 1 · Answer 1 · 2017-07-25

1

Entering edit mode

Gordon Smyth 53k

@gordon-smyth

Last seen 14 minutes ago

WEHI, Melbourne, Australia

No, sva does not take the place of normalization. It could ameliorate a problems slightly if you failed to normalize properly, but that is hardly the right way to go.

Note also that limma-voom always corrects for sequencing depth, even if you choose normalize="none".

It is not my experience that choosing between "TMM", "DESeq2" or quantile normalization drastically affects results from limma-voom.

ADD COMMENT • link 8.4 years ago Gordon Smyth 53k

0

Entering edit mode

Gordon,

My problem is that I have been trying SVA with different normalization options as follows:

y=DGEList(counts=counts)
A<-rowSums(y$counts)
isexpr<-A>500
y=y[isexpr,keep.lib.size=FALSE]
yy=calcNormFactors(y)
mod <- model.matrix(~group, data=info)
mod0 <- model.matrix(~1, data=info)
#option 1
v=voom(counts=y,design=mod)
n.sv = num.sv(v$E,mod,method="leek")
n.sv
1
#option 2
v=voom(counts=y,design=mod, normalize="quantile")
n.sv = num.sv(v$E,mod,method="leek")
n.sv
1
#option 3
v=voom(counts=yy,design=mod)
n.sv = num.sv(v$E,mod,method="leek")
n.sv
1
#option 4
n.sv = num.sv(yy,mod,method="leek")
n.sv
57
#option 5
n.sv = num.sv(y,mod,method="leek")
n.sv
57
#option 6
n.sv = num.sv(dat,mod,method="leek")
n.sv
57

For option 6 I followed :https://www.bioconductor.org/help/workflows/rnaseqGene/. Why is it that the number of sv depends so much on applying 'voom' or not?

ADD REPLY • link 8.4 years ago aec ▴ 90

0

Entering edit mode

It is completely incorrect to input counts to num.sv() as if they were microarray expression values. You have to use voom() or cpm() to convert the counts to a scale on which a microarray-type analysis makes sense.

When you do use voom(), you get the same number of surrogate variables regardless of what normalization method you used, just as I suggested to you would happen.

ADD REPLY • link 8.4 years ago Gordon Smyth 53k

0

Entering edit mode

Ok, then option 5 is discarded. But what about the rest?

option 4 used TMM normalization and option 6 uses DESeq2 normalization, they both give 57 sv.

options 1,2,3 use voom and give 1 sv.

option 6 follows this code https://www.bioconductor.org/help/workflows/rnaseqGene/ :

dat  <- counts(dds, normalized = TRUE)
idx  <- rowMeans(dat) > 1
dat  <- dat[idx, ]
mod  <- model.matrix(~ dex, colData(dds))
mod0 <- model.matrix(~   1, colData(dds))
svseq <- svaseq(dat, mod, mod0, n.sv = 2)

ADD REPLY • link 8.4 years ago aec ▴ 90

0

Entering edit mode

Option 4 is counts. Option 5 is counts. Option 6 is counts. They are all incorrect. Option 6 is not the same as the rnaseq workflow.

If you use voom, you get the correct result ( n.sv=1 ).

If you ignore voom and just use counts, you get the wrong result ( n.sv=57 ).

That's pretty clear, is it not?

ADD REPLY • link 8.4 years ago Gordon Smyth 53k

0

Entering edit mode

You are right Gordon, if I add this step:

logcpm <- cpm(yy, prior.count=2, log=TRUE)
n.sv = num.sv(logcpm,mod,method="leek")
n.sv
[1] 1

I got the same results as 'voom'.

If I want to calculate n.sv using the rnaseq DESeq2 worklow, how to do it? it is not specified. Should I use the 'rlog' or 'vst' transformations?

ADD REPLY • link 8.4 years ago aec ▴ 90