Question

using duplicateCorrelation with limma+voom for RNA-seq data

13

Entering edit mode

John Blischak ▴ 190

@john-blischak-6562

Last seen 6.7 years ago

Hi,

I would like to use the duplicateCorrelation function along with limma and voom in order to analyze some RNA-seq data with blocking and group-means parametrization (similar to example 9.7 in the limma manual). The manual does not contain a similar example using the voom function for RNA-seq data. I searched the Bioconductor mailing list and BioStars and found conflicting advice.

The first approach performs voom only once, prior to running duplicateCorrelation, e.g. this Bioc post

https://stat.ethz.ch/pipermail/bioconductor/2013-July/054087.html

and this Biostars post

https://www.biostars.org/p/54565

The second approach performs voom twice, both before and after running duplicateCorrelation, e.g. this Bioc post

https://stat.ethz.ch/pipermail/bioconductor/attachments/20130530/4dcc9475/attachment.pl

and this Biostars post

https://www.biostars.org/p/96889

Is there a consensus on which is the better method? Does the choice depend on the study design?

In both my actual data and in the simulated example below, I have found the results to be very similar. Thus I am currently leaning towards using the single voom method since it achieves similar results with fewer steps. The simulated example is not very interesting because blocking does not have much effect in the first place (corfit$consensus = -0.002984798), as expected from random data. However, in my data set it does make a difference (corfit$consensus = 0.3433403) and the results of using single or double voom are still similar.

Thanks,

John

library("limma")
library("edgeR")

# Simulate data
set.seed(123)
counts <- matrix(rpois(9000, lambda = 500), ncol = 90)
rownames(counts) <- paste0("g_", 1:nrow(counts))
anno <- data.frame(block = rep(paste0("block", 1:6), each = 15),
                   fac1 = rep(c("A", "B", "C"), 30),
                   fac2 = rep(paste0("treat", 1:5), 18))
combined_fac <- factor(paste(anno$fac1, anno$fac2, sep = "."))
design <- model.matrix(~0 + combined_fac)
colnames(design) <- levels(combined_fac)

# Single voom
y <- DGEList(counts)
y <- calcNormFactors(y)
v <- voom(y, design)
corfit <- duplicateCorrelation(v, design, block = anno$block)
fit <- lmFit(v, design, block = anno$block, correlation =
corfit$consensus)
cont_matrix <- makeContrasts(AvB_1 = A.treat1 - B.treat1,
                             AvB_2 = A.treat2 - B.treat2, levels =
design)
fit2 <- contrasts.fit(fit, cont_matrix)
fit2 <- eBayes(fit2)
result_single <- topTable(fit2, number = nrow(counts), sort.by =
"none")

# Double voom
y <- DGEList(counts)
y <- calcNormFactors(y)
v <- voom(y, design)
corfit <- duplicateCorrelation(v, design, block = anno$block)
v <- voom(y, design, block = anno$block, correlation =
corfit$consensus)
fit <- lmFit(v, design, block = anno$block, correlation =
corfit$consensus)
cont_matrix <- makeContrasts(AvB_1 = A.treat1 - B.treat1,
                             AvB_2 = A.treat2 - B.treat2, levels =
design)
fit2 <- contrasts.fit(fit, cont_matrix)
fit2 <- eBayes(fit2)
result_double <- topTable(fit2, number = nrow(counts), sort.by =
"none")

# No blocking
y <- DGEList(counts)
y <- calcNormFactors(y)
v <- voom(y, design)
fit <- lmFit(v, design)
cont_matrix <- makeContrasts(AvB_1 = A.treat1 - B.treat1,
                             AvB_2 = A.treat2 - B.treat2, levels =
design)
fit2 <- contrasts.fit(fit, cont_matrix)
fit2 <- eBayes(fit2)
result_no_block <- topTable(fit2, number = nrow(counts), sort.by =
"none")

# Comparison - all correlations > 0.99
cor(result_single$adj.P.Val, result_double$adj.P.Val)
cor(result_single$adj.P.Val, result_no_block$adj.P.Val)
cor(result_double$adj.P.Val, result_no_block$adj.P.Val)

limma voom • 15k views

ADD COMMENT • link updated 9.2 years ago by Gordon Smyth 51k • written 10.2 years ago by John Blischak ▴ 190

score 6 · Answer 1 · 2015-05-13

Charity and I were both a bit conservative with our advice in the past. We now recommend that you call voom and duplicateCorrelation twice each.

The same advice applies if you use sample quality weights, and this is now encapsulated into the voomWithQualityWeights function, see

http://www.ncbi.nlm.nih.gov/pubmed/25925576

You could certainly do more iterations, but the advantage seems to be small.

Gordon Smyth · Answer 2 · 2014-05-21

2

Entering edit mode

Bernd Klaus ▴ 610

@bernd-klaus-6281

Last seen 5.8 years ago

Germany

Hi John,

maybe I am wrong about this, but I would go for two step approach as Charity recommends.

The reasoning for this is the following: What the voom function does is essentially to provide weights for the regression fit. Now, in a situation where you want to estimate a correlation you first obtain "working weights"
that are estimated WITHOUT the correlation. You use them to estimate the correlation and then use the voom function again, this time taking the estimated correlation into account in the estimation of the weights.

Now it can well be that it does not really matter for the computed weights whether the correlation is taken into account or not (as your example seem to indicate) so a second run of voom might not be necessary but I would always do it since it can never 'hurt'.

The worst case is that you estimate essentially the same weights twice. (the voom function returns them, so you can check!

Best wishes,

Bernd

ADD COMMENT • link updated 9.2 years ago by Gordon Smyth 51k • written 10.2 years ago by Bernd Klaus ▴ 610

0

Entering edit mode

I wonder if would it be ideal, if you didn't mind the computational burden, to repeatedly iterate voom and duplicateCorrelation to convergence? And would it even have desirable convergence properties (guaranteed convergence, convexity, etc.)?

ADD REPLY • link 10.2 years ago • updated 8.1 years ago Ryan C. Thompson ★ 7.9k

0

Entering edit mode

Thanks for the feedback Bernd and Ryan.

> Now it can well be that it does not really matter for the computed weights
> whether the correlation is
> taken into account or not (as your example seem to indicate) so a second
> run of voom might not be necessary but I would always do it since it can never 'hurt'.

That makes sense. Since it can only potentially improve the analysis and never hurt it, I will run voom a second time (and recommend that others do the same).

> The worst case is that you estimate essentially the same weights twice.
> (the voom function returns them, so you can check!)

Good idea. That is indeed what is happening. In my data, the weights have a median of ~4. When I compare the weights output by voom before and after incorporating the correlation, the median difference is only ~0.1.

> I wonder if would it be ideal, if you didn't mind the computational burden,
> to repeatedly iterate voom and duplicateCorrelation to convergence?
> And would it even have desirable convergence properties (guaranteed
> convergence, convexity, etc.)?

I can't answer your bigger questions, but I did try this on my data. After running the duplicateCorrelation-voom iteration 4 times, the median difference in the weights was already in the range of 10^-7. So for my data at least (corfit$consensus = 0.3433403) the successive iterations quickly stop making a difference.

John

ADD REPLY • link updated 9.2 years ago by Gordon Smyth 51k • written 10.2 years ago by John Blischak ▴ 190

0

Entering edit mode

I don't see an advantage in doing that. The weights and the the block correlation only need to be in right ball park. There is no advantage in computing them to 16 significant figures.

ADD REPLY • link 4.8 years ago Gordon Smyth 51k