Question: using duplicateCorrelation with limma+voom for RNA-seq data
John Blischak wrote:

Hi,

I would like to use the duplicateCorrelation function along with limma and voom in order to analyze some RNA-seq data with blocking and a group-means parametrization (similar to example 9.7 in the limma manual). The manual does not contain a similar example using the voom function for RNA-seq data. I searched the Bioconductor mailing list and BioStars for "voom duplicateCorrelation" and found conflicting advice.

The first approach performs voom only once, prior to running duplicateCorrelation, e.g. this Bioc post

https://stat.ethz.ch/pipermail/bioconductor/2013-July/054087.html

and this Biostars post

https://www.biostars.org/p/54565

The second approach performs voom twice, both before and after running duplicateCorrelation, e.g. this Bioc post

https://stat.ethz.ch/pipermail/bioconductor/attachments/20130530/4dcc9475/attachment.pl

and this Biostars post

https://www.biostars.org/p/96889

Is there a consensus on which is the better method? Does the choice depend on the study design?

In both my actual data and in the simulated example below, I have found the results to be very similar. Thus I am currently leaning towards using the single voom method since it achieves similar results with fewer steps. The simulated example is not very interesting because blocking does not have much effect in the first place (corfit$consensus = -0.002984798), as expected from random data. However, in my data set it does make a difference (corfit$consensus = 0.3433403) and the results of using single or double voom are still similar.

Thanks,

John

library("limma")
library("edgeR")

# Simulate data
set.seed(123)
counts <- matrix(rpois(9000, lambda = 500), ncol = 90)  # 100 genes x 90 samples
rownames(counts) <- paste0("g_", 1:nrow(counts))
anno <- data.frame(block = rep(paste0("block", 1:6), each = 15),
                   fac1 = rep(c("A", "B", "C"), 30),
                   fac2 = rep(paste0("treat", 1:5), 18))
combined_fac <- factor(paste(anno$fac1, anno$fac2, sep = "."))
design <- model.matrix(~0 + combined_fac)
colnames(design) <- levels(combined_fac)

# Single voom
y <- DGEList(counts)
y <- calcNormFactors(y)
v <- voom(y, design)
corfit <- duplicateCorrelation(v, design, block = anno$block)
fit <- lmFit(v, design, block = anno$block, correlation = corfit$consensus)
cont_matrix <- makeContrasts(AvB_1 = A.treat1 - B.treat1,
                             AvB_2 = A.treat2 - B.treat2,
                             levels = design)
fit2 <- contrasts.fit(fit, cont_matrix)
fit2 <- eBayes(fit2)
result_single <- topTable(fit2, number = nrow(counts), sort.by = "none")

# Double voom
y <- DGEList(counts)
y <- calcNormFactors(y)
v <- voom(y, design)
corfit <- duplicateCorrelation(v, design, block = anno$block)
v <- voom(y, design, block = anno$block, correlation = corfit$consensus)
fit <- lmFit(v, design, block = anno$block, correlation = corfit$consensus)
cont_matrix <- makeContrasts(AvB_1 = A.treat1 - B.treat1,
                             AvB_2 = A.treat2 - B.treat2,
                             levels = design)
fit2 <- contrasts.fit(fit, cont_matrix)
fit2 <- eBayes(fit2)
result_double <- topTable(fit2, number = nrow(counts), sort.by = "none")

# No blocking
y <- DGEList(counts)
y <- calcNormFactors(y)
v <- voom(y, design)
fit <- lmFit(v, design)
cont_matrix <- makeContrasts(AvB_1 = A.treat1 - B.treat1,
                             AvB_2 = A.treat2 - B.treat2,
                             levels = design)
fit2 <- contrasts.fit(fit, cont_matrix)
fit2 <- eBayes(fit2)
result_no_block <- topTable(fit2, number = nrow(counts), sort.by = "none")

# Comparison - all correlations > 0.99
cor(result_single$adj.P.Val, result_double$adj.P.Val)
cor(result_single$adj.P.Val, result_no_block$adj.P.Val)
cor(result_double$adj.P.Val, result_no_block$adj.P.Val)
Gordon Smyth (Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia) wrote:

Charity and I were both a bit conservative with our advice in the past. We now recommend that you call voom and duplicateCorrelation twice each.

The same advice applies if you use sample quality weights; this is now encapsulated in the voomWithQualityWeights function, see

   http://www.ncbi.nlm.nih.gov/pubmed/25925576

You could certainly do more iterations, but the advantage seems to be small.
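To make this concrete, here is a minimal sketch of the two-pass recipe, reusing the object names from the simulated example in the question (counts, design, and anno$block are assumed to be defined as above):

# Two passes of voom and duplicateCorrelation, then the final fit
y <- DGEList(counts)
y <- calcNormFactors(y)
v <- voom(y, design)  # pass 1: weights estimated without any correlation
corfit <- duplicateCorrelation(v, design, block = anno$block)
v <- voom(y, design, block = anno$block, correlation = corfit$consensus)  # pass 2
corfit <- duplicateCorrelation(v, design, block = anno$block)
fit <- lmFit(v, design, block = anno$block, correlation = corfit$consensus)
fit <- eBayes(fit)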

Bernd Klaus (Germany) wrote:

Hi John,

Maybe I am wrong about this, but I would go for the two-step approach, as Charity recommends.

The reasoning is the following: what the voom function essentially does is provide weights for the regression fit. Now, in a situation where you want to estimate a correlation, you first obtain "working weights" that are estimated WITHOUT the correlation. You use them to estimate the correlation, and then run the voom function again, this time taking the estimated correlation into account when estimating the weights.

Now it may well be that it does not really matter for the computed weights whether the correlation is taken into account or not (as your example seems to indicate), so a second run of voom might not be necessary, but I would always do it since it can never 'hurt'.

The worst case is that you estimate essentially the same weights twice. (The voom function returns them, so you can check!)

Best wishes,

Bernd

Ryan C. Thompson wrote:

I wonder whether it would be ideal, if you didn't mind the computational burden, to iterate voom and duplicateCorrelation repeatedly until convergence. And would it even have desirable convergence properties (guaranteed convergence, convexity, etc.)?

John Blischak wrote:

Thanks for the feedback, Bernd and Ryan.

> Now it may well be that it does not really matter for the computed weights
> whether the correlation is taken into account or not (as your example seems
> to indicate), so a second run of voom might not be necessary, but I would
> always do it since it can never 'hurt'.

That makes sense. Since it can only potentially improve the analysis and never hurt it, I will run voom a second time (and recommend that others do the same).

> The worst case is that you estimate essentially the same weights twice.
> (The voom function returns them, so you can check!)

Good idea. That is indeed what is happening. In my data, the weights have a median of ~4. When I compare the weights output by voom before and after incorporating the correlation, the median difference is only ~0.1.
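For anyone who wants to reproduce that check on the simulated data above, here is a minimal sketch (v_single and v_double are just illustrative names for the two EList objects returned by voom):

# Compare the weights from voom runs with and without the correlation
v_single <- voom(y, design)
corfit <- duplicateCorrelation(v_single, design, block = anno$block)
v_double <- voom(y, design, block = anno$block, correlation = corfit$consensus)
median(abs(v_single$weights - v_double$weights))  # small value => weights barely change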

> I wonder whether it would be ideal, if you didn't mind the computational
> burden, to iterate voom and duplicateCorrelation repeatedly until
> convergence. And would it even have desirable convergence properties
> (guaranteed convergence, convexity, etc.)?

I can't answer your bigger questions, but I did try this on my data. After running the duplicateCorrelation-voom iteration 4 times, the median difference in the weights was already in the range of 10^-7. So, for my data at least (corfit$consensus = 0.3433403), successive iterations quickly stop making a difference.
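For reference, here is a sketch of that kind of iteration on the simulated data above (a hand-rolled loop, not a limma function; the weight-change printout is just illustrative):

# Alternate voom and duplicateCorrelation, tracking how much the weights move
v <- voom(y, design)
corfit <- duplicateCorrelation(v, design, block = anno$block)
for (i in 1:4) {
  w_old <- v$weights
  v <- voom(y, design, block = anno$block, correlation = corfit$consensus)
  corfit <- duplicateCorrelation(v, design, block = anno$block)
  cat("iteration", i, "median weight change:", median(abs(v$weights - w_old)), "\n")
}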

John
