Hi,
I would like to use the duplicateCorrelation function along with limma and voom in order to analyze some RNA-seq data with blocking and group-means parametrization (similar to example 9.7 in the limma manual). The manual does not contain a similar example using the voom function for RNA-seq data. I searched the Bioconductor mailing list and BioStars and found conflicting advice.
The first approach performs voom only once, prior to running duplicateCorrelation, e.g. this Bioc post
https://stat.ethz.ch/pipermail/bioconductor/2013-July/054087.html
and this Biostars post
https://www.biostars.org/p/54565
The second approach performs voom twice, both before and after running duplicateCorrelation, e.g. this Bioc post
https://stat.ethz.ch/pipermail/bioconductor/attachments/20130530/4dcc9475/attachment.pl
and this Biostars post
https://www.biostars.org/p/96889
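In outline, the two approaches differ only in whether voom is re-run with the estimated consensus correlation before lmFit. Full code for a simulated example is at the end of this message; y, design, and anno$block are defined there:

# Approach 1: single voom
v <- voom(y, design)
corfit <- duplicateCorrelation(v, design, block = anno$block)
fit <- lmFit(v, design, block = anno$block, correlation = corfit$consensus)

# Approach 2: double voom
v <- voom(y, design)
corfit <- duplicateCorrelation(v, design, block = anno$block)
v <- voom(y, design, block = anno$block, correlation = corfit$consensus)
fit <- lmFit(v, design, block = anno$block, correlation = corfit$consensus)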
Is there a consensus on which is the better method? Does the choice depend on the study design?
In both my actual data and in the simulated example below, I have found the results of the two approaches to be very similar. Thus I am currently leaning towards the single-voom method, since it achieves similar results with fewer steps. The simulated example is not very interesting because blocking has little effect in the first place (corfit$consensus = -0.002984798), as expected for random data. However, in my real data set the blocking does make a difference (corfit$consensus = 0.3433403), and the results of single versus double voom are still very similar.
Thanks,
John
library("limma") library("edgeR") # Simulate data set.seed(123) counts <- matrix(rpois(9000, lambda = 500), ncol = 90) rownames(counts) <- paste0("g_", 1:nrow(counts)) anno <- data.frame(block = rep(paste0("block", 1:6), each = 15), fac1 = rep(c("A", "B", "C"), 30), fac2 = rep(paste0("treat", 1:5), 18)) combined_fac <- factor(paste(anno$fac1, anno$fac2, sep = ".")) design <- model.matrix(~0 + combined_fac) colnames(design) <- levels(combined_fac) # Single voom y <- DGEList(counts) y <- calcNormFactors(y) v <- voom(y, design) corfit <- duplicateCorrelation(v, design, block = anno$block) fit <- lmFit(v, design, block = anno$block, correlation = corfit$consensus) cont_matrix <- makeContrasts(AvB_1 = A.treat1 - B.treat1, AvB_2 = A.treat2 - B.treat2, levels = design) fit2 <- contrasts.fit(fit, cont_matrix) fit2 <- eBayes(fit2) result_single <- topTable(fit2, number = nrow(counts), sort.by = "none") # Double voom y <- DGEList(counts) y <- calcNormFactors(y) v <- voom(y, design) corfit <- duplicateCorrelation(v, design, block = anno$block) v <- voom(y, design, block = anno$block, correlation = corfit$consensus) fit <- lmFit(v, design, block = anno$block, correlation = corfit$consensus) cont_matrix <- makeContrasts(AvB_1 = A.treat1 - B.treat1, AvB_2 = A.treat2 - B.treat2, levels = design) fit2 <- contrasts.fit(fit, cont_matrix) fit2 <- eBayes(fit2) result_double <- topTable(fit2, number = nrow(counts), sort.by = "none") # No blocking y <- DGEList(counts) y <- calcNormFactors(y) v <- voom(y, design) fit <- lmFit(v, design) cont_matrix <- makeContrasts(AvB_1 = A.treat1 - B.treat1, AvB_2 = A.treat2 - B.treat2, levels = design) fit2 <- contrasts.fit(fit, cont_matrix) fit2 <- eBayes(fit2) result_no_block <- topTable(fit2, number = nrow(counts), sort.by = "none") # Comparison - all correlations > 0.99 cor(result_single$adj.P.Val, result_double$adj.P.Val) cor(result_single$adj.P.Val, result_no_block$adj.P.Val) cor(result_double$adj.P.Val, result_no_block$adj.P.Val)
I wonder: if you didn't mind the computational burden, would it be ideal to repeatedly iterate voom and duplicateCorrelation to convergence? And would it even have desirable convergence properties (guaranteed convergence, convexity, etc.)?
Thanks for the feedback Bernd and Ryan.
> Now it can well be that it does not really matter for the computed weights
> whether the correlation is taken into account or not (as your example seems
> to indicate), so a second run of voom might not be necessary, but I would
> always do it since it can never 'hurt'.
That makes sense. Since it can only potentially improve the analysis and never hurt it, I will run voom a second time (and recommend that others do the same).
> The worst case is that you estimate essentially the same weights twice.
> (the voom function returns them, so you can check!)
Good idea. That is indeed what is happening. In my data, the weights have a median of ~4. When I compare the weights output by voom before and after incorporating the correlation, the median difference is only ~0.1.
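For anyone who wants to repeat this check, here is a minimal sketch. v1 and v2 are just my names for the voom objects before and after incorporating the correlation (the code in my first message overwrites v instead):

v1 <- voom(y, design)
corfit <- duplicateCorrelation(v1, design, block = anno$block)
v2 <- voom(y, design, block = anno$block, correlation = corfit$consensus)

# Typical size of the weights, and how much they change between the two runs
median(v1$weights)
median(abs(v2$weights - v1$weights))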
> I wonder: if you didn't mind the computational burden, would it be ideal
> to repeatedly iterate voom and duplicateCorrelation to convergence?
> And would it even have desirable convergence properties (guaranteed
> convergence, convexity, etc.)?
I can't answer your bigger questions, but I did try this on my data. After running the duplicateCorrelation-voom iteration 4 times, the median difference in the weights was already in the range of 10^-7. So for my data at least (corfit$consensus = 0.3433403) the successive iterations quickly stop making a difference.
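Something along these lines reproduces that check (a rough sketch; the hard-coded 4 iterations and the quantities printed are just illustrative choices):

# Alternate voom and duplicateCorrelation, tracking how much the weights move
v <- voom(y, design)
corfit <- duplicateCorrelation(v, design, block = anno$block)
for (i in 1:4) {
  w_old <- v$weights
  v <- voom(y, design, block = anno$block, correlation = corfit$consensus)
  corfit <- duplicateCorrelation(v, design, block = anno$block)
  cat("iteration", i,
      ": consensus =", corfit$consensus,
      ", median |weight change| =", median(abs(v$weights - w_old)), "\n")
}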
John
I don't see an advantage in doing that. The weights and the block correlation only need to be in the right ballpark. There is no advantage in computing them to 16 significant figures.