limma: comparison of repeat measurements - blocking or within-subject correlation?
Jane (@jkhudyakov-23010), United States

I am using limma to identify proteins that are differentially expressed in a tissue collected from each of four subjects at two different stages. I want to account for the repeated sampling of the same individuals.

I tried 1) blocking by subject in the design matrix (p. 43-44 of the limma manual) and 2) estimating the within-subject correlation and including it in the model (p. 111 of the limma manual). The consensus duplicate correlation for my dataset was 0.364, which seemed fairly high. I found 102 differentially expressed proteins (DEPs) with approach 1 and 124 with approach 2, of which 101 were shared: approach 1 identified 1 DEP not found by approach 2, and approach 2 identified 23 DEPs not found by approach 1.

Since the results are so similar, which is the right approach? It seems that accounting for the within-subject correlation slightly increases statistical power. Any insights would be much appreciated.

For reference, here is the code I used.

Approach 1

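design <- model.matrix(~0+Subject+Stage, data=expset)
fit <- lmFit(expset, design)
# With ~0+Subject+Stage the stage effect is a single column (named "Stagelate"
# if "early" is the reference level of Stage); there are no "late"/"early"
# columns for makeContrasts(), so test that coefficient directly
fit2 <- eBayes(fit, trend=TRUE, robust=TRUE)
topTable(fit2, coef="Stagelate")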


Approach 2

design <- model.matrix(~0+Stage, data=expset)
# Rename "Stageearly"/"Stagelate" to "early"/"late" so makeContrasts() can use them
colnames(design) <- sub("^Stage", "", colnames(design))
corfit <- duplicateCorrelation(expset, design, block=expset$Subject)
fit <- lmFit(expset, design, block=expset$Subject, correlation=corfit$consensus)
cm <- makeContrasts(LatevEarly=late-early, levels=design)
fit2 <- contrasts.fit(fit, cm)
fit2 <- eBayes(fit2, trend=TRUE, robust=TRUE)
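One way to reproduce this kind of comparison is to run both approaches on the same matrix and intersect the significant sets. The sketch below uses simulated log2 abundances, not the poster's data, for a hypothetical layout of four subjects each measured at an early and a late stage; the object names, effect sizes and the 5% FDR cutoff are all assumptions.

```r
library(limma)

set.seed(1)
n_prot <- 300
# Hypothetical layout: 4 subjects, each sampled at an early and a late stage
subject <- factor(rep(paste0("S", 1:4), times = 2))
stage <- factor(rep(c("early", "late"), each = 4), levels = c("early", "late"))

# Simulated log2 abundances: a random offset per subject and protein induces
# within-subject correlation; the first 30 proteins shift by 2 at the late stage
subj_eff <- matrix(rnorm(n_prot * 4), n_prot, 4)
y <- 20 + subj_eff[, as.integer(subject)] + matrix(rnorm(n_prot * 8, sd = 0.5), n_prot, 8)
y[1:30, stage == "late"] <- y[1:30, stage == "late"] + 2
rownames(y) <- paste0("prot", seq_len(n_prot))

# Approach 1: subject as a blocking factor in the design matrix
design1 <- model.matrix(~0 + subject + stage)
fit1 <- eBayes(lmFit(y, design1), trend = TRUE, robust = TRUE)
tt1 <- topTable(fit1, coef = "stagelate", number = Inf)

# Approach 2: stage-only design plus a consensus within-subject correlation
design2 <- model.matrix(~0 + stage)
colnames(design2) <- levels(stage)
corfit <- duplicateCorrelation(y, design2, block = subject)
fit2 <- lmFit(y, design2, block = subject, correlation = corfit$consensus)
cm <- makeContrasts(LatevEarly = late - early, levels = design2)
fit2 <- eBayes(contrasts.fit(fit2, cm), trend = TRUE, robust = TRUE)
tt2 <- topTable(fit2, coef = "LatevEarly", number = Inf)

# Compare the proteins called at an assumed 5% FDR
dep1 <- rownames(tt1)[tt1$adj.P.Val < 0.05]
dep2 <- rownames(tt2)[tt2$adj.P.Val < 0.05]
cat("consensus correlation:", round(corfit$consensus, 3), "\n")
cat("approach 1:", length(dep1), " approach 2:", length(dep2),
    " shared:", length(intersect(dep1, dep2)), "\n")
```

Because the simulation is random, the counts printed will differ from those reported above; the point is only the mechanics of fitting both models and intersecting their results.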

@gordon-smyth, WEHI, Melbourne, Australia

Without knowing the details of your experiment, it would appear that both approaches are valid for it. Putting the block effect in the design matrix is safer when there are large differences between the blocks. The duplicateCorrelation approach is better when the blocks are very unbalanced or are confounded with treatments. However, there is a large area of overlap, i.e., there are many experiments for which both approaches are valid and give similar results. In these circumstances, I tend to give preference to the design matrix approach because it is more conservative.
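To illustrate the confounded case, here is a sketch with a hypothetical layout, not the poster's, in which each subject is sampled at only one stage. Putting subject in the design matrix then makes it rank-deficient, whereas the stage-only design used with duplicateCorrelation remains estimable. The data are simulated noise, used only to show that the calls run.

```r
library(limma)

# Hypothetical layout: S1 and S2 sampled only early, S3 and S4 only late
subject <- factor(rep(paste0("S", 1:4), each = 2))
stage <- factor(rep(c("early", "late"), each = 4), levels = c("early", "late"))

# Blocking in the design: the stage column equals the sum of the S3 and S4
# columns, so the design matrix is not of full rank
design_block <- model.matrix(~0 + subject + stage)
is.fullrank(design_block)    # FALSE
nonEstimable(design_block)   # names the coefficient that cannot be estimated

# duplicateCorrelation route: subject enters only through the block argument
design_cor <- model.matrix(~0 + stage)
colnames(design_cor) <- levels(stage)
is.fullrank(design_cor)      # TRUE

set.seed(2)
y <- matrix(rnorm(200 * 8, mean = 20), 200, 8)   # simulated noise only
corfit <- duplicateCorrelation(y, design_cor, block = subject)
fit <- lmFit(y, design_cor, block = subject, correlation = corfit$consensus)
```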


Thank you, Gordon! I will go with the more conservative approach.

@mikhaelmanurung-17423, Netherlands

Using duplicateCorrelation would be the better choice than blocking. Are you analysing microarray or RNA-Seq data? If it is RNA-Seq, then you are missing a few steps before duplicateCorrelation. See this post for reference: https://support.bioconductor.org/p/59700/.

Note that duplicateCorrelation should be estimated twice, as in https://support.bioconductor.org/p/114663/.
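For RNA-seq counts, the two-round pattern typically looks like the sketch below: compute voom weights, estimate the correlation, rerun voom with `block` and `correlation`, then re-estimate the correlation before the final fit. The counts, sample layout and object names are hypothetical. For log2-transformed proteomics abundances there is no voom step whose weights depend on the correlation, so presumably a single duplicateCorrelation call, as in Approach 2, is sufficient.

```r
library(limma)

set.seed(3)
# Hypothetical RNA-seq counts: 1000 genes x 8 libraries
counts <- matrix(rnbinom(1000 * 8, mu = 100, size = 5), 1000, 8)
subject <- factor(rep(paste0("S", 1:4), times = 2))
stage <- factor(rep(c("early", "late"), each = 4), levels = c("early", "late"))
design <- model.matrix(~0 + stage)
colnames(design) <- levels(stage)

# Round 1: voom weights ignore the blocking; estimate the correlation from them
v <- voom(counts, design)
corfit <- duplicateCorrelation(v, design, block = subject)

# Round 2: recompute voom weights using the block and the first estimate,
# then re-estimate the correlation with the updated weights
v <- voom(counts, design, block = subject, correlation = corfit$consensus)
corfit <- duplicateCorrelation(v, design, block = subject)

# Final fit uses the round-2 weights and correlation
fit <- lmFit(v, design, block = subject, correlation = corfit$consensus)
cm <- makeContrasts(LatevEarly = late - early, levels = design)
fit2 <- eBayes(contrasts.fit(fit, cm))
topTable(fit2, coef = "LatevEarly")
```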


I am analyzing shotgun proteomics data and using log2-transformed protein abundance values.