Question: Duplicate Correlation with technical replicates
0
2.9 years ago by
dsperley0
United States
dsperley0 wrote:

I'm using Limma to analyze Illumina 450k methylation data. I'm comparing the methylation (M-Values) in Obese vs Lean subjects, and I have a total of 49 arrays, representing 3 Lean subjects and 11 Obese subjects. Each subject is represented by 3-5 arrays,with the exception of one Lean subject that is represented only by one array. Is it appropriate to use the  duplicateCorrelation function when some biological replicates have no technical replication? I would like to keep that array in the analysis since there are so few Lean subjects in the study.

Design setup and code:

 head(sample_pheno,n=10)
Subject Condition
1        1     Obese
2        1     Obese
3        1     Obese
4        1     Obese
5        1     Obese
6        2      Lean
7        2      Lean
8        2      Lean
9        2      Lean
10       2      Lean

#Design setup
Condition<-factor(sample_pheno$Condition) design<-model.matrix(~0+Condition) colnames(design)<-levels(Condition) head(design) Lean Obese 1 0 1 2 0 1 3 0 1 4 0 1 5 0 1 6 1 0 #calculate correlation within subjects corfit<-duplicateCorrelation(M_Val,design,block=sample_pheno$Subject)
fit<-lmFit(M_Val,design,block=sample_pheno$Subject,correlation=corfit$consensus.correlation)

limma duplicatecorrelation • 851 views
modified 2.9 years ago by Ryan C. Thompson7.3k • written 2.9 years ago by dsperley0
Answer: Duplicate Correlation with technical replicates
1
2.9 years ago by
The Scripps Research Institute, La Jolla, CA
Ryan C. Thompson7.3k wrote:

As far as I know, it's fine to use duplicateCorrelation on such a data set. I've done it before. Just keep in mind that any samples with no technical duplicates will not contribute to the the estimate of inter-duplicate correlation, since there is nothing for them to correlate with.

Side note: for 450k M-values, I recommend having a look at the mean-variance trend. From my experience, it typically has a concave shape, with lower variance in the middle and higher variance for extreme values. Using eBayes with trend=TRUE would be recommended for such data.

1

Actually, the estimate of the consensus correlation is not completely unaffected by the presence of samples with no technical duplicates. This is because the former is calculated after fitting a mixed linear model (with statmod::mixedModel2Fit, if anyone's interested), in which the latter will still contribute to the estimated values of the coefficients. For example:

a <- matrix(rnorm(11000), ncol=11)
design <- model.matrix(~factor(rep(1:2,c(5,6))))
grouping <- factor(c(rep(1:5, 2), 6)) # Last sample has no friend.
dc <- duplicateCorrelation(a, design, block=grouping)
dc$consensus # Something near zero dc2 <- duplicateCorrelation(a[,-11], design[-11,], block=grouping[-11]) dc2$consensus # Something also near zero but not exactly the same.

That said, it's not really a problem, and I'll imagine they'll eventually converge due to improved estimation with an increasing number of levels of your blocking factor. On a related note, the other important thing with duplicateCorrelation is that it does better when you have samples across a large number of levels of the blocking factor. With 14 subjects, I think you'll be fine.