Question: Duplicate Correlation with technical replicates
0
gravatar for dsperley
2.9 years ago by
dsperley0
United States
dsperley0 wrote:

I'm using Limma to analyze Illumina 450k methylation data. I'm comparing the methylation (M-Values) in Obese vs Lean subjects, and I have a total of 49 arrays, representing 3 Lean subjects and 11 Obese subjects. Each subject is represented by 3-5 arrays,with the exception of one Lean subject that is represented only by one array. Is it appropriate to use the  duplicateCorrelation function when some biological replicates have no technical replication? I would like to keep that array in the analysis since there are so few Lean subjects in the study. 

Design setup and code:

 head(sample_pheno,n=10)
   Subject Condition
1        1     Obese
2        1     Obese
3        1     Obese
4        1     Obese
5        1     Obese
6        2      Lean
7        2      Lean
8        2      Lean
9        2      Lean
10       2      Lean


#Design setup
Condition<-factor(sample_pheno$Condition)
design<-model.matrix(~0+Condition)
colnames(design)<-levels(Condition)
head(design)

  Lean Obese
1    0     1
2    0     1
3    0     1
4    0     1
5    0     1
6    1     0

#calculate correlation within subjects
corfit<-duplicateCorrelation(M_Val,design,block=sample_pheno$Subject)
fit<-lmFit(M_Val,design,block=sample_pheno$Subject,correlation=corfit$consensus.correlation)

 

limma duplicatecorrelation • 851 views
ADD COMMENTlink modified 2.9 years ago by Ryan C. Thompson7.3k • written 2.9 years ago by dsperley0
Answer: Duplicate Correlation with technical replicates
1
gravatar for Ryan C. Thompson
2.9 years ago by
The Scripps Research Institute, La Jolla, CA
Ryan C. Thompson7.3k wrote:

As far as I know, it's fine to use duplicateCorrelation on such a data set. I've done it before. Just keep in mind that any samples with no technical duplicates will not contribute to the the estimate of inter-duplicate correlation, since there is nothing for them to correlate with.

Side note: for 450k M-values, I recommend having a look at the mean-variance trend. From my experience, it typically has a concave shape, with lower variance in the middle and higher variance for extreme values. Using eBayes with trend=TRUE would be recommended for such data.

ADD COMMENTlink modified 2.9 years ago • written 2.9 years ago by Ryan C. Thompson7.3k
1

Actually, the estimate of the consensus correlation is not completely unaffected by the presence of samples with no technical duplicates. This is because the former is calculated after fitting a mixed linear model (with statmod::mixedModel2Fit, if anyone's interested), in which the latter will still contribute to the estimated values of the coefficients. For example:

a <- matrix(rnorm(11000), ncol=11)
design <- model.matrix(~factor(rep(1:2,c(5,6))))
grouping <- factor(c(rep(1:5, 2), 6)) # Last sample has no friend.
dc <- duplicateCorrelation(a, design, block=grouping)
dc$consensus # Something near zero
dc2 <- duplicateCorrelation(a[,-11], design[-11,], block=grouping[-11])
dc2$consensus # Something also near zero but not exactly the same.

That said, it's not really a problem, and I'll imagine they'll eventually converge due to improved estimation with an increasing number of levels of your blocking factor. On a related note, the other important thing with duplicateCorrelation is that it does better when you have samples across a large number of levels of the blocking factor. With 14 subjects, I think you'll be fine.

ADD REPLYlink modified 2.9 years ago • written 2.9 years ago by Aaron Lun24k

Thanks for clarifying. As you say, adding samples will always affect the model fit, and I didn't mean to imply that it wouldn't. What I meant (but didn't say clearly) is that the estimate of inter-duplicate correlation will not be substantially aided or hindered by the inclusion of non-duplicated subjects. They don't add much information, nor do they take any information away.

ADD REPLYlink written 2.9 years ago by Ryan C. Thompson7.3k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 132 users visited in the last hour