I am using limma to identify proteins that are differentially expressed in a tissue collected from four subjects at two different stages. I wanted to account for repeated sampling from the same individual.
I tried 1) blocking by subject (p. 43-44 of the limma manual) and 2) estimating the within-subject correlation and including it in the model (p. 111 of the limma manual). The consensus duplicate correlation for my dataset was 0.364, which seemed fairly high. I found 102 differentially expressed proteins (DEPs) using approach 1 and 124 using approach 2, of which 101 were shared. That leaves 1 DEP unique to approach 1 and 23 unique to approach 2.
Since the results are so similar, which is the right approach? It seems that modeling the within-subject correlation slightly increases statistical power. Any insights would be much appreciated.
For reference, here is the code I used:
Approach 1
design <- model.matrix(~0+Subject+Stage, data=expset)
fit <- lmFit(expset, design)
cm <- makeContrasts(LatevEarly=late-early, levels=design)
fit2 <- contrasts.fit(fit, cm)
fit2 <- eBayes(fit2,trend=TRUE, robust=TRUE)
topTable(fit2, adjust="BH", p.value = 0.05)
Approach 2
design <- model.matrix(~0+Stage, data=expset)
corfit <- duplicateCorrelation(expset,design,block=expset$Subject)
fit <- lmFit(expset,design,block=expset$Subject,correlation=corfit$consensus)
cm <- makeContrasts(LatevEarly=late-early, levels=design)
fit2 <- contrasts.fit(fit, cm)
fit2 <- eBayes(fit2,trend=TRUE, robust=TRUE)
topTable(fit2, adjust="BH", p.value = 0.05)
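To see which proteins drive the difference between the two approaches, the two significant lists can be compared with base R set operations. This is only a minimal sketch: the protein IDs are made up so that the counts match the numbers quoted above, and the variable names (deps_blocking, deps_dupcor) are hypothetical.

```r
# Hypothetical protein IDs standing in for the row names of each result table.
shared        <- paste0("P", 1:101)                 # significant under both approaches
deps_blocking <- c(shared, "P102")                  # approach 1: 102 DEPs
deps_dupcor   <- c(shared, paste0("P", 103:125))    # approach 2: 124 DEPs

# Split the two lists into shared and approach-specific proteins.
both          <- intersect(deps_blocking, deps_dupcor)
only_blocking <- setdiff(deps_blocking, deps_dupcor)
only_dupcor   <- setdiff(deps_dupcor, deps_blocking)

overlap <- c(both          = length(both),
             only_blocking = length(only_blocking),
             only_dupcor   = length(only_dupcor))
print(overlap)
```

With real fits, the ID vectors would presumably come from something like rownames(topTable(fit2, number = Inf, adjust = "BH", p.value = 0.05)) for each approach. Note that topTable returns only 10 rows by default, so number = Inf is needed to get the full list.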
Thank you, Gordon! I will go with the more conservative approach.