I have a question regarding paired analysis in the Limma package. I am analyzing Illumina methylation arrays of paired tumor samples from two different sites. I came up with the following script to analyze the data, which I put together from old scripts I found in our lab. But there is something that bothers me, I do not understand why the design matrix has an intercept that is forced to 0. From a lecture on linear models I remembered that forcing the intercept to 0 is bad practice in many cases, but I'm not sure if I'm missing the point here? The results are substantially different between r~0+Location+Pairs
and r~Location+Pairs
.
# this is the factor of interest
Location <- factor(pd$Sample_Group, levels = c("M","P"))
# the individual effect, each pair is a subject
Pairs <- factor(pd$Pair)
# design matrix
design <- model.matrix(~0+Location+Pairs, data=pd)
colnames(design) <- c(levels(Location ),levels(Pairs)[-1])
# fit the linear model
fit <- lmFit(Beta_To_M(myCombatLimma), design)
# create a contrast matrix for specific comparisons
contMatrix <- makeContrasts(P-M, levels=design)
# fit the contrasts, estimate sample-variance, compute moderated t-statistics
fit2 <- contrasts.fit(fit, contMatrix)
fit2 <- eBayes(fit2)
# get the table of results for the first contrast (Primary - Met)
PairWise_DMPs <- topTable(fit2,
num=Inf,
coef=1,
genelist=ann450kSub,
adjust.method = "BH",
p.value = 0.05)
Gordon Smyth I'm sorry to reawaken this thread, but I am still confused about the parametrization of the design matrix for paired analysis in Limma. Is my code above correct for the analysis of the pairwise difference between primary (P) and metastasis (M)? One thing that bugs me, is that by adding the 0 in my design matrix (
~0+Location+Pairs
), I lose the column of my first pair. I'm still not totally convinced that I'm doing the right thing, especially since I found a plethora of slight variation of how to do the pairwise analysis that I got confused.Your design matrix and contrast is fine.
On the other hand, running Combat to remove a batch effect and then feeding the results into limma as if the data hasn't been corrected is both wrong and unnecessary. (If that is what you are doing -- I am reacting the data object called Combat.)
Thank you very much for the clarification! Yes, indeed, that was my intention. I am using ComBat as it is part of the ChAMP pipeline to correct for the batch effect of the bead chips, since our data was collected over a rather long period of time. What exactly is the problem with using Limma on ComBat-corrected beta values? As far as I know, ChAMP also uses Limma's linear model framework for the analysis of differentially methylated probes. I decided to use Limma directly since ChAMP does not allow pairwise design.
Batch correction is almost certainly unnecessary for a paired design because it is corrected for by the pairing.
There are many posts on this forum about the problems of naive use of Combat followed by limma, and also published papers:
Thanks a lot for your help and advice!!!