I am using limma's lmFit to find differentially methylated probes between cases and control subjects. And to correct for covariates, for example, age of the subjects.
But my question is more general, related to the lmFit object. All the design examples I've seen contain the phenotype of interest in the design, along with a number of covariates to correct for. But how can the biological signal be conserved for further steps (such as eBayes and topTable) if the phenotype of interest is corrected away?
An example of meta-data for the design would be:
sampleName, caseOrControl, age
sample1, "Case", 39
sample2, "Control", 56
Where caseOrControl is my phenotype of interest (diseased or healthy) and age a covariate I want to correct for.
Example of my data:
probe, sample1, sample2
probeA, 0.97, 1.24
probeB, 1.45, 2.20
...
An example command to find if one of my probes is differentially methylated between cases and controls could be:
fit1 <- lmFit(object=data, design=~ caseOrControl + age) ## design contains case/control and age
bayesFit <- eBayes(fit1)
fitResults <- topTable(bayesFit, coef=caseOrControl, number=Inf, adjust.method="BH", p.value=1)
From my understanding of lmFit, the three lines above provide me with the probability of each probe being differentially methylated between cases and controls, adjusting for age.
For my work, however, I need to have (for plotting and further processing) the matrix of values just corrected for age.
What I am doing now to obtain it, is doing another fit excluding the phenotype (case or control) from the design, and saving the residuals matrix.
fit2 <- lmFit(object=data, design=~ age) ## design contains only the covariate age
correctedValues2 <- residuals.MArrayLM(object=fit2, y=data) ## values corrected for age
Is there a way to obtain the matrix above from the first fit1 object described above? ( fit1 <- lmFit(object=data, design=~ caseOrControl + age) )
If I am not misunderstood, the lines:
fit1 <- lmFit(object=data, design=~ caseOrControl + age)
correctedValues1 <- residuals.MArrayLM(object=fit1, y=data)
would give me the values corrected for age *and* case/control, which in my mind, somehow erases the biological signal (corrects away the difference between cases and controls).
On the other hand, if the correction above (with a design including case/control status) erases or muddles the biological signal, then how could eBayes and topTable then extract that signal?...
I've been unable to find a good, clear example on the topic, so I hope that refining this rough question, and maybe receive pointers to a good example, may help other beginner users in the future.
Hi Gordon, what if a covariate (which I think including confounders) is repeated measure?