Question

[How to construct a model.matrix ]

0

Entering edit mode

kyj2226 • 0

@909e7f9c

Last seen 12 months ago

South Korea

Dear Bioconductor community,

I have a question about limma

First, let me explain about my data.

T0 (the starting point of medication) and T4 (the time point when DNA methylation data was obtained from blood samples) are separated by a span of two years.

We coded individuals who took lithium for two years as 'cases' and those who did not as 'controls,' assigning each group an Age_group value of 1 or 0, respectively.

[data] result (data.frame)

enter image description here

The goal of this analysis is to examine whether lithium treatment could reduce the Epigenetic Age predicted from DNA methylation data.

The covariates included in the model are Age, Sex, PC1, PC2, PC3, PC4, PC5, B, CD4T, CD8T, Mono, NK, and Neutro (cell type proportions).

Initially, I simply used


lm(Epigenetic_age ~ Age_group + Age + Sex + PC1 + PC2 + PC3 + PC4 + PC5 + B + CD4T + CD8T + Mono + NK + Neutro)

to observe the association.

However, to achieve a more accurate model, I implemented model.matrix directly based on the guidelines from the following source: https://bioconductor.org/packages/release/workflows/vignettes/RNAseq123/inst/doc/designmatrices.html#treating-factors-that-are-not-of-direct-interest-as-random-effects.

I intended to include individual as a random effect ,therefore, I grouped the samples based on Group) Age_group_Time

enter image description here

and created the model.matrix accordingly


design <- model.matrix( ~ 0 + Group + AGE + SEX + BMI + ALCOHOL_USE + SMOKER + BD_Type + batch + Acute_Stable + Center + PC1 + PC2 + PC3 + PC4 + PC5 + B + NK + CD4T + CD8T + Mono + Neutro , data = result)

And then, I ran the lmfit and set up a contrast matrix for comparison.

fit<- lmFit(object = result$Epigenetic_Age, design, block = Subject, correlation = cor$consensus.correlation)

contrasts_Eigenetic_Age <- makeContrasts(
  R_T4vsT0=Group1_T4-Group1_T0,
  NR_T4vsT0=Group0_T4-Group0_T0,
  RvsNR=(Group1_T4-Group1_T0)-(Group0_T4-Group0_T0),
  levels = colnames(design)
)


fit_PCGrimAge <- contrasts.fit(fit, contrasts_Epigenetic_Age)
fit_PCGrimAge$coefficients

Currently, I can only obtain coefficients as output so that I tried to further analysis with eBayes() to get P-value but I thought it is only for DE genes. (In my case I only have one (Epigenetic_Age))

Am I right? I'm not sure if it is right*

Any advice would be greatly appreciated.

Thank you.

Best

Yujin

limma modelmatrix • 1.4k views

ADD COMMENT • link updated 13 months ago by Gordon Smyth 53k • written 13 months ago by kyj2226 • 0

score 1 · Answer 1 · 2024-11-13

This isn't the right place for your question, as it's more about how to do the analysis rather than how to use the software (the focus of this site). In general, you should ask questions like this over on biostars.org.

That said, I think your question is mis-specified. You can use methylation array data along with any of a number of methylation clocks to estimate the epigenetic age of each subject. Once you have done so, the methylation data has served its purpose (you computed the epigenetic age), and now you can fit a model comparing the epigenetic age of the treated and untreated (including age, sex, bmi, etc as additional coefficients) to see if the epigenetic age is different in treated vs control. But at that point it's just one model, and you wouldn't use limma for that, but instead you would use lm directly.

score 1 · Answer 2 · 2024-11-14

Initially, I simply used lm(Epigenetic_age ~ Age_group + Age + Sex + PC1 + PC2 + PC3 + PC4 + PC5 + B + CD4T + CD8T + Mono + NK + Neutro)

That's fine but it ignores correlations between repeat measurements on the same subject. You need to use the lme4 package to estimate the random effect.

to achieve a more accurate model, I implemented model.matrix directly

No, that makes no difference. Linear models work exactly the same whether you use model.matrix or specify a formula to lm.

based on the guidelines from RNAseq123/../designmatrices.html#treating-factors-that-are-not-of-direct-interest-as-random-effects

That would work if you were analysing gene expression data, but you cannot estimate random efffects in limma if you only have univariate data. limma will simply set the intra-subject correlation to zero, as you must already know if you ran the code.

I intended to include individual as a random effect ,therefore, I grouped the samples based on Group

Forming a group factor has nothing to do with random effects. limma can use exactly the same design matrix as for lm.

Currently, I can only obtain coefficients as output

No, again that is wrong. limma will give p-values even for just one row of data. But there is no advantage in using limma for univariate data. You should be using lme4.