Analyzing pre/post (longitudinal) data using limma/lme4 with adjustment for continuous covariates
2
0
Entering edit mode
DSP • 0
@dsp-9995
Last seen 7.1 years ago
CCMB

Hi all,

I’m analyzing Illumina 450K data of a study of following design

Subject    Group    Time-point Class
1    Control    Baseline 1
1    Control    Follow-up 2
2    Control    Baseline 1
2    Control    Follow-up 3
3    Intervention    Baseline 1
3    Intervention    Follow-up 2
4    Intervention    Baseline 1
4    Intervention    Follow-up 2

Individuals of a class 1 are recruited and randomly distributed to control and intervention groups.

At follow-up, individuals in both the groups now belong to two different classes: either 2 or 3

I have several variables like BCD batch, Chip and position, experiment batch, cell count estimates, age and BMI and I would like to adjust my analysis for these covariates.

There are two objectives:

  1. Identify the differentially methylated CpGs from baseline to follow-up in individuals belonging to different classes independently i.e class 1 vs 2, class 1 vs 3 and class 2 vs 3
  2. Perform an association analysis of beta values with other phenotypic information like fasting glucose or insulin

I’m attempted to use the following scripts for my analysis

## Regression analysis using lmfit limma for dmCpGs

### Unadjusted analysis
#### Pre-post:1 vs 2 analysis
targets_t2d <- read.csv("B:/IDPP3_450K_PILOT/targets.csv",check.names = FALSE)
SibShip <- factor(targets$SibShip)
Treat <- factor(targets$Treatment, levels=c("1","2"))
design <- model.matrix(~SibShip+Treat)
cleanbeta <- read.csv("B:/450K_PILOT/cleanbeta.csv",row.names=1,check.names=FALSE)
cleanbeta<-as.matrix(cleanbeta)
fit <- lmFit(cleanbeta, design)
fit <- eBayes(fit)
fit_fdr<-topTable(fit, coef="Treat2",adjust="fdr",number=416733)

## Regression analysis using lmfit limma for dmCpGs
### Adjusted for covariates
#### Pre-post:1 vs 2 analysis

targets_t2d <- read.csv("B:/450K_PILOT/targets.csv",check.names = FALSE)
SibShip <- factor(targets$SibShip)
Treat <- factor(targets$Treatment, levels=c("1","2"))
design <-model.matrix(~SibShip+Treat+targets$age+targets$BMI+targets$CD8T+targets$CD4T+targets$NK+targets$Bcell+targets$Mono+targets$Gran)
cleanbeta <- read.csv("B:/450K_PILOT/cleanbeta.csv",row.names=1,check.names=FALSE)
cleanbeta<-as.matrix(cleanbeta)
fitcelladjusted <- lmFit(cleanbeta, design)
fit_celladjusted <- eBayes(fitcelladjusted)
fit_cell_adjusted_fdr <-topTable(fit_celladjusted, coef="Treat2",adjust="fdr",number=416733)

I have the following queries:

1) Is it the right way to do the analysis for objective 1?

P.S: Since I have less sample size (~25 baseline-followup pairs per group), I choose to ignore the effect of group at this stage

2) At a later stage, how can I include the sample group information also in the model and go ahead with the analysis? I'm assuming that including time-point information is not necessary since it is reflected in the class

3) Can I use a mixed regression model like that of lmer function in lme4 package to include sample, group and time-point information as a random variables for objective 2?

4) Which model would be the right one for the above study design:

Beta,age and cell estimates as fixed effects

Subject,group as random effects, I'm not sure of timepoint if it a fixed/random effect, I'm supposing that it is a random effect variable

full.model1=lmer(Fastingglucose~beta+age+cellestimates+(1/subject)+(1/group)+(1/timepoint),REML=false)

full.model2=lmer(Fastingglucose~beta+age+cellestimates+(1/subject*timepoint)+(1/group),REML=false)

full.model3=lmer(Fastinglucose~beta+age+cellestimates+(1+beta/subject)+(1+beta/group)+(1+beta/timepoint), REML=false)

full.model4=lmer(Fastingglucose~beta+age+cellestimates+(1+beta/subject*timepoint)+(1+beta/group),REML=false)

null.model1=lmer(Fastingglucose~age+cellestimates+(1/subject)+(1/group), REML=false)

null.model2=lmer(Fastingglucose~age+cellestimates+(1/subject*timepoint)+(1/group),REML=false)

null.model3=lmer(Fastinglucose~age+cellestimates+(1+beta/subject)+(1+beta/group)+(1+beta/timepoint), REML=false)

null.model4=lmer(Fastingglucose~age+cellestimates+(1+beta/subject*timepoint)+(1+beta/group),REML=false)

Could you also please confirm if any of these above models would account for the paired nature of the sample?

I'm new to R-environment and your assistance would be of immense help.

Thanks for your attention.

Best regards,

Priyanka

 

 

illumina450k limma lme4 lmer • 3.5k views
ADD COMMENT
0
Entering edit mode

Your limma code doesn't match up to your experimental design. What is SibShip (Subject, maybe)? What is Treatment - the Class, or the Group?

ADD REPLY
0
Entering edit mode
DSP • 0
@dsp-9995
Last seen 7.1 years ago
CCMB

Hi Aaron, 

My apologies I could not give a snapshot of targets file.

You are right, Sibship, here is subject and Treatment is the class.

ADD COMMENT
0
Entering edit mode

Please add comments with the "add comment" button, not as a new answer. In any case, if Treatment represents the Class, then your code doesn't account for the fact that you have control/intervention groups. This seems like something that would be important to model, so why is it being left out?

ADD REPLY
0
Entering edit mode

Yes, but I have very few samples, around 25 pairs per group-which are sub-divided by their class. Since I'm primarily interested in class and not group, I've not included group. 

Can I directly include it in the model by:

model.matrix(~SibShip+Treat+targets$group+targets$age+targets$BMI+targets$CD8T+targets$CD4T+targets$NK+targets$Bcell+targets$Mono+targets$Gran)

or should that be considered as a random variable, if so, how can i include that in the model?

 

 

ADD REPLY
0
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 11 hours ago
The city by the bay

There's several odd things about your code:

  1. Most methylation analyses with limma work on the M-value rather than the beta value (see paper here). The M-value seems more normal-looking than the beta value, as the latter is bounded in [0, 1].
  2. Ignoring Group may actually result in less power, as the variance estimates would be inflated if there were systematic differences in the methylation status of samples in different groups.
  3. Blindly throwing in a whole bunch of covariates usually doesn't help. You should check, e.g., on a MDS plot whether differences in the covariate contribute to systematic differences between samples. Otherwise, you're just wasting residual d.f. for nothing. In fact, if the value of each covariate is the same for each patient before and after treatment (e.g., I would expect that age wouldn't change if the treatment is reasonably fast), then blocking on that covariate will have no effect at all as it'll be confounded with the SibShip factor.

Anyway, I would probably set up the analysis in a one-way layout:

groupings <- paste0(data$Class, data$Group)
design <- model.matrix(~ 0 + groupings)

... and then block on Subject in duplicateCorrelation. You can then directly compare between group/class combinations to identify differences between classes or between groups. You can also throw in your covariates of interest (e.g., glucose) in the design matrix above to test their effects. However, you should make sure that they're not confounded by the groupings - if, for example, all intervention samples have higher glucose levels, you won't be able to distinguish between a glucose effect and an intervention effect.

ADD COMMENT
0
Entering edit mode

Hi Aaron,

Apologies for delay in response.

1) Thanks for the correction. I have transformed my beta values to M-values for downstream analysis

2) The intervention here is a qualitative, life style intervention and the difference is captured in the outcome. Moreover, systematic differences in samples was observed according to outcome and not the group (MDS plot). So, I have decided to drop the group factor. Nevertheless, I will run the analysis with group as a factor too.

3) I have done a PCA and looked at which experimental and biological factors attribute to the variation and found the cell estimates account for variation explained by top 10 PCs. Hence I've included them in the model. Moreover, cell proportions are known to have an effect on methylation estimates, which necessitates adjustment for them.

Could you suggest a way to introduce cell counts for adjustment in the model matrix?

Thanks and best regards,

Priyanka

 

ADD REPLY

Login before adding your answer.

Traffic: 582 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6