Question

Analyzing pre/post (longitudinal) data using limma/lme4 with adjustment for continuous covariates

0

Entering edit mode

DSP • 0

@dsp-9995

Last seen 6.4 years ago

CCMB

Hi all,

I’m analyzing Illumina 450K data of a study of following design

Subject   Group   Time-point Class
1   Control   Baseline 1
1   Control   Follow-up 2
2   Control   Baseline 1
2   Control   Follow-up 3
3   Intervention   Baseline 1
3   Intervention   Follow-up 2
4   Intervention   Baseline 1
4   Intervention   Follow-up 2

Individuals of a class 1 are recruited and randomly distributed to control and intervention groups.

At follow-up, individuals in both the groups now belong to two different classes: either 2 or 3

I have several variables like BCD batch, Chip and position, experiment batch, cell count estimates, age and BMI and I would like to adjust my analysis for these covariates.

There are two objectives:

Identify the differentially methylated CpGs from baseline to follow-up in individuals belonging to different classes independently i.e class 1 vs 2, class 1 vs 3 and class 2 vs 3
Perform an association analysis of beta values with other phenotypic information like fasting glucose or insulin

I’m attempted to use the following scripts for my analysis

## Regression analysis using lmfit limma for dmCpGs

### Unadjusted analysis
#### Pre-post:1 vs 2 analysis
targets_t2d <- read.csv("B:/IDPP3_450K_PILOT/targets.csv",check.names = FALSE)
SibShip <- factor(targets$SibShip)
Treat <- factor(targets$Treatment, levels=c("1","2"))
design <- model.matrix(~SibShip+Treat)
cleanbeta <- read.csv("B:/450K_PILOT/cleanbeta.csv",row.names=1,check.names=FALSE)
cleanbeta<-as.matrix(cleanbeta)
fit <- lmFit(cleanbeta, design)
fit <- eBayes(fit)
fit_fdr<-topTable(fit, coef="Treat2",adjust="fdr",number=416733)

## Regression analysis using lmfit limma for dmCpGs
### Adjusted for covariates
#### Pre-post:1 vs 2 analysis

targets_t2d <- read.csv("B:/450K_PILOT/targets.csv",check.names = FALSE)
SibShip <- factor(targets$SibShip)
Treat <- factor(targets$Treatment, levels=c("1","2"))
design <-model.matrix(~SibShip+Treat+targets$age+targets$BMI+targets$CD8T+targets$CD4T+targets$NK+targets$Bcell+targets$Mono+targets$Gran)
cleanbeta <- read.csv("B:/450K_PILOT/cleanbeta.csv",row.names=1,check.names=FALSE)
cleanbeta<-as.matrix(cleanbeta)
fitcelladjusted <- lmFit(cleanbeta, design)
fit_celladjusted <- eBayes(fitcelladjusted)
fit_cell_adjusted_fdr <-topTable(fit_celladjusted, coef="Treat2",adjust="fdr",number=416733)

I have the following queries:

1) Is it the right way to do the analysis for objective 1?

P.S: Since I have less sample size (~25 baseline-followup pairs per group), I choose to ignore the effect of group at this stage

2) At a later stage, how can I include the sample group information also in the model and go ahead with the analysis? I'm assuming that including time-point information is not necessary since it is reflected in the class

3) Can I use a mixed regression model like that of lmer function in lme4 package to include sample, group and time-point information as a random variables for objective 2?

4) Which model would be the right one for the above study design:

Beta,age and cell estimates as fixed effects

Subject,group as random effects, I'm not sure of timepoint if it a fixed/random effect, I'm supposing that it is a random effect variable

full.model1=lmer(Fastingglucose~beta+age+cellestimates+(1/subject)+(1/group)+(1/timepoint),REML=false)

full.model2=lmer(Fastingglucose~beta+age+cellestimates+(1/subject*timepoint)+(1/group),REML=false)

full.model3=lmer(Fastinglucose~beta+age+cellestimates+(1+beta/subject)+(1+beta/group)+(1+beta/timepoint), REML=false)

full.model4=lmer(Fastingglucose~beta+age+cellestimates+(1+beta/subject*timepoint)+(1+beta/group),REML=false)

null.model1=lmer(Fastingglucose~age+cellestimates+(1/subject)+(1/group), REML=false)

null.model2=lmer(Fastingglucose~age+cellestimates+(1/subject*timepoint)+(1/group),REML=false)

null.model3=lmer(Fastinglucose~age+cellestimates+(1+beta/subject)+(1+beta/group)+(1+beta/timepoint), REML=false)

null.model4=lmer(Fastingglucose~age+cellestimates+(1+beta/subject*timepoint)+(1+beta/group),REML=false)

Could you also please confirm if any of these above models would account for the paired nature of the sample?

I'm new to R-environment and your assistance would be of immense help.

Thanks for your attention.

Best regards,

Priyanka

illumina450k limma lme4 lmer • 3.2k views

ADD COMMENT • link updated 8.0 years ago by Aaron Lun ★ 28k • written 8.0 years ago by DSP • 0

0

Entering edit mode

Your limma code doesn't match up to your experimental design. What is SibShip (Subject, maybe)? What is Treatment - the Class, or the Group?

ADD REPLY • link 8.0 years ago Aaron Lun ★ 28k

score 0 · Answer 1 · 2016-04-26

0

Entering edit mode

DSP • 0

@dsp-9995

Last seen 6.4 years ago

CCMB

Hi Aaron,

My apologies I could not give a snapshot of targets file.

You are right, Sibship, here is subject and Treatment is the class.

ADD COMMENT • link 8.0 years ago DSP • 0

0

Entering edit mode

Please add comments with the "add comment" button, not as a new answer. In any case, if Treatment represents the Class, then your code doesn't account for the fact that you have control/intervention groups. This seems like something that would be important to model, so why is it being left out?

ADD REPLY • link 8.0 years ago Aaron Lun ★ 28k

0

Entering edit mode

Yes, but I have very few samples, around 25 pairs per group-which are sub-divided by their class. Since I'm primarily interested in class and not group, I've not included group.

Can I directly include it in the model by:

model.matrix(~SibShip+Treat+targets$group+targets$age+targets$BMI+targets$CD8T+targets$CD4T+targets$NK+targets$Bcell+targets$Mono+targets$Gran)

or should that be considered as a random variable, if so, how can i include that in the model?

ADD REPLY • link 8.0 years ago DSP • 0

score 0 · Answer 2 · 2016-04-26

There's several odd things about your code:

Most methylation analyses with limma work on the M-value rather than the beta value (see paper here). The M-value seems more normal-looking than the beta value, as the latter is bounded in [0, 1].
Ignoring Group may actually result in less power, as the variance estimates would be inflated if there were systematic differences in the methylation status of samples in different groups.
Blindly throwing in a whole bunch of covariates usually doesn't help. You should check, e.g., on a MDS plot whether differences in the covariate contribute to systematic differences between samples. Otherwise, you're just wasting residual d.f. for nothing. In fact, if the value of each covariate is the same for each patient before and after treatment (e.g., I would expect that age wouldn't change if the treatment is reasonably fast), then blocking on that covariate will have no effect at all as it'll be confounded with the SibShip factor.

Anyway, I would probably set up the analysis in a one-way layout:

groupings <- paste0(data$Class, data$Group)
design <- model.matrix(~ 0 + groupings)

... and then block on Subject in duplicateCorrelation. You can then directly compare between group/class combinations to identify differences between classes or between groups. You can also throw in your covariates of interest (e.g., glucose) in the design matrix above to test their effects. However, you should make sure that they're not confounded by the groupings - if, for example, all intervention samples have higher glucose levels, you won't be able to distinguish between a glucose effect and an intervention effect.