Hi all,

I’m analyzing Illumina 450K data of a study of following design

Subject Group Time-point Class

1 Control Baseline 1

1 Control Follow-up 2

2 Control Baseline 1

2 Control Follow-up 3

3 Intervention Baseline 1

3 Intervention Follow-up 2

4 Intervention Baseline 1

4 Intervention Follow-up 2

**Individuals of a class 1 are recruited and randomly distributed to control and intervention groups.**

**At follow-up, individuals in both the groups now belong to two different classes: either 2 or 3**

I have several variables like BCD batch, Chip and position, experiment batch, cell count estimates, age and BMI and I would like to adjust my analysis for these covariates.

There are two objectives:

**Identify the differentially methylated CpGs from baseline to follow-up in individuals belonging to different classes independently i.e class 1 vs 2, class 1 vs 3 and class 2 vs 3****Perform an association analysis of beta values with other phenotypic information like fasting glucose or insulin**

I’m attempted to use the following scripts for my analysis

## Regression analysis using lmfit limma for dmCpGs

### Unadjusted analysis

#### Pre-post:1 vs 2 analysis

targets_t2d <- read.csv("B:/IDPP3_450K_PILOT/targets.csv",check.names = FALSE)

SibShip <- factor(targets$SibShip)

Treat <- factor(targets$Treatment, levels=c("1","2"))

design <- model.matrix(~SibShip+Treat)

cleanbeta <- read.csv("B:/450K_PILOT/cleanbeta.csv",row.names=1,check.names=FALSE)

cleanbeta<-as.matrix(cleanbeta)

fit <- lmFit(cleanbeta, design)

fit <- eBayes(fit)

fit_fdr<-topTable(fit, coef="Treat2",adjust="fdr",number=416733)

## Regression analysis using lmfit limma for dmCpGs

### Adjusted for covariates

#### Pre-post:1 vs 2 analysis

targets_t2d <- read.csv("B:/450K_PILOT/targets.csv",check.names = FALSE)

SibShip <- factor(targets$SibShip)

Treat <- factor(targets$Treatment, levels=c("1","2"))

design <-model.matrix(~SibShip+Treat+targets$age+targets$BMI+targets$CD8T+targets$CD4T+targets$NK+targets$Bcell+targets$Mono+targets$Gran)

cleanbeta <- read.csv("B:/450K_PILOT/cleanbeta.csv",row.names=1,check.names=FALSE)

cleanbeta<-as.matrix(cleanbeta)

fitcelladjusted <- lmFit(cleanbeta, design)

fit_celladjusted <- eBayes(fitcelladjusted)

fit_cell_adjusted_fdr <-topTable(fit_celladjusted, coef="Treat2",adjust="fdr",number=416733)

**I have the following queries:**

1) Is it the right way to do the analysis for objective 1?

P.S: Since I have less sample size (~25 baseline-followup pairs per group), I choose to ignore the effect of group at this stage

**2) At a later stage, how can I include the sample group information also in the model and go ahead with the analysis? I'm assuming that including time-point information is not necessary since it is reflected in the class**

3) Can I use a mixed regression model like that of lmer function in lme4 package to include sample, group and time-point information as a random variables for objective 2?

4) Which model would be the right one for the above study design:

**Beta,age and cell estimates as fixed effects**

**Subject,group as random effects, I'm not sure of timepoint if it a fixed/random effect, I'm supposing that it is a random effect variable**

full.model1=lmer(Fastingglucose~beta+age+cellestimates+(1/subject)+(1/group)+(1/timepoint),REML=false)

full.model2=lmer(Fastingglucose~beta+age+cellestimates+(1/subject*timepoint)+(1/group),REML=false)

full.model3=lmer(Fastinglucose~beta+age+cellestimates+(1+beta/subject)+(1+beta/group)+(1+beta/timepoint), REML=false)

full.model4=lmer(Fastingglucose~beta+age+cellestimates+(1+beta/subject*timepoint)+(1+beta/group),REML=false)

null.model1=lmer(Fastingglucose~age+cellestimates+(1/subject)+(1/group), REML=false)

null.model2=lmer(Fastingglucose~age+cellestimates+(1/subject*timepoint)+(1/group),REML=false)

null.model3=lmer(Fastinglucose~age+cellestimates+(1+beta/subject)+(1+beta/group)+(1+beta/timepoint), REML=false)

null.model4=lmer(Fastingglucose~age+cellestimates+(1+beta/subject*timepoint)+(1+beta/group),REML=false)

**Could you also please confirm if any of these above models would account for the paired nature of the sample?**

I'm new to R-environment and your assistance would be of immense help.

Thanks for your attention.

Best regards,

Priyanka

Your

limmacode doesn't match up to your experimental design. What is`SibShip`

(Subject, maybe)? What is`Treatment`

- the Class, or the Group?