Dear Gordon and all,

I'm analyzing Illumina 450K methylation data with following study design:

Study design: I have individuals at a given time-point from three different stages: Stage 1, 2 and 3 and I wish to identify differentially methylated CpGs in these groups. Stage 1 vs Stage 2; Stage 1 vs Stage 3 and Stage 2 vs Stage 3

I have preprocessed my data and adjusted for batch effects. Following that, I have done PCA and regression to identify potential covariates and found blood cell estimates account for variation explained by top 10 PCs.

### I've used the following code for unadjusted analysis:

Treat_outcome <- factor(Targets_Unapired$Outcome)

design_outcome_unadjusted <- model.matrix(~0+Treat_outcome)

colnames(design_outcome_unadjusted) <- levels(Treat_outcome)

cm_outcome_unadjusted <- makeContrasts(NGTvsIGT = NGT-IGT,T2DvsIGT = T2D-IGT, NGTvsT2D = T2D-NGT, levels = design_outcome_unadjusted)

fit_outcome_unadjusted <- lmFit(Mval, design_outcome_unadjusted)

fit_outcome_final_unadjusted <- contrasts.fit(fit_outcome_unadjusted, cm_outcome_unadjusted)

fit_outcome_final_eBayes_unadjusted <- eBayes(fit_outcome_final_unadjusted)

NGTvsIGT_unadjusted_BH <- topTable(fit_outcome_final_eBayes,coef="NGTvsIGT",adjust="BH",number=100)

T2DvsIGT_unadjusted_BH <- topTable(fit_outcome_final_eBayes,coef="T2DvsIGT",adjust="BH",number=100)

NGTvsT2D_unadjusted_BH <- topTable(fit_outcome_final_eBayes,coef="NGTvsT2D",adjust="BH",number=100)

The above code worked fine. However when I try to introduce the cell estimates as covariates as in the following code, I get the following errors:

design_outcome_cellcounts <- model.matrix(~0+Treat_outcome+Targets_Unpaired$CD8T+Targets_Unpaired$CD4T+Targets_Unpaired$NK+Targets_Unpaired$Bcell+Targets_Unpaired$Mono+Targets_Unpaired$Gran)

cm_outcome_cellcounts <- makeContrasts(NGTvsIGT = NGT-IGT,T2DvsIGT = T2D-IGT, NGTvsT2D = T2D-NGT, levels = levels(Treat_outcome))

fit_outcome_cellcounts <- lmFit(Mval, design_outcome_cellcounts)

fit_outcome_final_cellcounts <- contrasts.fit(fit_outcome_cellcounts, cm_outcome_cellcounts)

fit_outcome_final_eBayes <- eBayes(fit_outcome_final_cellcounts)

**## Error**

**Error in contrasts.fit(fit_outcome_cellcounts, cm_outcome_cellcounts) :
Number of rows of contrast matrix must match number of coefficients
In addition: Warning message:
In contrasts.fit(fit_outcome_cellcounts, cm_outcome_cellcounts) :
row names of contrasts don't match col names of coefficients**

Could you please suggest a way out?

Thanks in advance.

Regards,

Priyanka

Also I need to the same adjustments for another study design, where I have data from same individual at two timepoints. Baseline samples are all at stage 1 who either reach Stage 2 or Stage 3 at follow up timepoint.

I wish to capture differentially methylated CpGs resulting in transition from Stage 1 to Stage2 and Stage 1 to Stage 3 and look for any overlap.

As suggested by Aaron a while ago, I wish to go ahead with duplicate correlation function for the same. But I need to adjust these results for cell counts.

As a side note: Since you are analyzing methylation M-values, I recommend running

`eBayes`

with`trend=TRUE`

. Due to the logistic transformation, extreme M-values tend to have higher variance than M-values near the middle of the range, so a plot of variance vs mean for all probes will look somewhat U-shaped. Running`eBayes`

with`trend=TRUE`

will model this mean-variance dependence and should give you better results.