Dear Gordon and all,
I'm analyzing Illumina 450K methylation data with following study design:
Study design: I have individuals at a given time-point from three different stages: Stage 1, 2 and 3 and I wish to identify differentially methylated CpGs in these groups. Stage 1 vs Stage 2; Stage 1 vs Stage 3 and Stage 2 vs Stage 3
I have preprocessed my data and adjusted for batch effects. Following that, I have done PCA and regression to identify potential covariates and found blood cell estimates account for variation explained by top 10 PCs.
### I've used the following code for unadjusted analysis:
Treat_outcome <- factor(Targets_Unapired$Outcome)
design_outcome_unadjusted <- model.matrix(~0+Treat_outcome)
colnames(design_outcome_unadjusted) <- levels(Treat_outcome)
cm_outcome_unadjusted <- makeContrasts(NGTvsIGT = NGT-IGT,T2DvsIGT = T2D-IGT, NGTvsT2D = T2D-NGT, levels = design_outcome_unadjusted)
fit_outcome_unadjusted <- lmFit(Mval, design_outcome_unadjusted)
fit_outcome_final_unadjusted <- contrasts.fit(fit_outcome_unadjusted, cm_outcome_unadjusted)
fit_outcome_final_eBayes_unadjusted <- eBayes(fit_outcome_final_unadjusted)
NGTvsIGT_unadjusted_BH <- topTable(fit_outcome_final_eBayes,coef="NGTvsIGT",adjust="BH",number=100)
T2DvsIGT_unadjusted_BH <- topTable(fit_outcome_final_eBayes,coef="T2DvsIGT",adjust="BH",number=100)
NGTvsT2D_unadjusted_BH <- topTable(fit_outcome_final_eBayes,coef="NGTvsT2D",adjust="BH",number=100)
The above code worked fine. However when I try to introduce the cell estimates as covariates as in the following code, I get the following errors:
design_outcome_cellcounts <- model.matrix(~0+Treat_outcome+Targets_Unpaired$CD8T+Targets_Unpaired$CD4T+Targets_Unpaired$NK+Targets_Unpaired$Bcell+Targets_Unpaired$Mono+Targets_Unpaired$Gran)
cm_outcome_cellcounts <- makeContrasts(NGTvsIGT = NGT-IGT,T2DvsIGT = T2D-IGT, NGTvsT2D = T2D-NGT, levels = levels(Treat_outcome))
fit_outcome_cellcounts <- lmFit(Mval, design_outcome_cellcounts)
fit_outcome_final_cellcounts <- contrasts.fit(fit_outcome_cellcounts, cm_outcome_cellcounts)
fit_outcome_final_eBayes <- eBayes(fit_outcome_final_cellcounts)
## Error
Error in contrasts.fit(fit_outcome_cellcounts, cm_outcome_cellcounts) :
Number of rows of contrast matrix must match number of coefficients
In addition: Warning message:
In contrasts.fit(fit_outcome_cellcounts, cm_outcome_cellcounts) :
row names of contrasts don't match col names of coefficients
Could you please suggest a way out?
Thanks in advance.
Regards,
Priyanka
Also I need to the same adjustments for another study design, where I have data from same individual at two timepoints. Baseline samples are all at stage 1 who either reach Stage 2 or Stage 3 at follow up timepoint.
I wish to capture differentially methylated CpGs resulting in transition from Stage 1 to Stage2 and Stage 1 to Stage 3 and look for any overlap.
As suggested by Aaron a while ago, I wish to go ahead with duplicate correlation function for the same. But I need to adjust these results for cell counts.
As a side note: Since you are analyzing methylation M-values, I recommend running
eBayes
withtrend=TRUE
. Due to the logistic transformation, extreme M-values tend to have higher variance than M-values near the middle of the range, so a plot of variance vs mean for all probes will look somewhat U-shaped. RunningeBayes
withtrend=TRUE
will model this mean-variance dependence and should give you better results.