Dear all,
currently Im working on methylation analysis of Illumina 450k beadchip data using the minfi workflow for identifying differentially methylated CpG sites. The design consist of 55 samples and we have different groups of covariates and one group for example has 3 levels. When applying dmp finder we obtain different significantly methylated CpG sites compared to limfit by limma. I read that the minfi dmpfinder is a wrapper around lmfit. But why are the results so different then? Appreciate any help and hints for a solution.
Phenotype data
ID | Ethnicity | Age | GroupSurgery |
1 | CEU | 13 | 1 |
2 | CEU | 25 | 1 |
3 | ASN | 45 | 2 |
4 | CEU | 55 | 3 |
5 | CEU | 67 | 3 |
Command dmp finder:
dmps.cat <- dmpFinder(m, pheno=phenotype_data$GroupSurgery, type="categorical")
Command limma
group1 <- factor(phenotype_data$GroupSurgery, levels=c("1","2","3"))
design <- model.matrix(~group1)
fit <- lmFit(m,design)
fit <- eBayes(fit)
topTable(fit, coef=group1)
Results dmp finder
cpg | intercept | f | pval | qval |
cg14829516 | 0.320442 | 28.70813 | 3.980690e-09 | 0.001369007 |
cg10099799 | -2.407821 | 22.26578 | 1.034389e-07 | 0.016183809 |
cg20036732 | -1.819726 | 21.48785 | 1.578203e-07 | 0.016183809 |
cg14587604 | -3.087692 | 21.16708 | 1.882320e-07 | 0.016183809 |
cg07435215 | 1.033033 | 19.85411 | 3.921676e-07 | 0.026974223 |
cg01638741 | 2.113079 | 19.25624 | 5.516534e-07 | 0.027544259 |
Results lmfit
CpG | X.Intercept | group12 | group13 | AveExpr | F | pval | adjPv |
cg18560014 | 3.610993 | -0.019218780 | -0.01794563 | 3.600508 | 25462.76 | 6.652259e-89 | 1.515816e-83 |
cg06862343 | 3.526721 | -0.006593399 | 0.04692367 | 3.537600 | 24921.88 | 1.224412e-88 | 1.515816e-83 |
cg16427743 | 3.134849 | 0.036864198 | 0.02867473 | 3.153394 | 24867.83 | 1.302324e-88 | 1.515816e-83 |
cg19973604 | 3.080761 | 0.063361587 | 0.04370996 | 3.111115 | 24611.21 | 1.748740e-88 | 1.515816e-83 |
cg07704672 | 3.457744 | -0.012866272 | -0.02403017 | 3.447448 | 24604.27 | 1.762806e-88 | 1.515816e-83 |
cg26673073 | 3.554504 | -0.024529503 | 0.01566954 | 3.551641 | 24258.88 | 2.634272e-88 | 1.887649e-83 |
I think I got it now and understand the main differences between the dmpfinder and limma. But still wondering if I can perform pairwise comparisons such as 1-2, 2-3, 3-1? Do I need to create a contrast matrix
or does setting the coefficients in toptable works as well? Further I want to add age and ethnicitiy into the analysis. How can I set this to get the results for all the coefficients together? Thank you very much!
You will get further, faster by simply reading the limma User's Guide, which covers all these questions and more. If you are planning to use R and Bioconductor to analyze data, you will have to become proficient at consulting the vignettes and help pages that come with the packages you are intending to use. While this support site is an invaluable resource, it is not intended as a substitute for information that is already readily available elsewhere.