Question

Discrepanices between minifi dmp finder and lmfit

0

Entering edit mode

Andreas Paul • 0

@andreas-paul-11140

Last seen 7.4 years ago

Dear all,

currently Im working on methylation analysis of Illumina 450k beadchip data using the minfi workflow for identifying differentially methylated CpG sites. The design consist of 55 samples and we have different groups of covariates and one group for example has 3 levels. When applying dmp finder we obtain different significantly methylated CpG sites compared to limfit by limma. I read that the minfi dmpfinder is a wrapper around lmfit. But why are the results so different then? Appreciate any help and hints for a solution.

Phenotype data

ID	Ethnicity	Age	GroupSurgery
1	CEU	13	1
2	CEU	25	1
3	ASN	45	2
4	CEU	55	3
5	CEU	67	3

Command dmp finder:

dmps.cat <- dmpFinder(m, pheno=phenotype_data$GroupSurgery, type="categorical")

Command limma

group1 <- factor(phenotype_data$GroupSurgery, levels=c("1","2","3"))

design <- model.matrix(~group1)
fit <- lmFit(m,design)
fit <- eBayes(fit)
topTable(fit, coef=group1)

Results dmp finder

cpg	intercept	f	pval	qval
cg14829516	0.320442	28.70813	3.980690e-09	0.001369007
cg10099799	-2.407821	22.26578	1.034389e-07	0.016183809
cg20036732	-1.819726	21.48785	1.578203e-07	0.016183809
cg14587604	-3.087692	21.16708	1.882320e-07	0.016183809
cg07435215	1.033033	19.85411	3.921676e-07	0.026974223
cg01638741	2.113079	19.25624	5.516534e-07	0.027544259

Results lmfit

CpG	X.Intercept	group12	group13	AveExpr	F	pval	adjPv
cg18560014	3.610993	-0.019218780	-0.01794563	3.600508	25462.76	6.652259e-89	1.515816e-83
cg06862343	3.526721	-0.006593399	0.04692367	3.537600	24921.88	1.224412e-88	1.515816e-83
cg16427743	3.134849	0.036864198	0.02867473	3.153394	24867.83	1.302324e-88	1.515816e-83
cg19973604	3.080761	0.063361587	0.04370996	3.111115	24611.21	1.748740e-88	1.515816e-83
cg07704672	3.457744	-0.012866272	-0.02403017	3.447448	24604.27	1.762806e-88	1.515816e-83
cg26673073	3.554504	-0.024529503	0.01566954	3.551641	24258.88	2.634272e-88	1.887649e-83

illimina 450k methylation minfi limma • 1.8k views

ADD COMMENT • link 7.8 years ago • updated 7.7 years ago Andreas Paul • 0

score 3 · Accepted Answer · 2016-07-26

3

Entering edit mode

James W. MacDonald 65k

@james-w-macdonald-5106

Last seen 13 hours ago

United States

Two reasons. First, dmpFinder is smart enough to do an F-test that doesn't include the intercept (which estimates the mean of surgery group 1), whereas you did an F-test that does include this intercept. That doesn't make any sense; the F-test is testing the null hypothesis that all coefficients are equal to zero, and there is no expectation that the mean M-value for the surgery group 1 should be zero. The correct test would be

topTable(fit, 2:3)

Second, dmpFinder by default doesn't do the empirical Bayes step, whereas you are doing that when using limma directly. With 50 samples or so, I would be surprised to see any materially different results between the two, but there may be some small differences. But the main problem is that you are doing a nonsensical F-test when you use limma directly.

ADD COMMENT • link 7.8 years ago James W. MacDonald 65k

0

Entering edit mode

I think I got it now and understand the main differences between the dmpfinder and limma. But still wondering if I can perform pairwise comparisons such as 1-2, 2-3, 3-1? Do I need to create a contrast matrix
or does setting the coefficients in toptable works as well? Further I want to add age and ethnicitiy into the analysis. How can I set this to get the results for all the coefficients together? Thank you very much!

ADD REPLY • link 7.8 years ago Andreas Paul • 0

1

Entering edit mode

You will get further, faster by simply reading the limma User's Guide, which covers all these questions and more. If you are planning to use R and Bioconductor to analyze data, you will have to become proficient at consulting the vignettes and help pages that come with the packages you are intending to use. While this support site is an invaluable resource, it is not intended as a substitute for information that is already readily available elsewhere.

ADD REPLY • link 7.7 years ago James W. MacDonald 65k