Pathway analysis of differentially methylated CpGs
Entering edit mode
philipp24 ▴ 30
Last seen 6.7 years ago

Dear all,

I have 450k methylation data from 120 samples. I perform differential methylation analysis (using the m-values) for a interval scaled variable (current_phenotype with levels of 1,2,3) with limma which identifies 1336 hypo- and 635 hypermethylated CpG´s.The significant CpG´s (FDR<0.05 & logFC > 2) are then used for an analysis of differentially methylated pathways using the missMethyl package (gometh function). Although I get several differential methylated pathways I have troubles with interpreting the results. Specifically, how do I know whether higher values of "current_phenotpye" are associated with increased or decreased pathway methylation? Moreover, does it make sense to subject both significant hyper- and hypermethylated CpGs to the pathway analysis (since the gometh function does only know which CpG is significant, however not the direction i.e. whether it is hypo- or hypermethylated)?

Thanks for your help,



design2 = model.matrix(~current_phenotype)
fit2 = lmFit(m_values, design2)  
keep <- fit2$Amean > median(fit2$Amean)
fitEb <- eBayes(fit2[keep,], robust=T, trend=T)

   (Intercept) current_phenotype
-1         812              1336
0        20713            204111
1       184557               635

tt <- topTable(fitEb,coef=2,"p", p.value=0.05, lfc=2, adjust.method="BH",number=Inf)
gst.KEGG <- gometh(sig.cpg=rownames(tt), all.cpg=rownames(m_values), collection="KEGG", prior.prob = T)
gst.KEGG <- gst.KEGG[order(gst.KEGG$FDR),]
gst.KEGG  <- gst.KEGG[gst.KEGG$FDR<0.05,]
limma missmethyl • 2.5k views
Entering edit mode
Last seen 1 hour ago
WEHI, Melbourne, Australia

Regarding your first question, genes with positive logFC have increased methylation.

Regarding your second question, I agree with you that this is an issue. I personally very much prefer to do enrichment analyses for up and down genes separately and, for that reason, all the pathway enrichment analysis functions in the limma package show direction of change for each pathway. However you will see a lot of published articles in which KEGG and GO analyses are done for all DE genes pooled together.

You could easily run gometh() separately for up and down genes (using tt$logFC>0 to select up genes). Alternatively, running limma's kegga function directly would give you exactly what you want. This takes a little bit of care because you have to take into account the number of CpG islands corresponding to each gene. You could do that like this:

ann <- missMethyl:::.flattenAnn()
m <- match(rownames(fitEb), ann$cpg)
EntrezID <- ann$entrezid[m]
n <- table(EntrezID)
k <- kegga(fitEb, geneid=EntrezID, covariate=n[EntrezID])

This would conduct a KEGG enrichment analysis showing the direction of change for each pathway. A gene ontology analysis would work the same way:

g <- goana(fitEb, geneid=EntrezID, covariate=n[EntrezID])

Other comments on your analysis

Using trend=TRUE in eBayes() doesn't make any sense with M-values. Nor does filtering on fit$Amean, because M-values are not intensities. I think you are trying to combine code from different contexts instead of following the missMethyl users guide. If you want to do this sort of thing, then you need to store intensities as well as M-values, for example by:

RG <- list(R=meth, G=unmeth)
MA <- MA.RG(RG, offset=100)
fit <- lmFit(MA, design)
fit <- eBayes(MA, trend=TRUE, robust=TRUE)

Also, please don't combine a logFC cutoff with limma's FDR. If you want to give more emphasis to large fold changes, then use treat() in place of eBayes().

Entering edit mode
philipp24 ▴ 30
Last seen 6.7 years ago

Dear Gordon,

Thank you for your helpful and detailed answer.

Regarding #1: I´ve implemented your suggestion which works fine and indeed gives me a list with separate p-values for "up" and "down" for each pathway. However, what is the interpretation of the results if both up and down are significant for a specific pathway (which is the case for about 20% of the significant pathways)?

Regarding #2: Would you agree that the identification of differentially methylated CpGs based on M-values would be correct if I use eBayes without the robust=TRUE argument?



Entering edit mode

Dear Philipp,

Could you please use the "ADD COMMENT" button if you want to respond to my answer, rather than posting further questions as an Answer? Otherwise the whole dialog will get messed up on the webpage.

The KEGG pathways are not perfectly directional. A gene will be included in a KEGG pathway if it is functionally involved in the operation of the pathway. However this does not necessarily mean that the gene must be up regulated when the pathway is in operation. In some cases, the gene could be down-regulated.

Even if a pathway is up-regulated in your experiment, some of the genes positively associated with the pathway could be down-regulated in your experiment because of cross-talk with other pathways.

When you judge whether a pathways is up or down regulated, you should only be taking notice of very small p-values, preferably less than 1e-6. Don't take any notice of p-values above 0.001 say.

The eBayes() analysis will be valid whether or not you set robust=TRUE, although that option will often increase power, see: . However you must not use trend=TRUE unless your data contains information about absolute intensity.


Login before adding your answer.

Traffic: 198 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6