Dear all,
I use the missMethyl package with the gometh and gsameth function to analyze whether significantly differentially methylated CpG sites are enriched in gene sets.
gometh can either analyze GO or KEGG genesets, whereas gsameth is a more generalized version where one can manually specify the genesets to be analyzed. In my analysis I used both gometh and gsameth with the KEGG genesets; for gometh KEGG can be specified automatically; whereas for gsameth I downloaded the KEGG genesets from the Broadinstitute via http://software.broadinstitute.org/gsea/msigdb/genesets.jsp?collection=C2. I perform analysis with the following code:
library(devtools) install_github("Bioconductor-mirror/missMethyl") library(BAGS) library(missMethyl) #read C2 CP gmt file downlaoded from http://software.broadinstitute.org/gsea/msigdb/genesets.jsp?collection=C2 MSigDB_C2_CP <- ReadGMT("./data/c2.cp.v5.1.entrez.gmt") #only keep KEGG pathways MSigDB_C2_CP_KEGG <- MSigDB_C2_CP[grepl('^KEGG', names(MSigDB_C2_CP))] #perform gsameth gsa.MSigDB <- gsameth(sig.cpg=rownames(sigDMP), all.cpg=rownames(GRset_work), collection=MSigDB_C2_CP_KEGG) Warning message: In alias2SymbolTable(flat$symbol) : Multiple symbols ignored for one or more aliases #sort by FDR p-value & only keep <0.05 gsa.MSigDB <- as.data.frame(gsa.MSigDB[order(gsa.MSigDB[,4]),]) sigMSigDB <- gsa.MSigDB[gsa.MSigDB$FDR<0.05,] #perform gometh gst.KEGG <- gometh(sig.cpg=rownames(sigDMP), all.cpg=rownames(GRset_work), collection="KEGG") Warning message: In alias2SymbolTable(flat$symbol) : Multiple symbols ignored for one or more aliases #sort by FDR p-value & only keep <0.05 gst.KEGG <- gst.KEGG[order(gst.KEGG$FDR),] sigKEGG <- gst.KEGG[gst.KEGG$FDR<0.05,] #results: sprintf("gsameth with C2 Curated Pathways of KEGG: %s / %s",NROW(sigMSigDB), NROW(gsa.MSigDB)) [1] "gsameth with C2 Curated Pathways of KEGG: 2 / 186" sprintf("gometh with KEGG: %s / %s",NROW(sigKEGG), NROW(gst.KEGG)) [1] "gometh with KEGG: 133 / 301"
First, I wonder why the number of total genesets differs: 186 if I manually download the KEGG genesets from the broad institute for gsameth and 301 for gometh if KEGG is specified as collection.
Second, I wonder why the number of significant genesets identified by both methods is so different: 2 for gsameth and 133 for gometh?
Third, what can I do about the warning "Multiple symbols ignored for one or more aliases" - is this worrisome or just a result that not all CpG sites can be mapped to a specific gene?
Thanks,
Philipp