I ran gometh on a 450k methylation dataset and extracted the top 20 categories using topGO. Is there a way to list the names of the genes counted in the returned "DE" column, for example the 71 genes reported in the row below?
                                                                  Term Ont   N DE
GO:0098742: cell-cell adhesion via plasma-membrane adhesion molecules  BP 214 71
Using the example data from ?gometh:
> example(gometh)
<snip>
> topGO(gst,number=1)
Term Ont
GO:0007156 homophilic cell adhesion via plasma membrane adhesion molecules BP
N DE P.DE FDR
GO:0007156 154 37 1.310866e-12 2.768287e-08
Now we have to map the CpGs to Entrez Gene IDs:
> z <- getMappedEntrezIDs(sigcpgs, allcpgs, "450K")
This returns a list whose first element contains the Entrez Gene IDs that correspond to the significant CpGs. We can then map those IDs to GO terms and subset to the term of interest.
> gos <- select(org.Hs.eg.db, z[[1]], "GOALL")
> firstgos <- subset(gos, GOALL %in% "GO:0007156")
> firstgos$SYMBOL <- mapIds(org.Hs.eg.db, as.character(firstgos$ENTREZID), "SYMBOL","ENTREZID")
'select()' returned 1:1 mapping between keys and columns
> firstgos
ENTREZID GOALL EVIDENCEALL ONTOLOGYALL SYMBOL
136 1002 GO:0007156 IEA BP CDH4
27086 22883 GO:0007156 IEA BP CLSTN1
27440 22997 GO:0007156 IEA BP IGSF9B
35396 26025 GO:0007156 IEA BP PCDHGA12
80493 56099 GO:0007156 IEA BP PCDHGB7
80514 56100 GO:0007156 IEA BP PCDHGB6
80538 56101 GO:0007156 IEA BP PCDHGB5
80568 56102 GO:0007156 IEA BP PCDHGB3
80589 56103 GO:0007156 IEA BP PCDHGB2
80610 56104 GO:0007156 IEA BP PCDHGB1
80637 56105 GO:0007156 IEA BP PCDHGA11
80658 56106 GO:0007156 IEA BP PCDHGA10
80683 56107 GO:0007156 IEA BP PCDHGA9
80707 56108 GO:0007156 IEA BP PCDHGA7
80728 56109 GO:0007156 IEA BP PCDHGA6
80749 56110 GO:0007156 IEA BP PCDHGA5
80770 56111 GO:0007156 IEA BP PCDHGA4
80791 56112 GO:0007156 IEA BP PCDHGA3
80812 56113 GO:0007156 IEA BP PCDHGA2
80833 56114 GO:0007156 IEA BP PCDHGA1
80856 56135 GO:0007156 IEA BP PCDHAC1
80890 56136 GO:0007156 IEA BP PCDHA13
80911 56137 GO:0007156 IEA BP PCDHA12
80934 56138 GO:0007156 IEA BP PCDHA11
80972 56139 GO:0007156 IEA BP PCDHA10
81008 56140 GO:0007156 IEA BP PCDHA8
81044 56141 GO:0007156 IEA BP PCDHA7
81082 56142 GO:0007156 IEA BP PCDHA6
81118 56143 GO:0007156 IEA BP PCDHA5
81154 56144 GO:0007156 IEA BP PCDHA4
81190 56145 GO:0007156 IEA BP PCDHA3
81232 56146 GO:0007156 IEA BP PCDHA2
81285 56147 GO:0007156 IEA BP PCDHA1
83848 57463 GO:0007156 ISS BP AMIGO1
117026 8641 GO:0007156 IEA BP PCDHGB4
125152 9708 GO:0007156 IEA BP PCDHGA8
125456 9752 GO:0007156 IEA BP PCDHA9
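To pull the genes for several terms at once rather than one GO ID at a time, the subset step above can be wrapped in a small helper. This is only a sketch: `symbols_by_term` is a made-up name, not part of missMethyl, limma, or AnnotationDbi, and it assumes a data frame shaped like the select() output with a SYMBOL column added. The three demo rows are copied from the firstgos output above.

```r
# Hypothetical helper (not from any package): group gene symbols by term.
# annot    : data frame shaped like select() output, with a SYMBOL column
# term_col : name of the term column, e.g. "GOALL"
# terms    : character vector of term IDs to keep
symbols_by_term <- function(annot, term_col, terms) {
  keep <- annot[annot[[term_col]] %in% terms, , drop = FALSE]
  # unique() because select() can return a gene more than once per term
  # (one row per evidence code)
  lapply(split(keep$SYMBOL, keep[[term_col]]), unique)
}

# Three rows taken from the firstgos output above
demo <- data.frame(
  ENTREZID = c("1002", "22883", "57463"),
  GOALL    = "GO:0007156",
  SYMBOL   = c("CDH4", "CLSTN1", "AMIGO1"),
  stringsAsFactors = FALSE
)
symbols_by_term(demo, "GOALL", "GO:0007156")
```

With the real objects, something like `annot <- select(org.Hs.eg.db, keys = z[[1]], columns = c("GOALL", "SYMBOL"), keytype = "ENTREZID")` followed by `symbols_by_term(annot, "GOALL", rownames(topGO(gst, number = 20)))` should give one vector of symbols per top term, assuming the row names of the topGO table are the GO IDs, as they are in the output above.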
This was incredibly helpful for me, thank you.
How would you perform the same type of extraction for genes in the topKEGG pathways?
Thank you in advance!
I figured it out; posting here for anyone who might find it useful:
> keggs <- select(org.Hs.eg.db, z[[1]], "PATH")
> firstkeggs <- subset(keggs, PATH %in% "04614")
> firstkeggs$SYMBOL <- mapIds(org.Hs.eg.db, as.character(firstkeggs$ENTREZID), "SYMBOL","ENTREZID")