How to get list of all GOIDs with all genes in each GO term?

I'm working with mouse data and I want to create a data frame with two columns: 1) all mouse GOIDs and 2) all genes in each GOID.

Basically I want to do something like this:

go_terms_mouse <- select(, "GOID", "ENSEMBL")

I would also like to do the same thing with KEGG pathways.

> z <- mapIds(, keys(, "GO"), "ENTREZID", "GO", multiVals = "list")
> lapply(z[1:4], head)
[1] "11287" "11648" "11674" "11789" "11798" "11905"

[1] "11287" "11287" "12266" "13010" "13014" "15139"

[1] "11287" "11606" "11699" "11699" "11820" "11905"

[1] "11287" "11354" "11354" "11354" "11421" "11423"

> zz <- select(, keys(, "GO"), "ENTREZID", "GO")
'select()' returned 1:many mapping between keys and columns
> head(zz)
          GO EVIDENCE ONTOLOGY ENTREZID
1 GO:0002020      IBA       MF    11287
2 GO:0002020      ISO       MF    11648
3 GO:0002020      IPI       MF    11674
4 GO:0002020      IPI       MF    11789
5 GO:0002020      ISO       MF    11798
6 GO:0002020      ISO       MF    11905

I'll leave it to you to do the same for KEGG IDs.
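
To get from either result to the two-column data frame asked for in the question, something like the following may work. It is a minimal sketch in base R only: `z_toy` is a hypothetical stand-in for a small slice of the `mapIds()` list, `GO:0002020` and the Entrez IDs are copied from the output above, and `GO:0000000` is a made-up placeholder because the list names are not shown in that output.

```r
# Toy stand-in for the list returned by mapIds(..., multiVals = "list").
# "GO:0002020" and the Entrez IDs come from the output above; "GO:0000000"
# is a placeholder ID used only for illustration.
z_toy <- list(
  "GO:0002020" = c("11287", "11648", "11674"),
  "GO:0000000" = c("11287", "11287", "12266")
)

# Drop repeated genes within each term (repeats do appear in the output above).
z_toy <- lapply(z_toy, unique)

# Long format: one row per GO ID / gene pair.
go_long <- stack(z_toy)
names(go_long) <- c("ENTREZID", "GOID")
go_long$GOID <- as.character(go_long$GOID)
go_long <- go_long[, c("GOID", "ENTREZID")]

# Compact format: one row per GO ID, with the genes kept in a list column.
go_nested <- data.frame(GOID = names(z_toy), stringsAsFactors = FALSE)
go_nested$ENTREZID <- unname(z_toy)

print(go_long)
```

The long format is usually easier to join against other tables; the list column is closer to "all genes in each GOID" as a single row per term.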


Thanks! This is just what I need. I'm not having any luck performing the same conversion for KEGG, though. Would you know what I am doing wrong here?

kegg_list <- mapIds(, keys(, "KEGG"),
                  "ENTREZID", "KEGG", multiVals = "list")

Error in testForValidKeytype(x, keytype) : 
  Invalid keytype: KEGG. Please use the keytypes method to see a listing of valid arguments.

OK, so you made a wild guess that the keytype for KEGG is 'KEGG', which is fine. But then you got an error and, more importantly, a message telling you how to get the available choices!

Two points to keep in mind. First, for most scripting languages and Open Source software in general, the help you get tends to be pretty terse. Which isn't to say unhelpful. Usually everything you need is there, you just have to pay attention to what is offered. Second, if you plan to use R, like at all, you need to learn how to figure things out for yourself. It's OK to ask questions, but that should be the last thing you do, after you have tried to figure things out yourself and have failed. So in the spirit of my second point, here's how you could proceed.

You got a message telling you what to do. So let's do it:

> keytypes(
[11] "GO"           "GOALL"        "IPI"          "MGI"          "ONTOLOGY"    
[16] "ONTOLOGYALL"  "PATH"         "PFAM"         "PMID"         "PROSITE"     
[21] "REFSEQ"       "SYMBOL"       "UNIGENE"      "UNIPROT"     

How did I know that's how to use keytypes? By doing ?keytypes to get the help page, as one does. But this isn't super helpful, because there isn't anything like KEGG in that list. So, on to other alternatives. You could do ? but that just brings up a generic page about the package. You could also see what functions are in the package (a super useful thing to do, btw) by doing something like

> search()
 [1] ".GlobalEnv"            ""  "package:AnnotationDbi"
 [4] "package:IRanges"       "package:S4Vectors"     "package:Biobase"      
 [7] "package:BiocGenerics"  "package:parallel"      "package:stats4"       
[10] "package:BiocManager"   "ESSR"                  "package:stats"        
[13] "package:graphics"      "package:grDevices"     "package:utils"        
[16] "package:datasets"      "package:methods"       "Autoloads"            
[19] "package:base"   

## which indicates that the package is second in the search path.
## since I've been around for a while, I know about the ls() function, so here's a legit hint

> ls(2)
 [1] ""                ""            
 [3] "org.Mm.eg_dbconn"         "org.Mm.eg_dbfile"        
 [5] "org.Mm.eg_dbInfo"         "org.Mm.eg_dbschema"      
 [7] "org.Mm.egACCNUM"          "org.Mm.egACCNUM2EG"      
 [9] "org.Mm.egALIAS2EG"        "org.Mm.egCHR"            
[11] "org.Mm.egCHRLENGTHS"      "org.Mm.egCHRLOC"         
[13] "org.Mm.egCHRLOCEND"       "org.Mm.egENSEMBL"        
[15] "org.Mm.egENSEMBL2EG"      "org.Mm.egENSEMBLPROT"    
[17] "org.Mm.egENSEMBLPROT2EG"  "org.Mm.egENSEMBLTRANS"   
[19] "org.Mm.egENSEMBLTRANS2EG" "org.Mm.egENZYME"         
[21] "org.Mm.egENZYME2EG"       "org.Mm.egGENENAME"       
[23] "org.Mm.egGO"              "org.Mm.egGO2ALLEGS"      
[25] "org.Mm.egGO2EG"           "org.Mm.egMAPCOUNTS"      
[27] "org.Mm.egMGI"             "org.Mm.egMGI2EG"         
[29] "org.Mm.egORGANISM"        "org.Mm.egPATH"           
[31] "org.Mm.egPATH2EG"         "org.Mm.egPFAM"           
[33] "org.Mm.egPMID"            "org.Mm.egPMID2EG"        
[35] "org.Mm.egPROSITE"         "org.Mm.egREFSEQ"         
[37] "org.Mm.egREFSEQ2EG"       "org.Mm.egSYMBOL"         
[39] "org.Mm.egSYMBOL2EG"       "org.Mm.egUNIGENE"        
[41] "org.Mm.egUNIGENE2EG"      "org.Mm.egUNIPROT"        

And then you can look at the help pages for any of those functions, like ? or whatever, and you will see that the help page says it maps between Entrez Gene IDs and MGI symbols. So if you then went through all the likely subjects (because for sure UNIGENE and REFSEQ and UNIPROT aren't candidates), you end up with ?org.Mm.egPATH which says

org.Mm.egPATH            R Documentation

Mappings between Entrez Gene identifiers and KEGG pathway identifiers


     KEGG (Kyoto Encyclopedia of Genes and Genomes) maintains pathway
     data for various organisms. org.Mm.egPATH maps entrez gene
     identifiers to the identifiers used by KEGG for pathways

And then you have helped yourself.
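
Putting that together, the KEGG version of the earlier call might look like the sketch below. It assumes the annotation package in use is `org.Mm.eg.db` (the package name is not shown in the calls above) and that the installed version still lists `"PATH"` among its keytypes, as in the `keytypes()` output above. The check at the top guards against versions where it is absent.

```r
library(org.Mm.eg.db)  # assumed package name; not shown in the thread

# Confirm the keytype exists before using it, rather than guessing.
if ("PATH" %in% keytypes(org.Mm.eg.db)) {
  # Named list: one element per KEGG pathway ID, holding its Entrez Gene IDs.
  kegg_list <- mapIds(org.Mm.eg.db,
                      keys = keys(org.Mm.eg.db, keytype = "PATH"),
                      column = "ENTREZID",
                      keytype = "PATH",
                      multiVals = "list")
  str(head(kegg_list, 3))
} else {
  message("This version of the package has no PATH keytype.")
}
```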


I should note, however, that these KEGG mappings are pretty old now, because the data have been put behind a paywall. You can still get current data using the KEGGREST package, which has a vignette you can read to get up to speed.
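
As a rough sketch of the KEGGREST route, the following asks KEGG for all gene-to-pathway links for mouse and regroups them by pathway. It assumes an internet connection, that `"mmu"` is the KEGG organism code wanted here, and that the result carries the usual `mmu:` and `path:` prefixes; `gene2path` and `kegg_current` are names made up for this example.

```r
library(KEGGREST)  # queries the KEGG REST API, so it needs network access

# Named character vector: names are genes ("mmu:<gene ID>"),
# values are pathways ("path:mmu<number>").
gene2path <- keggLink("pathway", "mmu")

# Strip the prefixes and group the gene IDs by pathway ID.
genes <- sub("^mmu:", "", names(gene2path))
paths <- sub("^path:", "", unname(gene2path))
kegg_current <- split(genes, paths)

str(head(kegg_current, 3))
```

The result has the same shape as the `mapIds(..., multiVals = "list")` output: a named list with one character vector of gene IDs per pathway.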

