mapIds to get GO terms
@sisterdot-9374

Hey!

I am trying to annotate a dataset. I have gene symbols and want to add the corresponding GO terms.

The following works but is rather slow; I am sure there is a better way :-)

 

library("AnnotationDbi")
library("GO.db")
library("plyr")
library("org.Mm.eg.db")

genes <- c("Kit","Gata1","Cnga4","E130309D14Rik","Apol7c","Neil2","Atp7b","H2-Bl","Gli1","Glt6d1","Grem2","Fstl3","Il10rb","Ccdc71","Rho","Nt5dc1","Obox5","4931428F04Rik","Rbbp6","4922502N22Rik","Gbf1","Plp1","Gm7030","Slc6a20b","Bpifb6")

# Collapse the GO IDs of one gene into a comma-separated string of GO term names
getgoterms <- function(x) {
  x <- na.omit(x)
  if (length(x) > 0) {
    toString(eapply(GOTERM[x], Term))
  } else {
    "NA"
  }
}

goterms <- mapIds(org.Mm.eg.db,
                     keys=genes,
                     column="GO",
                     keytype="SYMBOL",
                     multiVals=getgoterms)

goterms

 

thanks a lot!

sisterdot

annotationdbi go.db • 1.8k views
@martin-morgan-1513

I did the two mappings separately, first symbol to GO ID and then GO ID to term:

sym2go <- select(org.Mm.eg.db, genes, "GO", "SYMBOL")
go2term <- select(GO.db, sym2go$GO, "TERM", "GOID")

then merged

sym2term <- merge(sym2go, go2term, by.x="GO", by.y="GOID")

which gives

> head(sym2term)
          GO SYMBOL EVIDENCE ONTOLOGY
1 GO:0000009 Glt6d1      IEA       MF
2 GO:0000026 Glt6d1      IEA       MF
3 GO:0000030 Glt6d1      IEA       MF
4 GO:0000033 Glt6d1      IEA       MF
5 GO:0000122  Gata1      IDA       BP
6 GO:0000122  Gata1      IDA       BP
                                                                  TERM
1                               alpha-1,6-mannosyltransferase activity
2                               alpha-1,2-mannosyltransferase activity
3                                         mannosyltransferase activity
4                               alpha-1,3-mannosyltransferase activity
5 negative regulation of transcription from RNA polymerase II promoter
6 negative regulation of transcription from RNA polymerase II promoter

Finally,

with(sym2term, splitAsList(TERM, SYMBOL))[genes]

generates a list aligned 1:1 with the original symbols, where each element holds the several terms that symbol is annotated with.
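If one comma-separated string per gene is wanted (the shape the question's getgoterms() produces), the two-step mapping can be collapsed afterwards. The following is only a sketch under a few assumptions: the three-gene vector is a shortened stand-in for the full list, rows without a GO ID are dropped before the second lookup, duplicate terms per gene (same GO ID with different evidence codes) are removed, and the result name goterms is chosen just to mirror the question.

```r
library(AnnotationDbi)
library(org.Mm.eg.db)
library(GO.db)

# Shortened example input; substitute the full vector of symbols
genes <- c("Kit", "Gata1", "Rho")

# Step 1: symbol -> GO ID (one row per symbol/GO/evidence combination)
sym2go <- AnnotationDbi::select(org.Mm.eg.db, keys = genes,
                                columns = "GO", keytype = "SYMBOL")
sym2go <- sym2go[!is.na(sym2go$GO), ]   # drop symbols without GO annotation

# Step 2: GO ID -> term, looking up each GO ID only once
go2term <- AnnotationDbi::select(GO.db, keys = unique(sym2go$GO),
                                 columns = "TERM", keytype = "GOID")

# Merge, then keep one row per symbol/term pair
sym2term <- unique(merge(sym2go[, c("SYMBOL", "GO")], go2term,
                         by.x = "GO", by.y = "GOID"))

# Collapse to one string per gene and realign to the input order;
# symbols with no terms come back as NA
per_gene <- vapply(split(sym2term$TERM, sym2term$SYMBOL), toString, character(1))
goterms <- setNames(per_gene[genes], genes)
```

Calling select with the AnnotationDbi:: prefix avoids surprises when another attached package (dplyr, for example) also exports a select function.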

@james-w-macdonald-5106

You might be able to speed it up somewhat, but I doubt there is a lot of speed to be gained. Just mapping symbols to GO IDs is 'slow'.

> system.time(mapIds(org.Mm.eg.db, genes, "GO", "SYMBOL", multiVals="list"))
   user  system elapsed
  1.840   0.152   1.997

This is because there is a lot of work being done under the hood, in order to get things returned in the correct manner. If you really need speed, you can always attach the org.Mm.eg.db and GO.db SQLite databases and do a direct SQL query. That would most likely be much faster.

But this will also take more of your time, and given that annotating things is not usually a repetitive task, I wonder if it's really a burden to wait a couple of seconds to get your results. You could annotate the entire set of genes in the org.Mm.eg.db package in under 8 minutes:

> system.time(mapIds(org.Mm.eg.db, Rkeys(org.Mm.egSYMBOL), "GO","SYMBOL", multiVals = getgoterms))
   user  system elapsed
445.172   0.968 447.696

which would be less time than it would take to figure out how to attach the two dbs and do the query.
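For anyone who does want to try the direct SQL route, a rough sketch follows. It assumes the schema commonly found in these packages: a gene_info table with a symbol column and a go table or view with go_id in org.Mm.eg.db, and a go_term table with go_id and term in GO.db. Those table and column names are assumptions, so check them with dbListTables() and dbListFields() before relying on the query. The alias godb and the three-gene vector are made up for illustration, and no timing is claimed here.

```r
library(DBI)
library(AnnotationDbi)
library(org.Mm.eg.db)
library(GO.db)

# Shortened example input; substitute the full vector of symbols
genes <- c("Kit", "Gata1", "Rho")

con <- dbconn(org.Mm.eg.db)   # connection managed by the package; do not close it

# Attach the GO.db SQLite file under the (hypothetical) alias "godb"
dbExecute(con, sprintf("ATTACH '%s' AS godb", dbfile(GO.db)))

# Quote the symbols for use in an IN (...) clause
in_list <- paste(dbQuoteString(con, genes), collapse = ", ")

# Assumed schema: gene_info(_id, symbol), go(_id, go_id), godb.go_term(go_id, term)
sql <- sprintf(
  "SELECT DISTINCT gi.symbol, g.go_id, t.term
     FROM gene_info AS gi
     JOIN go AS g ON gi._id = g._id
     JOIN godb.go_term AS t ON g.go_id = t.go_id
    WHERE gi.symbol IN (%s)", in_list)
res <- dbGetQuery(con, sql)

# Detach again so the shared connection is left as it was found
dbExecute(con, "DETACH godb")

head(res)
```

The result is a long-format data frame (one row per symbol and term), which can be collapsed per gene with split() and toString() if a single string per symbol is needed.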
