Question: mapIds to get GO terms
sisterdot wrote (3.6 years ago):

Hey!

I am trying to annotate a dataset. I have gene symbols and want to add the corresponding GO terms.

The following is rather slow; I am sure there is a better way :-)

library("AnnotationDbi")
library("GO.db")
library("plyr")
library("org.Mm.eg.db")

genes <- c("Kit","Gata1","Cnga4","E130309D14Rik","Apol7c","Neil2","Atp7b","H2-Bl","Gli1","Glt6d1","Grem2","Fstl3","Il10rb","Ccdc71","Rho","Nt5dc1","Obox5","4931428F04Rik","Rbbp6","4922502N22Rik","Gbf1","Plp1","Gm7030","Slc6a20b","Bpifb6")

# Collapse the GO IDs of one gene into a comma-separated string of term names
getgoterms <- function(x) {
  x <- na.omit(x)
  if (length(x) > 0) {
    toString(eapply(GOTERM[x], Term))
  } else {
    "NA"
  }
}

goterms <- mapIds(org.Mm.eg.db,
                     keys=genes,
                     column="GO",
                     keytype="SYMBOL",
                     multiVals=getgoterms)

goterms

Thanks a lot!

sisterdot

Answer: mapIds to get GO terms
Martin Morgan wrote (3.6 years ago):

I did the two mappings separately, symbol to GO ID and GO ID to term,

sym2go <- select(org.Mm.eg.db, genes, "GO", "SYMBOL")
go2term <- select(GO.db, sym2go$GO, "TERM", "GOID")

then merged

sym2term <- merge(sym2go, go2term, by.x="GO", by.y="GOID")

with

> head(sym2term)
          GO SYMBOL EVIDENCE ONTOLOGY
1 GO:0000009 Glt6d1      IEA       MF
2 GO:0000026 Glt6d1      IEA       MF
3 GO:0000030 Glt6d1      IEA       MF
4 GO:0000033 Glt6d1      IEA       MF
5 GO:0000122  Gata1      IDA       BP
6 GO:0000122  Gata1      IDA       BP
                                                                  TERM
1                               alpha-1,6-mannosyltransferase activity
2                               alpha-1,2-mannosyltransferase activity
3                                         mannosyltransferase activity
4                               alpha-1,3-mannosyltransferase activity
5 negative regulation of transcription from RNA polymerase II promoter
6 negative regulation of transcription from RNA polymerase II promoter

Then

with(sym2term, splitAsList(TERM, SYMBOL))[genes]

generates a list aligned 1:1 with the original symbols, each element holding the several terms that symbol is annotated with.
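For the original goal of one comma-separated string per symbol, the merge-then-split idea can be sketched in base R on toy data. This is only an illustration under stated assumptions: the two small data frames are hypothetical stand-ins for the `select()` results (their IDs and term names are made up), `collapse_terms` is an illustrative helper, `all.x = TRUE` is an addition that keeps symbols without any GO annotation, and base `split()` is used in place of `splitAsList()`.

```r
# Toy stand-ins for the two select() results; all values are made up.
sym2go <- data.frame(
  SYMBOL = c("Gata1", "Gata1", "Kit", "Rho"),
  GO     = c("GO:1", "GO:2", "GO:3", NA),
  stringsAsFactors = FALSE
)
go2term <- data.frame(
  GOID = c("GO:1", "GO:2", "GO:3"),
  TERM = c("term A", "term B", "term C"),
  stringsAsFactors = FALSE
)

# Left join (assumption: we want to keep symbols that have no GO annotation).
sym2term <- merge(sym2go, go2term, by.x = "GO", by.y = "GOID", all.x = TRUE)

# Hypothetical helper: unique, non-missing terms as one string, or NA if none.
collapse_terms <- function(terms) {
  terms <- unique(terms[!is.na(terms)])
  if (length(terms) > 0) toString(terms) else NA_character_
}

genes <- c("Kit", "Gata1", "Rho")

# Group terms by symbol, reorder to match `genes`, then collapse each group.
by_symbol <- split(sym2term$TERM, sym2term$SYMBOL)
goterms <- vapply(by_symbol[genes], collapse_terms, character(1))

goterms
#              Kit            Gata1              Rho
#         "term C" "term A, term B"               NA
```

The point of the design is that each GO ID is translated to its term once, in a single vectorized lookup, rather than once per gene inside a `multiVals` callback.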

Answer: mapIds to get GO terms
James W. MacDonald wrote (3.6 years ago):

You might be able to speed it up somewhat, but I doubt there is a lot of speed to be gained. Just mapping symbols to GO IDs is 'slow'.

> system.time(mapIds(org.Mm.eg.db, genes, "GO", "SYMBOL", multiVals="list"))
   user  system elapsed
  1.840   0.152   1.997

This is because there is a lot of work being done under the hood, in order to get things returned in the correct manner. If you really need speed, you can always attach the org.Mm.eg.db and GO.db SQLite databases and do a direct SQL query. That would most likely be much faster.
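A rough sketch of what such a direct query might look like follows. It is not a tested recipe: the table and column names (`gene_info`, `go_bp`, `go_mf`, `go_cc`, `go_term`, `_id`, `symbol`, `go_id`, `term`) are assumptions about the SQLite schema shipped with these packages and should be checked with `dbListTables()` and `dbListFields()` first. The alias `godb` is arbitrary, and opening a separate read-only connection (rather than reusing the packages' own connections) is a choice made here for safety.

```r
library(DBI)
library(RSQLite)
library(org.Mm.eg.db)
library(GO.db)

genes <- c("Kit", "Gata1", "Rho")

# Private read-only connection to the org database file, with GO.db attached,
# so the connections managed by the annotation packages are left untouched.
con <- dbConnect(SQLite(), org.Mm.eg_dbfile(), flags = SQLITE_RO)
dbExecute(con, sprintf("ATTACH DATABASE '%s' AS godb", GO_dbfile()))

# ASSUMPTION: schema names below; inspect with dbListTables(con) before use.
placeholders <- paste(rep("?", length(genes)), collapse = ", ")
sql <- sprintf("
  SELECT gi.symbol, g.go_id, t.term
  FROM gene_info AS gi
  JOIN (SELECT _id, go_id FROM go_bp
        UNION SELECT _id, go_id FROM go_mf
        UNION SELECT _id, go_id FROM go_cc) AS g ON g._id = gi._id
  JOIN godb.go_term AS t ON t.go_id = g.go_id
  WHERE gi.symbol IN (%s)", placeholders)

# One bound parameter per symbol.
res <- dbGetQuery(con, sql, params = as.list(genes))
dbDisconnect(con)

head(res)
```

The result is a long-format data frame with one row per symbol and GO term, which can be grouped by `symbol` in the same way as the output of `select()`.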

But this will also take more of your time, and given that annotating things is not usually a repetitive task, I wonder if it's really a burden to wait a couple of seconds to get your results. You could annotate the entire set of genes in the org.Mm.eg.db package in under 8 minutes:

> system.time(mapIds(org.Mm.eg.db, Rkeys(org.Mm.egSYMBOL), "GO","SYMBOL", multiVals = getgoterms))
   user  system elapsed
445.172   0.968 447.696

which would be less time than it would take to figure out how to attach the two dbs and do the query.
