mapIds to get GO terms
@sisterdot-9374

hey!

I am trying to annotate a dataset. I have gene symbols and want to add the corresponding GO terms.

The following works but is rather slow; I am sure there is a better way :-)

library("AnnotationDbi")
library("GO.db")
library("plyr")
library("org.Mm.eg.db")

genes <- c("Kit","Gata1","Cnga4","E130309D14Rik","Apol7c","Neil2","Atp7b","H2-Bl","Gli1","Glt6d1","Grem2","Fstl3","Il10rb","Ccdc71","Rho","Nt5dc1","Obox5","4931428F04Rik","Rbbp6","4922502N22Rik","Gbf1","Plp1","Gm7030","Slc6a20b","Bpifb6")

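getgoterms <- function(x) {
  # x: the GO IDs mapIds() found for one gene symbol (may contain NA)
  x <- na.omit(x)
  if (length(x) > 0) {
    # look up the term name of each GO ID and collapse them into one string
    toString(eapply(GOTERM[x], Term))
  } else {
    return("NA")
  }
}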

goterms <- mapIds(org.Mm.eg.db,
                  keys = genes,
                  column = "GO",
                  keytype = "SYMBOL",
                  multiVals = getgoterms)

goterms

thanks a lot!

sisterdot

annotationdbi go.db
@martin-morgan-1513
I did the two mappings, symbol to GO ID and GO ID to term,

sym2go <- select(org.Mm.eg.db, keys = genes, columns = "GO", keytype = "SYMBOL")
go2term <- select(GO.db, keys = sym2go$GO, columns = "TERM", keytype = "GOID")

then merged them on the GO ID

sym2term <- merge(sym2go, go2term, by.x = "GO", by.y = "GOID")

giving

> head(sym2term)
          GO SYMBOL EVIDENCE ONTOLOGY
1 GO:0000009 Glt6d1      IEA       MF
2 GO:0000026 Glt6d1      IEA       MF
3 GO:0000030 Glt6d1      IEA       MF
4 GO:0000033 Glt6d1      IEA       MF
5 GO:0000122  Gata1      IDA       BP
6 GO:0000122  Gata1      IDA       BP
                                                                  TERM
1                               alpha-1,6-mannosyltransferase activity
2                               alpha-1,2-mannosyltransferase activity
3                                         mannosyltransferase activity
4                               alpha-1,3-mannosyltransferase activity
5 negative regulation of transcription from RNA polymerase II promoter
6 negative regulation of transcription from RNA polymerase II promoter
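Rows 5 and 6 above are exact duplicates. One possible cause (an assumption, not verified against `select()`'s behaviour) is that `sym2go$GO` repeats GO IDs, so the lookup table can carry repeated rows and `merge()` then emits one output row per match; another is that `sym2go` itself repeats a symbol/GO/evidence row. The base-R sketch below uses small hypothetical tables, not real annotation output, to show the first effect and that calling `unique()` on the lookup table removes it.

```r
# Hypothetical stand-ins for the two select() results (toy rows, not real annotation data)
sym2go <- data.frame(
  SYMBOL = c("Gata1", "Glt6d1", "Glt6d1"),
  GO     = c("GO:0000122", "GO:0000009", "GO:0000026"),
  stringsAsFactors = FALSE
)
# Lookup table in which one GO ID appears twice
go2term <- data.frame(
  GOID = c("GO:0000122", "GO:0000122", "GO:0000009", "GO:0000026"),
  TERM = c("negative regulation of transcription from RNA polymerase II promoter",
           "negative regulation of transcription from RNA polymerase II promoter",
           "alpha-1,6-mannosyltransferase activity",
           "alpha-1,2-mannosyltransferase activity"),
  stringsAsFactors = FALSE
)

# Each repeated lookup row yields an extra output row
naive <- merge(sym2go, go2term, by.x = "GO", by.y = "GOID")

# Deduplicate the lookup table first so each GO ID matches at most once
dedup <- merge(sym2go, unique(go2term), by.x = "GO", by.y = "GOID")

nrow(naive)  # 4: the Gata1 row appears twice
nrow(dedup)  # 3
```

With the real packages, passing `unique(sym2go$GO)` to the second `select()` would be the analogous change; if the duplicates come from `sym2go` instead, `unique(sym2term)` after the merge covers both cases.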



Finally,

with(sym2term, splitAsList(TERM, SYMBOL))[genes]

generates a list aligned 1:1 with the original symbols, each element holding the (possibly several) terms that symbol is annotated with.
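The same grouping can be mimicked in base R, which may help when checking the logic without the Bioconductor packages loaded: `split()` groups the terms by symbol, and indexing by `genes` restores the original order. The table below is hypothetical toy data. A symbol with no annotation comes back as `NULL` under an `NA` name, so this sketch restores the names and then, like the question's `getgoterms()`, collapses each element with `toString()`. It returns a real `NA_character_` instead of the string `"NA"`, which is a deliberate change.

```r
# Hypothetical merged symbol/term table (toy rows, not real annotations);
# the last row repeats a symbol/term pair, e.g. from a second evidence code
sym2term <- data.frame(
  SYMBOL = c("Gata1", "Gata1", "Kit", "Gata1"),
  TERM   = c("term A", "term B", "term C", "term B"),
  stringsAsFactors = FALSE
)
genes <- c("Kit", "Gata1", "Rho")  # "Rho" has no rows in the toy table

# Keep each symbol/term pair once
sym2term <- unique(sym2term[, c("SYMBOL", "TERM")])

# Group terms by symbol, then reorder to match 'genes'
terms_by_gene <- split(sym2term$TERM, sym2term$SYMBOL)[genes]
# Unmatched symbols come back named NA; restore the original names
names(terms_by_gene) <- genes

# One string per gene, NA where there is no annotation
collapsed <- vapply(terms_by_gene,
                    function(t) if (length(t) > 0) toString(t) else NA_character_,
                    character(1))
collapsed
```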

@james-w-macdonald-5106
You might be able to speed it up somewhat, but I doubt there is much speed to be gained: just mapping symbols to GO IDs is itself 'slow'.

> system.time(mapIds(org.Mm.eg.db, genes, "GO", "SYMBOL", multiVals="list"))
   user  system elapsed
  1.840   0.152   1.997

This is because a lot of work is done under the hood to return the results in the requested shape. If you really need speed, you can always attach the org.Mm.eg.db and GO.db SQLite databases and do a direct SQL query. That would most likely be much faster.

But this will also take more of your time, and given that annotation is not usually a repetitive task, I wonder whether waiting a couple of seconds for your results is really a burden. You could annotate the entire set of genes in the org.Mm.eg.db package in under 8 minutes:

> system.time(mapIds(org.Mm.eg.db, Rkeys(org.Mm.egSYMBOL), "GO", "SYMBOL", multiVals = getgoterms))
   user  system elapsed
445.172   0.968 447.696

which would be less time than it would take to figure out how to attach the two dbs and do the query.
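For reference, the direct-SQL route mentioned above has roughly this shape. It is a self-contained sketch that assumes the DBI and RSQLite packages are installed. Two in-memory databases stand in for the package databases, and the table and column names (`sym_go`, `go_term`, `symbol`, `go_id`, `term`) are hypothetical, not the real org.Mm.eg.db or GO.db schemas. With the real packages, AnnotationDbi's `dbfile()` gives the path of a package's SQLite file (to use in `ATTACH DATABASE`), and `dbschema()` shows the actual tables you would need to join.

```r
library(DBI)

# Main connection plays the role of the org package database
con <- dbConnect(RSQLite::SQLite(), ":memory:")
# A second database attached under the schema name 'godb' (stands in for GO.db)
dbExecute(con, "ATTACH DATABASE ':memory:' AS godb")

# Hypothetical symbol-to-GO table in the main database
dbWriteTable(con, "sym_go", data.frame(
  symbol = c("Gata1", "Glt6d1", "Glt6d1"),
  go_id  = c("GO:0000122", "GO:0000009", "GO:0000026"),
  stringsAsFactors = FALSE
))
# Hypothetical GO-ID-to-term table in the attached database
dbExecute(con, "CREATE TABLE godb.go_term (go_id TEXT, term TEXT)")
dbExecute(con, "INSERT INTO godb.go_term VALUES
  ('GO:0000122', 'negative regulation of transcription from RNA polymerase II promoter'),
  ('GO:0000009', 'alpha-1,6-mannosyltransferase activity'),
  ('GO:0000026', 'alpha-1,2-mannosyltransferase activity')")

genes <- c("Gata1", "Glt6d1")
# Quote the symbols safely for the IN (...) list
in_list <- paste(dbQuoteString(con, genes), collapse = ", ")

# A single query joins across both databases
res <- dbGetQuery(con, paste0(
  "SELECT s.symbol, t.term ",
  "FROM sym_go AS s JOIN godb.go_term AS t ON s.go_id = t.go_id ",
  "WHERE s.symbol IN (", in_list, ")"
))
dbDisconnect(con)
res
```

Most of the effort in the real case goes into reading the schemas and writing the joins, which is the trade-off described above.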