
Question: Retrieving all genes related to a GO term
17 months ago, Stane wrote:

Hello, 

I am trying to improve the way I retrieve genes related to a GO term, to get something closer to AmiGO/QuickGO (are they the most up to date?).
I have two simple approaches so far:

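    library(org.Hs.eg.db)  # provides the org.Hs.eg.db annotation object
    # go_key holds the GO ID(s) of interest
    AnnotationDbi::select(org.Hs.eg.db,
                          keys = go_key,
                          columns = c("SYMBOL"),
                          keytype = "GOALL")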

and 

    ensembl <- biomaRt::useMart(biomart = "ensembl",
                                dataset = "hsapiens_gene_ensembl")
    biomaRt::getBM(attributes = c("hgnc_symbol", "ensembl_transcript_id",
                                  "go_id", "go_linkage_type"),
                   filters = "go",
                   values = go_key,
                   mart = ensembl)

I kind of like the AnnotationDbi way better, as I have the db on my computer and it is therefore fast: no need to wait for an internet request/response.

My first problem is that they return slightly different gene lists, which also differ slightly from what AmiGO/QuickGO return.

Do you know which one is the most up to date? Also, I am not sure that GOALL retrieves all the children, children of children, and so on... maybe I should experiment with a recursive method.

Tags: go, biomart, annotationdbi, R
Answer: Retrieving all genes related to a GO term
17 months ago, James W. MacDonald (United States) wrote:

The GO.db package is built twice a year, so at best the one you have is from April. The data at the Biomart server is updated, um, more often? I presume there is some information somewhere that would tell you, or perhaps Mike Smith knows. I sort of assumed that they updated more regularly, but now that I think about it I am not sure why I would think that.
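For the local packages, at least, you don't have to guess how old the data are. Here is a minimal sketch, assuming org.Hs.eg.db and GO.db are installed; the exact rows you get back depend on the release you have, so I am not showing output.

```r
library(org.Hs.eg.db)  # also attaches AnnotationDbi
library(GO.db)

# Versions of the annotation packages installed locally
packageVersion("org.Hs.eg.db")
packageVersion("GO.db")

# The OrgDb carries a name/value metadata table recorded at build time;
# keep only the rows whose name starts with "GO" (source name, URL, date)
md <- metadata(org.Hs.eg.db)
md[grepl("^GO", md$name), ]

# GO.db reports the same kind of information about itself
GO_dbInfo()
```

On the biomaRt side, `biomaRt::listMarts()` reports the Ensembl version the server is currently serving, which you can then compare with the source dates above.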

I would assume that AmiGO is the most up to date, given that it's their data.

But as you have noted, the OrgDb is the fastest way to go, and also is nice because it's constant (and hence reproducible). There is always a tension between getting the newest information and being able to reproduce results, and people have to decide for themselves what is most important.

Also, you don't have to guess at GOALL, you can just look at the help page. From ?GOALL:

    GO: GO Identifiers associated with a gene of interest

    GOALL: GO Identifiers (includes less specific terms)
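If you want to convince yourself of what GOALL is doing, you can compare it with a manual expansion of the term through GO.db. This is just a sketch under a few assumptions: the value of `go_key` is an example I picked (GO:0006915, a biological process term), and `GOBPOFFSPRING` only applies to BP terms (use `GOMFOFFSPRING` or `GOCCOFFSPRING` for the other ontologies).

```r
library(org.Hs.eg.db)  # also attaches AnnotationDbi
library(GO.db)

go_key <- "GO:0006915"  # example BP term (assumption); replace with your own

# Genes annotated directly to the term
direct <- AnnotationDbi::select(org.Hs.eg.db, keys = go_key,
                                columns = "SYMBOL", keytype = "GO")

# Genes annotated to the term or to any of its more specific terms
all_genes <- AnnotationDbi::select(org.Hs.eg.db, keys = go_key,
                                   columns = "SYMBOL", keytype = "GOALL")

# Manual version: collect every descendant term, keep the ones that have
# direct annotations in the OrgDb, then query them with keytype "GO"
offspring <- GOBPOFFSPRING[[go_key]]
terms <- intersect(c(go_key, offspring), keys(org.Hs.eg.db, keytype = "GO"))
manual <- AnnotationDbi::select(org.Hs.eg.db, keys = terms,
                                columns = "SYMBOL", keytype = "GO")

# Compare the sizes of the gene sets and check whether GOALL matches
# the manual expansion
length(unique(direct$SYMBOL))
length(unique(all_genes$SYMBOL))
setequal(all_genes$SYMBOL, manual$SYMBOL)
```

I would expect the last line to agree when GO.db and the OrgDb come from the same Bioconductor release, in which case there is no need to write your own recursion.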