Question: biomaRt question for GO term search
3 months ago, Taiki Tsutsui wrote:

Hello everyone,

I am a little confused about using biomaRt. I would like to retrieve the GO term definition and name for each GO ID. I have done the following:

mart <- useMart(biomart = "ENSEMBL_MART_ENSEMBL", dataset = "hsapiens_gene_ensembl", host = "asia.ensembl.org") 
go_list <- getBM(attributes=c("go_id", "name_1006", "namespace_1003"),
              filters = "go",
              values = c("GO:0045576"),
              mart=mart)

I expected this to give me a small list for GO:0045576, but I got the following:

head(go_list, 10)
        go_id                                               name_1006     namespace_1003
1  GO:0007165                                     signal transduction biological_process
2  GO:0043547                  positive regulation of GTPase activity biological_process
3  GO:0005096                               GTPase activator activity molecular_function
4  GO:0005737                                               cytoplasm cellular_component
5  GO:0005829                                                 cytosol cellular_component
6  GO:0051056 regulation of small GTPase mediated signal transduction biological_process
7  GO:0032956           regulation of actin cytoskeleton organization biological_process
8  GO:0030833             regulation of actin filament polymerization biological_process
9  GO:0051497            negative regulation of stress fiber assembly biological_process
10 GO:1904425                      negative regulation of GTP binding biological_process

dim(go_list)
[1] 848   3

Does anyone know why so many different GO IDs were returned for a query on a single GO ID? Any comment would be a big help.

Thanks, Taiki

Answer: biomaRt question for GO term search
3 months ago, Mike Smith (EMBL Heidelberg / de.NBI) wrote:

The issue here is that Ensembl BioMart is not really the right resource for this information. It's structured to return results about transcripts, rather than everything stored in Ensembl. So what your query is actually doing is finding all transcripts annotated with "GO:0045576" and then returning all the GO IDs, names, and namespaces associated with that collection of transcripts. This isn't obvious, however, because you aren't returning a column with transcript or gene IDs.

Consider these two queries, which are identical except that the second also includes the transcript IDs:

go_list <- getBM(attributes=c("go_id", "name_1006", "namespace_1003"),
                 filters = "go",
                 values = c("GO:0045576"),
                 mart=mart, 
                 uniqueRows = FALSE)

go_list2 <- getBM(attributes=c("go_id", "name_1006", "namespace_1003",
                              "ensembl_transcript_id"),
                 filters = "go",
                 values = c("GO:0045576"),
                 mart=mart, 
                 uniqueRows = FALSE)

identical( nrow(go_list), nrow(go_list2) )
[1] TRUE

Also note that I've used the argument uniqueRows = FALSE to demonstrate more clearly that the results are transcript-centric. By default (uniqueRows = TRUE) duplicate rows are removed from the result, which removes more rows from the first query than from the second because the first doesn't include the transcript IDs.
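To make the transcript-centric behaviour concrete, here is a minimal sketch on a toy data frame. The transcript labels (TX_A, TX_B) and the row layout are made up for illustration; only the GO IDs and names are taken from this thread. The helper keepQueriedTerm() is hypothetical, not part of biomaRt. It simply keeps the rows for the term that was asked about and drops the duplicates.

```r
# Toy stand-in for a transcript-centric result: one row per
# (transcript, GO annotation) pair. Transcript labels are invented.
toy <- data.frame(
  go_id      = c("GO:0045576", "GO:0007165", "GO:0045576",
                 "GO:0007165", "GO:0005737"),
  name_1006  = c("mast cell activation", "signal transduction",
                 "mast cell activation", "signal transduction",
                 "cytoplasm"),
  transcript = c("TX_A", "TX_A", "TX_B", "TX_B", "TX_B"),
  stringsAsFactors = FALSE
)

# With the transcript column every row is distinct; without it,
# de-duplication collapses rows that differ only by transcript.
n_with_tx    <- nrow(unique(toy))
n_without_tx <- nrow(unique(toy[, c("go_id", "name_1006")]))

# Hypothetical helper: keep only the queried term(s), one row per term.
keepQueriedTerm <- function(result, go_id) {
  hit <- result[result$go_id %in% go_id, c("go_id", "name_1006"),
                drop = FALSE]
  hit <- unique(hit)
  rownames(hit) <- NULL
  hit
}

term_only <- keepQueriedTerm(toy, "GO:0045576")
```

Assuming the real go_list from the question contains a row for the queried ID, keepQueriedTerm(go_list, "GO:0045576") should reduce it to that single term in the same way, although this still downloads far more than is needed.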


There are probably many ways to actually get the information you're looking for, but one is to use the Ensembl REST API. Here's a small function that I think does what you're looking for.

library(httr)
library(jsonlite)
library(xml2)
library(tibble)

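getGoDetails <- function(go_id) {
    server <- "https://rest.ensembl.org/ontology/id/"

    # Request the ontology record for this term as JSON
    r <- GET(paste0(server, go_id),
             content_type("application/json"))

    # Stop with an informative error if the request failed
    stop_for_status(r)

    # Round-trip through JSON so list elements simplify to plain vectors
    res <- fromJSON(toJSON(content(r)))
    tibble(id = res$accession,
           name = res$name,
           definition = res$definition)
}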

getGoDetails("GO:0045576")
# A tibble: 1 x 3
  id        name             definition                                                           
  <chr>     <chr>            <chr>                                                  
1 GO:00455… mast cell activ… The change in morphology and behavior of a mast cell resulting from…
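
For a handful of IDs, the same request can be repeated and the results stacked. This is only a sketch: fetchGoTerm() and getGoDetailsMany() are hypothetical names, the pause between requests is an arbitrary courtesy value, and the code assumes every record has accession, name and definition fields as in the output above. The fetch argument is there so the stacking logic can be tried with a stub, without touching the network.

```r
# Hypothetical single-term lookup against the Ensembl REST endpoint used above.
# Requires the httr and jsonlite packages when it is actually called.
fetchGoTerm <- function(go_id) {
  url <- paste0("https://rest.ensembl.org/ontology/id/", go_id)
  r <- httr::GET(url, httr::content_type("application/json"))
  httr::stop_for_status(r)
  res <- jsonlite::fromJSON(httr::content(r, as = "text", encoding = "UTF-8"))
  data.frame(id = res$accession,
             name = res$name,
             definition = res$definition,
             stringsAsFactors = FALSE)
}

# Look up several GO IDs one request at a time and bind the rows together.
# Duplicate IDs are only requested once.
getGoDetailsMany <- function(go_ids, fetch = fetchGoTerm, pause = 0.1) {
  rows <- lapply(unique(go_ids), function(id) {
    Sys.sleep(pause)  # arbitrary short pause between requests
    fetch(id)
  })
  do.call(rbind, rows)
}

# Example call (needs network access, so it is left commented out):
# details <- getGoDetailsMany(c("GO:0045576", "GO:0007165"))
```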

If you're looking to do this for a large number of terms then it would be much more efficient to find a file or database that contains all the information and work with that offline.


As an example of what Mike is implying here:

> library(GO.db)
> head(toTable(GOTERM))
       go_id      go_id                             Term Ontology
1 GO:0000001 GO:0000001        mitochondrion inheritance       BP
2 GO:0000002 GO:0000002 mitochondrial genome maintenance       BP
3 GO:0000003 GO:0000003                     reproduction       BP
4 GO:0000003 GO:0000003                     reproduction       BP
5 GO:0000003 GO:0000003                     reproduction       BP
6 GO:0042254 GO:0042254              ribosome biogenesis       BP
                                                                                                                                                                                                     Definition
1                       The distribution of mitochondria, including the mitochondrial genome, into daughter cells after mitosis or meiosis, mediated by interactions between mitochondria and the cytoskeleton.
2                                                             The maintenance of the structure and integrity of the mitochondrial genome; includes replication and segregation of the mitochondrial chromosome.
3                                                                                  The production of new individuals that contain some portion of genetic material inherited from one or more parent organisms.
4                                                                                  The production of new individuals that contain some portion of genetic material inherited from one or more parent organisms.
5                                                                                  The production of new individuals that contain some portion of genetic material inherited from one or more parent organisms.
6 A cellular process that results in the biosynthesis of constituent macromolecules, assembly, and arrangement of constituent parts of ribosome subunits; includes transport to the sites of protein synthesis.
                             Synonym  Secondary
1          mitochondrial inheritance       <NA>
2                               <NA>       <NA>
3 reproductive physiological process       <NA>
4                         GO:0019952 GO:0019952
5                         GO:0050876 GO:0050876
6   ribosome biogenesis and assembly       <NA>
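
Note that toTable(GOTERM) repeats a term once per synonym or secondary ID, as in rows 3 to 5 above. If only the name and definition for specific IDs are wanted, one option is the select() interface from AnnotationDbi, which should give one row per GO ID. A small sketch, assuming GO.db and AnnotationDbi are installed; lookupGoOffline() is a hypothetical wrapper name.

```r
# Hypothetical wrapper: look up term name, ontology and definition for a
# vector of GO IDs from the local GO.db annotation package (no network needed).
lookupGoOffline <- function(go_ids) {
  AnnotationDbi::select(GO.db::GO.db,
                        keys    = go_ids,
                        columns = c("TERM", "ONTOLOGY", "DEFINITION"),
                        keytype = "GOID")
}

# Example call (left commented out in case the packages are not installed):
# lookupGoOffline(c("GO:0045576", "GO:0007165"))
```

The content depends on the GO.db release that is installed, so names and definitions may differ slightly from what the Ensembl REST API reports.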
Reply written 3 months ago by James W. MacDonald