biomaRt question for GO term search
@taiki-tsutsui-14286

Hello everyone,

I am a little confused about using biomaRt. I would like to retrieve the GO term name and definition for each GO ID. I ran the following:

mart <- useMart(biomart = "ENSEMBL_MART_ENSEMBL", dataset = "hsapiens_gene_ensembl", host = "asia.ensembl.org") 
go_list <- getBM(attributes=c("go_id", "name_1006", "namespace_1003"),
              filters = "go",
              values = c("GO:0045576"),
              mart=mart)

I expected this to return a short list for GO:0045576, but I got the following:

head(go_list, 10)
        go_id                                               name_1006     namespace_1003
1  GO:0007165                                     signal transduction biological_process
2  GO:0043547                  positive regulation of GTPase activity biological_process
3  GO:0005096                               GTPase activator activity molecular_function
4  GO:0005737                                               cytoplasm cellular_component
5  GO:0005829                                                 cytosol cellular_component
6  GO:0051056 regulation of small GTPase mediated signal transduction biological_process
7  GO:0032956           regulation of actin cytoskeleton organization biological_process
8  GO:0030833             regulation of actin filament polymerization biological_process
9  GO:0051497            negative regulation of stress fiber assembly biological_process
10 GO:1904425                      negative regulation of GTP binding biological_process

dim(go_list)
[1] 848   3

Do you know why so many different GO IDs are returned for a query on a single GO ID? Any comments would be a big help.

Thanks, Taiki

Mike Smith ★ 6.6k
@mike-smith
EMBL Heidelberg

The issue here is that Ensembl BioMart is not really the right resource for this information. It is structured to return results about transcripts, rather than everything in Ensembl. So your query actually finds all transcripts annotated with "GO:0045576" and then returns every GO ID, name, and namespace associated with that set of transcripts. This isn't obvious, because you aren't returning a column with transcript or gene IDs.

Consider these two queries, which are identical except that one includes the transcript IDs as well:

go_list <- getBM(attributes=c("go_id", "name_1006", "namespace_1003"),
                 filters = "go",
                 values = c("GO:0045576"),
                 mart=mart, 
                 uniqueRows = FALSE)

go_list2 <- getBM(attributes=c("go_id", "name_1006", "namespace_1003",
                              "ensembl_transcript_id"),
                 filters = "go",
                 values = c("GO:0045576"),
                 mart=mart, 
                 uniqueRows = FALSE)

identical( nrow(go_list), nrow(go_list2) )
[1] TRUE

Also note that I've used the argument uniqueRows = FALSE to demonstrate more clearly that the results are transcript-centric. The default (uniqueRows = TRUE) removes duplicate rows from the result, and it removes more rows from the first query because, without the transcript IDs, many of its rows are identical.
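To make the mechanism concrete without contacting the server, here is a toy sketch in base R. The transcript IDs (T1, T2, T3) and their GO pairings are made up for illustration and are not real Ensembl data; the two steps only mimic what the filter and the attribute selection appear to do.

```r
# Toy annotation table: one row per (transcript, GO term) pair.
# Transcript IDs and pairings are hypothetical, for illustration only.
annot <- data.frame(
  ensembl_transcript_id = c("T1", "T1", "T2", "T2", "T3"),
  go_id = c("GO:0045576", "GO:0005737", "GO:0045576", "GO:0005737", "GO:0007165"),
  stringsAsFactors = FALSE
)

# Step 1: the filter selects the transcripts annotated with the query term.
hits <- unique(annot$ensembl_transcript_id[annot$go_id == "GO:0045576"])

# Step 2: every GO annotation of those transcripts is returned,
# not just the rows for the query term.
with_tx <- annot[annot$ensembl_transcript_id %in% hits, ]
without_tx <- with_tx["go_id"]   # same rows, transcript column dropped

# Without deduplication both results have the same number of rows.
nrow(with_tx) == nrow(without_tx)

# With deduplication, the result lacking transcript IDs collapses further.
nrow(unique(with_tx))
nrow(unique(without_tx))
```

Filtering the returned table back down to the queried ID (for example, keeping rows where go_id equals "GO:0045576") would recover the single term, at the cost of downloading all the extra rows first.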


There are probably many ways to actually get the information you're looking for, but one is to use the Ensembl REST API. Here's a small function that I think does what you're looking for.

library(httr)
library(jsonlite)
library(xml2)
library(tibble)

getGoDetails <- function(go_id) {
    server <- "https://rest.ensembl.org/ontology/id/"

    r <- GET(paste(server, go_id, sep = ""), 
             content_type("application/json"))

    stop_for_status(r)

    res <- fromJSON(toJSON(content(r)))
    tibble(id = res$accession, 
               name = res$name, 
               definition = res$definition)
}

getGoDetails("GO:0045576")
# A tibble: 1 x 3
  id        name             definition                                                           
  <chr>     <chr>            <chr>                                                  
1 GO:00455… mast cell activ… The change in morphology and behavior of a mast cell resulting from…

If you're looking to do this for a large number of terms then it would be much more efficient to find a file or database that contains all the information and work with that offline.


As an example of what Mike is implying here:

> library(GO.db)
> head(toTable(GOTERM))
       go_id      go_id                             Term Ontology
1 GO:0000001 GO:0000001        mitochondrion inheritance       BP
2 GO:0000002 GO:0000002 mitochondrial genome maintenance       BP
3 GO:0000003 GO:0000003                     reproduction       BP
4 GO:0000003 GO:0000003                     reproduction       BP
5 GO:0000003 GO:0000003                     reproduction       BP
6 GO:0042254 GO:0042254              ribosome biogenesis       BP
                                                                                                                                                                                                     Definition
1                       The distribution of mitochondria, including the mitochondrial genome, into daughter cells after mitosis or meiosis, mediated by interactions between mitochondria and the cytoskeleton.
2                                                             The maintenance of the structure and integrity of the mitochondrial genome; includes replication and segregation of the mitochondrial chromosome.
3                                                                                  The production of new individuals that contain some portion of genetic material inherited from one or more parent organisms.
4                                                                                  The production of new individuals that contain some portion of genetic material inherited from one or more parent organisms.
5                                                                                  The production of new individuals that contain some portion of genetic material inherited from one or more parent organisms.
6 A cellular process that results in the biosynthesis of constituent macromolecules, assembly, and arrangement of constituent parts of ribosome subunits; includes transport to the sites of protein synthesis.
                             Synonym  Secondary
1          mitochondrial inheritance       <NA>
2                               <NA>       <NA>
3 reproductive physiological process       <NA>
4                         GO:0019952 GO:0019952
5                         GO:0050876 GO:0050876
6   ribosome biogenesis and assembly       <NA>
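If you only need the details for specific IDs rather than the whole table, a lookup along these lines may be enough. This is a minimal sketch, assuming the GO.db package is installed from Bioconductor; the go_ids vector is just a placeholder for your own list of IDs, and the result reflects whatever GO release your installed GO.db version was built from.

```r
library(GO.db)  # Bioconductor annotation package; loads AnnotationDbi as well

# Placeholder vector: replace with the GO IDs you are interested in.
go_ids <- c("GO:0045576")

# Look up term name, ontology, and definition for each ID, offline.
# The explicit AnnotationDbi:: prefix avoids clashes with dplyr::select().
res <- AnnotationDbi::select(GO.db,
                             keys = go_ids,
                             columns = c("TERM", "ONTOLOGY", "DEFINITION"),
                             keytype = "GOID")
res
```

Because select() is vectorised over keys, the same call should handle thousands of IDs without any web requests.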

How can I convert an ensembl_gene_id or hgnc_id to a go_id? I am trying through biomaRt, but it isn't giving me any output. Any suggestions would be highly appreciated.

Thanks in advance.


I think it would be best to start a new question with an example of what you've already tried, and an explanation of how what you're getting differs from what you ultimately want. You're more likely to get useful help that way.
