How to retrieve gene ontology GO class from gene list?
@7c443214

Hi all,

Does anyone know how to obtain the information under GO class (direct) given a gene list? Ideally I would like to be able to retrieve it with R or an API. I have tested several packages, but none are able to retrieve this information. Any help would be greatly appreciated!

Edit: this screenshot is taken from AmiGO Annotations.
Edit 2: title updated for clarity.

GO.db goTools • 220 views
What do you mean by 'the information'? Do you have things in particular that you want?

Sorry I wasn't clear before. What I mean is all the information associated with the GO term, such as its description and evidence code.

I have since realized that most of this can be done via biomaRt (under some confusingly named attributes). I'm including the code below in case it helps others:

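library(biomaRt)

# Connect to the Ensembl human gene dataset
ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
# name_1006 = GO term name, definition_1006 = GO term definition,
# go_linkage_type = evidence code, namespace_1003 = GO domain
gene.data <- getBM(attributes = c('hgnc_symbol', 'go_id', 'name_1006', 'definition_1006', 'go_linkage_type', 'namespace_1003'),
                   filters = 'hgnc_symbol',
                   values = "TOX3",
                   mart = ensembl,
                   uniqueRows = TRUE)
head(gene.data, n = 2)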

  hgnc_symbol      go_id   name_1006                                                                                                                        definition_1006 go_linkage_type     namespace_1003
2        TOX3 GO:0005654 nucleoplasm                                                          That part of the nuclear content other than the chromosomes or the nucleolus.             IDA cellular_component
3        TOX3 GO:0005829     cytosol The part of the cytoplasm that does not contain organelles but which does contain other particulate matter, such as protein complexes.             IDA cellular_component
@james-w-macdonald-5106

You can also use a combination of the org.Hs.eg.db and GO.db packages.

> library(GO.db)
> library(org.Hs.eg.db)
> go <- mapIds(org.Hs.eg.db, "TOX3", "GO", "SYMBOL")
'select()' returned 1:many mapping between keys and columns

> go
        TOX3 
"GO:0003682" 

> info <- select(GO.db, go, c("TERM","DEFINITION"), "GOID")
'select()' returned 1:1 mapping between keys and columns
> info
        GOID              TERM
1 GO:0003682 chromatin binding
                                                                                                                                                 DEFINITION
1 Binding to chromatin, the network of fibers of DNA, protein, and sometimes RNA, that make up the chromosomes of the eukaryotic nucleus during interphase.

Thanks, this is potentially much faster than parsing the biomaRt output. However, it is only returning one GO term right now (biomaRt returns a few more). Is there any way to make it show all the associated ones? Secondly, is it also possible to retrieve the genomic coordinates from org.Hs.eg.db?

Yes. The keys argument to both select and mapIds will accept a vector (see ?select for more information). Also, the vignette for AnnotationDbi has lots of examples that you can emulate. The best way to improve with R is to learn how to figure things out, and reading both the help pages and vignettes is a good start.
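
As a rough sketch of what that could look like (this is not from the original answer, and the gene list is a made-up example): passing a vector of symbols to select() returns every gene/GO/evidence combination, whereas mapIds() keeps only the first match per key unless multiVals is changed. The object names here (genes, go_annot, go_info, result, go_list) are illustrative only.

```r
library(AnnotationDbi)
library(org.Hs.eg.db)
library(GO.db)

# Hypothetical gene list; replace with your own symbols
genes <- c("TOX3", "BRCA1")

# select() returns one row per gene/GO/evidence combination,
# unlike mapIds(), which by default keeps only the first match
go_annot <- AnnotationDbi::select(org.Hs.eg.db, keys = genes,
                                  columns = c("GO", "EVIDENCE", "ONTOLOGY"),
                                  keytype = "SYMBOL")

# Look up the term name and definition for each distinct GO ID
go_ids <- unique(na.omit(go_annot$GO))
go_info <- AnnotationDbi::select(GO.db, keys = go_ids,
                                 columns = c("TERM", "DEFINITION"),
                                 keytype = "GOID")

# Combine both tables so each row carries the full annotation
result <- merge(go_annot, go_info, by.x = "GO", by.y = "GOID", all.x = TRUE)

# Alternative: keep mapIds() but ask for every match as a list
go_list <- mapIds(org.Hs.eg.db, keys = genes, column = "GO",
                  keytype = "SYMBOL", multiVals = "list")
```

The calls are written as AnnotationDbi::select() so they still work if another loaded package (dplyr, for instance) masks select().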

You can no longer get positional information (except for the karyotype info) from the OrgDb packages. Instead you should use a TxDb package, which is a different subject entirely. You can read the relevant vignette for that sort of thing as well.
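
For illustration only, here is one way the TxDb route might look. It assumes the hg38 UCSC knownGene package (TxDb.Hsapiens.UCSC.hg38.knownGene), which is my choice rather than anything named in this thread; use the TxDb that matches your genome build. That package keys genes by Entrez ID, so the symbol is translated first.

```r
library(AnnotationDbi)
library(org.Hs.eg.db)
library(GenomicFeatures)
library(TxDb.Hsapiens.UCSC.hg38.knownGene)  # assumed build: hg38

txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene

# This TxDb uses Entrez gene IDs, so translate the symbol first
entrez <- mapIds(org.Hs.eg.db, keys = "TOX3", column = "ENTREZID",
                 keytype = "SYMBOL")

# genes() returns a GRanges object with one range per gene
all_genes <- genes(txdb)

# Keep only the range(s) whose gene_id matches the Entrez ID
tox3_range <- all_genes[all_genes$gene_id %in% entrez]
tox3_range
```

The result is a GRanges object, so seqnames(), start(), end() and strand() give the individual coordinate fields.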
