I am using limma as my standard workhorse to analyze microarray data. I noticed that since the last release limma is also able to perform GO and KEGG overrepresentation analyses (functions kegga and goana). However, input for these functions is currently limited to a few species; from the help page:
species: species identifier. Possible values are "Hs", "Mm", "Rn" or "Dm".
Since I regularly analyze data from species other than these four, I wondered whether it would be possible to extend this list of species, ideally allowing any species for which an org.Xx.eg.db and/or KEGG annotations are available. FYI: when using the new AnnotationHub infrastructure, more than 1000 OrgDbs are apparently available.
Alternatively, it would be very handy if an additional argument could be provided to goana to specify the 2-letter species identifier, and to kegga to "manually map" the 2-letter species ID to the 3-letter KEGG species identifier. But maybe there are better/easier solutions...
Background: I am currently working with an Affymetrix Chinese Hamster (Cricetulus griseus) dataset; its corresponding OrgDb is available using AnnotationHub, and pathway info for this species is also available at KEGG (thus: "Cg" = "cge").
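The manual mapping described above could be as simple as a lookup table. The following is a minimal sketch in base R, not part of limma: the helper name kegg_species_code and the table species_to_kegg are hypothetical, and only the "Cg" = "cge" pair comes from this post; the other four entries are the KEGG organism codes for the species limma already accepts.

```r
# Hypothetical lookup table: 2-letter Bioconductor species ID -> 3-letter KEGG organism code.
species_to_kegg <- c(Hs = "hsa", Mm = "mmu", Rn = "rno", Dm = "dme", Cg = "cge")

# Hypothetical helper: translate one species ID, failing loudly if it is unknown.
kegg_species_code <- function(species, table = species_to_kegg) {
  code <- unname(table[species])
  if (is.na(code)) stop("No KEGG code known for species '", species, "'")
  code
}

kegg_species_code("Cg")
```

An unknown identifier raises an error instead of silently returning NA, which seems the safer default when the result is passed on to a KEGG query.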
Thanks,
Guido
> library(AnnotationHub)
> hub = AnnotationHub()
> query(hub, c("OrgDb"))
AnnotationHub with 1019 records
# snapshotDate(): 2015-12-29
# $dataprovider: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/
# $species: Escherichia coli, Acanthamoeba castellanii_str._Neff, Acanthisit...
# $rdataclass: OrgDb
# additional mcols(): taxonomyid, genome, description, tags, sourceurl,
#   sourcetype
# retrieve records with, e.g., 'object[["AH48006"]]'

            title
  AH48006 | org.Camelina_sativa.eg.sqlite
  AH48007 | org.Glycine_max.eg.sqlite
  AH48008 | org.Malus_domestica.eg.sqlite
  AH48009 | org.Zea_mays.eg.sqlite
  AH48010 | org.Brassica_rapa.eg.sqlite
  ...       ...
  AH49587 | org.Ce.eg.db.sqlite
  AH49588 | org.Xl.eg.db.sqlite
  AH49589 | org.Sc.sgd.db.sqlite
  AH49590 | org.Dr.eg.db.sqlite
  AH49591 | org.Pf.plasmo.db.sqlite
>
> # for Chinese hamster
> # AH48061 | org.Cricetulus_griseus.eg.sqlite
> org.Cg.eg.db <- hub[["AH48061"]]
> org.Cg.eg.db
OrgDb object:
| DBSCHEMAVERSION: 2.1
| DBSCHEMA: NOSCHEMA_DB
| ORGANISM: Cricetulus griseus
| SPECIES: Cricetulus griseus
| CENTRALID: GID
| Taxonomy ID: 10029
| Db type: OrgDb
| Supporting package: AnnotationDbi

Please see: help('select') for usage information
>
FWIW the AnnotationHub OrgDb objects obey the usual interface, e.g.,
and a little more obscurely
-- these are OrgDb packages without the package. It's true that the annotations can be quite sketchy for many of these less common organisms.
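As an illustration of that "usual interface" (not the code originally posted in this reply), the sketch below wraps the standard AnnotationDbi accessors keytypes(), keys(), columns() and select() in a small helper. The name peek_orgdb is hypothetical, and the example assumes AnnotationDbi is installed and that record AH48061 from the session above is still present in the current AnnotationHub snapshot.

```r
# Sketch only: `orgdb` is assumed to be an OrgDb object, e.g. hub[["AH48061"]].
# peek_orgdb is a hypothetical helper, not part of any package.
peek_orgdb <- function(orgdb, n = 5) {
  kt   <- AnnotationDbi::keytypes(orgdb)[1]                  # first supported key type
  ids  <- head(AnnotationDbi::keys(orgdb, keytype = kt), n)  # first n keys of that type
  cols <- head(AnnotationDbi::columns(orgdb), 3)             # a few available columns
  AnnotationDbi::select(orgdb, keys = ids, columns = cols, keytype = kt)
}

# Usage (needs network access and the AnnotationHub package):
# hub <- AnnotationHub::AnnotationHub()
# peek_orgdb(hub[["AH48061"]])
```

Inspecting columns() first is worthwhile for the less common organisms, since the set of populated annotation fields can differ from what the packaged org.Xx.eg.db databases provide.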
Thanks Gordon; I understand your reasoning. I asked because I thought the AnnotationHub OrgDb objects had the same structure as the packages directly made available at Bioconductor.
I've now modified kegga() so you can use the GO info from AnnotationHub, see above.
I would like to add that the code above for the GO analyses only works if the data frame gene.go contains only genes that are annotated with a GO term; if that is not the case (i.e. some genes don't have a GO annotation), the function kegga() stops with a rather cryptic error (see below). This can be solved by removing all genes from gene.go that don't have a GO annotation. Maybe it would be useful to add this check to the function kegga as well [since even some human/mouse/rat genes don't have a GO annotation]? Please note that in my hands KEGG-based analyses always work fine!
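The workaround of dropping unannotated genes can be sketched in base R as follows. The toy gene.go table and its column names (gene, go) are assumptions made for illustration; the real data frame in this thread may be laid out differently.

```r
# Hypothetical gene-to-GO table; gene IDs and column names are made up.
gene.go <- data.frame(
  gene = c("g1", "g2", "g3", "g4"),
  go   = c("GO:0008150", NA, "", "GO:0003674"),
  stringsAsFactors = FALSE
)

# Keep only rows with a non-missing, non-empty GO identifier.
has_go  <- !is.na(gene.go$go) & nzchar(gene.go$go)
gene.go <- gene.go[has_go, , drop = FALSE]
```

Both NA and empty strings are treated as "no annotation" here, since either can appear depending on how the table was exported.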
Hi Guido. Thanks for the bug report. I have fixed this in limma 3.26.7.
Hello,
I am trying to use C. elegans (Ce) org.Ce.eg.db for GO enrichment analysis. Even though it's an official package, and documentation states that it should work with goana - it doesn't.
Citing https://www.rdocumentation.org/packages/limma/versions/3.28.14/topics/goana?:
"goana uses annotation from the appropriate Bioconductor organism package. The species can be any character string XX for which an organism package org.XX.eg.db exists and is installed. See alias2Symbol for other possible values for species."
Running the function with 'Ce' results in:
When I was debugging the function I found this:
The available species list is hardwired, regardless of the installed .db packages. Is there a specific reason for that or is it a bug?
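For contrast with a hardwired list, a check based on what is actually installed might look like the sketch below. This is not limma's implementation; the helper names org_pkg_name and org_pkg_installed are hypothetical, and the only thing taken from the documentation quoted above is the org.XX.eg.db naming pattern.

```r
# Hypothetical helpers: derive the organism package name from a species ID
# and test whether that package can be loaded on this machine.
org_pkg_name <- function(species) paste0("org.", species, ".eg.db")

org_pkg_installed <- function(species) {
  requireNamespace(org_pkg_name(species), quietly = TRUE)
}

org_pkg_name("Ce")
org_pkg_installed("Ce")
```

The second call returns TRUE only where org.Ce.eg.db is installed, so the result depends on the local library.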
Best,
Povilas Norvaisas
You are obviously using a version of limma prior to version 3.28.12. Clearly you can't expect to use the latest features of the package if you don't install the current version.
Try typing sessionInfo().
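Besides reading the sessionInfo() output, the version comparison can be done programmatically. A minimal base R sketch follows; has_min_version is a hypothetical helper, and 3.28.12 is the version threshold mentioned in this reply.

```r
# Hypothetical helper: TRUE if `installed` is at least `required`.
has_min_version <- function(installed, required) {
  package_version(installed) >= package_version(required)
}

# limma 3.26.7 (mentioned earlier in the thread) predates 3.28.12:
has_min_version("3.26.7", "3.28.12")

# With limma installed, compare the local version instead:
# has_min_version(as.character(packageVersion("limma")), "3.28.12")
```

Comparing package_version objects avoids the pitfalls of comparing version strings lexically, where "3.9" would sort after "3.28".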
In general, you need to read the documentation that comes with the version of limma that you are using (by typing ?goana), rather than reading documentation for a random version from the web. The documentation page you link to is neither the latest nor for the version you are using.
PS. Just as a hint for the future, it would have been better to have posted this as a new question rather than as comment on an old thread. There is nothing in the old thread to indicate that goana() should work with Ce, quite the opposite, the thread explained why goana() didn't work for Ce at the time.