Question: go_bp table in custom OrgDb package
0
8 months ago by
psutton0
psutton0 wrote:

I am trying to use pcaExplorer's pca2go() function for functional enrichment analysis on genes with the highest principal component loadings. The function fails with the following error:

Ranking genes by the loadings ...
Extracting functional categories enriched in the gene subsets ...

Building most specific GOs .....
Error in result_create(conn@ptr, statement) : no such table: go_bp


The problem is that I am using a custom OrgDb that lacks the go_bp table. The custom OrgDb, which I made for a non-model organism using AnnotationForge::makeOrgPackageFromNCBI(), has these tables:

tbls: accessions, alias, entrez_genes, gene_info, genes, go, go_all, map_counts, map_metadata, metadata, pubmed, refseq

In particular, it does not include go_bp, which explains the error in pca2go. In contrast, the standard org.Hs.eg.db includes these tables (among many others):

go, go_all, go_bp, go_bp_all, go_cc, go_cc_all, go_mf, go_mf_all
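
One way to check which of these ontology-specific tables a given database lacks is to list its tables and compare. The following is only a minimal sketch, assuming the DBI and RSQLite packages are installed; `missing_go_tables` is a hypothetical helper name, and the in-memory database is a toy stand-in for a custom OrgDb, not real annotation data.

```r
library(DBI)

# Hypothetical helper: report which ontology-specific GO tables are absent
# from the SQLite database behind a DBI connection.
missing_go_tables <- function(conn) {
  expected <- c("go_bp", "go_bp_all", "go_cc", "go_cc_all", "go_mf", "go_mf_all")
  setdiff(expected, dbListTables(conn))
}

# Toy stand-in for a custom OrgDb: only `go` and `go_all` exist (assumption).
con <- dbConnect(RSQLite::SQLite(), ":memory:")
toy <- data.frame(
  `_id` = 1L, go_id = "GO:0008150", evidence = "ND", ontology = "BP",
  check.names = FALSE
)
dbWriteTable(con, "go", toy)
dbWriteTable(con, "go_all", toy)

missing <- missing_go_tables(con)
print(missing)
dbDisconnect(con)
```

For an installed annotation package, the same helper could presumably be given `AnnotationDbi::dbconn(org.Hs.eg.db)` or the equivalent connection for the custom OrgDb.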

My questions:

1) What is the difference between the go and go_bp tables? My understanding is that we are encouraged to use select(), columns(), and keytypes(), but working with a non-model organism, together with the pca2go error about a particular missing table, is leading me down the rabbit hole of wanting to understand the tables (I am a Bioconductor novice).

2) And more practically, how can I create a custom OrgDb package which contains go_bp?

modified 8 months ago by James W. MacDonald51k • written 8 months ago by psutton0
Answer: go_bp table in custom OrgDb package
0
8 months ago by
United States
James W. MacDonald51k wrote:

The go_bp table is simply the subset of the go table where the ontology is BP, without the ontology column:

> library(org.Hs.eg.db)
> library(DBI)
> go_bp <- dbGetQuery(dbconn(org.Hs.eg.db), "select * from go_bp;")
> head(go_bp)
  _id      go_id evidence
1   1 GO:0002576      TAS
2   1 GO:0008150       ND
3   1 GO:0043312      TAS
4   2 GO:0001869      IDA
5   2 GO:0002576      TAS
6   2 GO:0007597      TAS

> go <- dbGetQuery(dbconn(org.Hs.eg.db), "select * from go;")
> head(go)
  _id      go_id evidence ontology
1   1 GO:0002576      TAS       BP
2   1 GO:0003674       ND       MF
3   1 GO:0005576      HDA       CC
4   1 GO:0005576      IDA       CC
5   1 GO:0005576      TAS       CC
6   1 GO:0005615      HDA       CC
> sum(go_bp$go_id %in% go$go_id)
[1] 146284
> dim(go_bp)
[1] 146284      3
> dim(subset(go, go$ontology == "BP"))
[1] 146284      4


It appears that pcaExplorer is querying the underlying database directly. It's not that difficult to see if the go_bp table exists, and if not, use the go table instead, so you might contact the maintainer of that package directly and ask them to fix their package.
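
The fallback suggested here could look roughly like the sketch below. It is not code from pcaExplorer or topGO; `get_go_bp` is a hypothetical helper, the column names follow the org.Hs.eg.db output shown above (a custom OrgDb may spell them differently), and the in-memory table is toy data.

```r
library(DBI)

# Hypothetical helper: return BP annotations, preferring the `go_bp` table
# and falling back to filtering `go` by ontology when `go_bp` is absent.
get_go_bp <- function(conn) {
  if (dbExistsTable(conn, "go_bp")) {
    dbGetQuery(conn, "SELECT _id, go_id, evidence FROM go_bp;")
  } else {
    dbGetQuery(conn, "SELECT _id, go_id, evidence FROM go WHERE ontology = 'BP';")
  }
}

# Toy database with a `go` table only, mimicking the custom OrgDb case.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
toy_go <- data.frame(
  `_id`    = c(1L, 1L, 2L),
  go_id    = c("GO:0002576", "GO:0003674", "GO:0001869"),
  evidence = c("TAS", "ND", "IDA"),
  ontology = c("BP", "MF", "BP"),
  check.names = FALSE
)
dbWriteTable(con, "go", toy_go)

bp <- get_go_bp(con)
print(bp)
dbDisconnect(con)
```

Because the result has the same three columns either way, calling code would not need to know which table was actually present.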

Hi James and @psutton, here I am.

In the specific case of pca2go, I am calling an underlying routine that is based on the good old topGO package. That package, in turn, expects to be given one of the three main ontologies in order to compute the enriched functions.

I might be wrong, but there is currently no workaround for that (James, please do correct me if that is not the case and I can indeed change something in my implementation of pca2go).

An alternative, though it might deliver more generic terms, would be to use limmaquickpca2go, which uses limma::goana. That might avoid the issue you have?

Federico

You could talk to Adrian Alexa about patching topGO to use the go table when the go_bp table doesn't exist.

Will do, thanks.

Federico

I was able to modify AnnotationForge so that it would create go_bp, go_cc, go_mf, go_bp_all, go_cc_all, and go_mf_all tables for me. However, it is a bit hacky, because there were a number of issues with the exact spelling and case of field names in the SQLite tables, which I partly describe here: https://support.bioconductor.org/p/118859/ .

Anyhow, I now have pca2go / topGO working for my custom OrgDb, but it is not a very general solution, so it would still be useful to other users to have a patch to use the go table when go_bp doesn't exist.
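
For readers wondering what such derived tables amount to, the sketch below builds go_bp, go_cc, and go_mf from a go table with plain SQL. It is not the AnnotationForge modification described above: `add_ontology_tables` is a hypothetical helper, the column names are assumed to match those in org.Hs.eg.db, the data are toy rows, and the `_all` variants (which would be derived from go_all) and any indexes are left out.

```r
library(DBI)

# Hypothetical helper: create one table per ontology from the `go` table,
# skipping any table that already exists.
add_ontology_tables <- function(conn) {
  for (ont in c("BP", "CC", "MF")) {
    tbl <- paste0("go_", tolower(ont))
    if (!dbExistsTable(conn, tbl)) {
      dbExecute(conn, sprintf(
        "CREATE TABLE %s AS SELECT _id, go_id, evidence FROM go WHERE ontology = '%s';",
        tbl, ont
      ))
    }
  }
  invisible(dbListTables(conn))
}

# Toy `go` table in an in-memory database (assumption: same layout as org.Hs.eg.db).
con <- dbConnect(RSQLite::SQLite(), ":memory:")
toy_go <- data.frame(
  `_id`    = c(1L, 1L, 1L, 2L),
  go_id    = c("GO:0002576", "GO:0003674", "GO:0005576", "GO:0001869"),
  evidence = c("TAS", "ND", "HDA", "IDA"),
  ontology = c("BP", "MF", "CC", "BP"),
  check.names = FALSE
)
dbWriteTable(con, "go", toy_go)

tables <- add_ontology_tables(con)
n_bp <- dbGetQuery(con, "SELECT COUNT(*) AS n FROM go_bp;")$n
print(tables)
dbDisconnect(con)
```

On a real package this would presumably have to run against a writable copy of the SQLite file before the package is built, since an installed OrgDb is normally opened read-only.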

Hi Federico,

I tried limmaquickpca2go, but it failed with an error message:

Error in goana.default(probesPC1pos_ENTREZ, bg_ENTREZ, species = organism) :
Can't find gene ontology mappings in package org.<species>.eg.db


Looking at the goana.default code where it fails, it is looking for an org.<species>.egGO2ALLEGS object:

    obj <- paste0("org.", species, ".egGO2ALLEGS")
    egGO2ALLEGS <- tryCatch(getFromNamespace(obj, orgPkg), error = function(e) FALSE)
    if (is.logical(egGO2ALLEGS))
        stop("Can't find gene ontology mappings in package ", orgPkg)


Human has org.Hs.egGO2ALLEGS, but the non-model organism OrgDb that I made using AnnotationForge::makeOrgPackageFromNCBI() does not. I don't know how to make such a mapping and how to include it in a custom OrgDb package.
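
To test in advance whether a given annotation package exports this mapping, the lookup quoted above can be wrapped in a small check. This is only a sketch: `has_go2allegs` is a hypothetical helper, and the default package name simply follows the org.<species>.eg.db pattern from the error message.

```r
# Hypothetical helper mirroring the lookup quoted above: TRUE if the
# org.<species>.egGO2ALLEGS object can be found in the package namespace,
# FALSE if the package or the object is missing.
has_go2allegs <- function(species, orgPkg = paste0("org.", species, ".eg.db")) {
  obj <- paste0("org.", species, ".egGO2ALLEGS")
  tryCatch(
    {
      utils::getFromNamespace(obj, orgPkg)
      TRUE
    },
    error = function(e) FALSE
  )
}

# "Zz" is a made-up species code; with no such package installed the
# namespace lookup fails and the helper reports FALSE.
result_missing_pkg <- has_go2allegs("Zz")

# A package that exists but does not export the object also gives FALSE.
result_missing_obj <- has_go2allegs("Zz", orgPkg = "stats")

print(c(result_missing_pkg, result_missing_obj))
```

With org.Hs.eg.db installed, `has_go2allegs("Hs")` would be expected to return TRUE, given that human has org.Hs.egGO2ALLEGS.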

Sorry for the confusion in pointing towards limmaquickpca2go.

I checked now and noticed that goana has built-in support for a few species, but custom annotation packages can still be provided. The problem is that goana then needs the egGO2ALLEGS mapping, and it may be that a package built with makeOrgPackageFromNCBI() does not include that object.