Replicating GO website analysis with bioconductor packages (topGO?)
Entering edit mode
Brian Smith ▴ 120
Last seen 23 months ago
United States


I wanted to replicate the GO analysis done thru' (, but using R/Bioconductor. For the 'original' GO analysis, I just copy and pasted my genes on the website, used all defaults (i.e. 'biological process', 'Homo Sapiens', panther.db), and click on 'Launch'. This resulted in a set of significant GO terms.

I am now trying to replicate this anaylsis with R/Bioconductor with topGO package, but I get an error. My code is:

pthOrganisms(PANTHER.db) <- "HUMAN"

allpanther <- keys(PANTHER.db,keytype="ENTREZ")

## myentrezGenes - my genes of interest
idx <- allpanther %in% myentrezGenes
genesidx <- factor(as.integer(aidx))
names(genesidx) <- allpanther

tgd <- new( "topGOdata", ontology="BP", allGenes = genesidx, nodeSize=5,
   , mapping="PANTHER.db")

> Building most specific GOs .....
Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'conn' in selecting a method for function 'dbGetQuery': object 'PANTHER_dbconn' not found

My questions:

  1. Is this the best method to try to replicate results I get from the GO website ( Or is there an API that I can use to programmatically get the results? Or some other bioconductor package?

  2. How can I rectify my code?

Thanks for your help!!

My sessioninfo is:

R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS High Sierra 10.13.6

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods  
[9] base     

other attached packages:
 [1] PANTHER.db_1.0.10    RSQLite_2.2.0        AnnotationHub_2.20.2 BiocFileCache_1.12.1
 [5] dbplyr_1.4.4  topGO_2.40.0         SparseM_1.78        
 [9] GO.db_3.11.4         AnnotationDbi_1.50.3 IRanges_2.22.2       S4Vectors_0.26.1    
[13] Biobase_2.48.0       graph_1.66.0         BiocGenerics_0.34.0 

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.5                    later_1.1.0.1                
 [3] BiocManager_1.30.10           compiler_4.0.2               
 [5] pillar_1.4.6                  tools_4.0.2                  
 [7] digest_0.6.25                 bit_4.0.4                    
 [9] memoise_1.1.0                 tibble_3.0.3                 
[11] lifecycle_0.2.0               lattice_0.20-41              
[13] pkgconfig_2.0.3               rlang_0.4.7                  
[15] shiny_1.5.0                   DBI_1.1.0                    
[17] rstudioapi_0.11               yaml_2.2.1                   
[19] curl_4.3                      fastmap_1.0.1                
[21] httr_1.4.2                    dplyr_1.0.2                  
[23] rappdirs_0.3.1                generics_0.0.2               
[25] vctrs_0.3.4                   bit64_4.0.5                  
[27] grid_4.0.2                    tidyselect_1.1.0             
[29] glue_1.4.2                    R6_2.4.1                     
[31] purrr_0.3.4                   blob_1.2.1                   
[33] magrittr_1.5                  promises_1.1.1               
[35] htmltools_0.5.0               matrixStats_0.56.0           
[37] ellipsis_0.3.1                assertthat_0.2.1             
[39] xtable_1.8-4                  mime_0.9                     
[41] interactiveDisplayBase_1.26.3 httpuv_1.5.4                 
[43] BiocVersion_3.11.1            crayon_1.3.4
topGO Gene Ontology enrichment panther.db • 648 views
Entering edit mode
Last seen 11 minutes ago
United States

It's not clear what extra you might gain by using the PANTHER database, particularly for a GO analysis. And what the geneontology folks have on their website doesn't really say either. Anyway, the 'annot' argument is meant to be a function that maps the IDs to the GO terms, and in particular is intended to be used with orgDb packages, like, which panther.db is definitely not.

So you have two or three things you could hypothetically do.

  1. Forgo using the panther.db package and instead use Probably the simplest thing to do.
  2. Generate a gene2GO mapping using panther.db, and then use that (see sections 4.2 and 4.3 in the topGO vignette.)
  3. Get all l33t and create a function that you can use as the 'annot' argument that will do the right thing with the panther.db package. That might be fun and instructive, but either of the previous suggestions will probably be easier and generate similar results.

For #1, I am assuming that your myentrezGenes object is actually a vector of the NCBI Gene IDs for those genes that were significant in your analysis. If not you need to use either the p-values for all of the genes you tested (so it would be a named vector, where the values are p-values, and the names are the NCBI Gene IDs, in which case you need to also specify the 'geneSel' argument), or a boolean vector with NCBI Gene IDs as the names, where TRUE means it was significant and FALSE means it wasn't.

Entering edit mode

if you provide a function to geneSel it will not be used (see this other questions:,

Entering edit mode

The geneSel argument isn't applicable if you are doing a KS test, because that uses the scores directly instead of generating a contingency table. Try your example using the fisher test.

Entering edit mode
Last seen 56 minutes ago
WEHI, Melbourne, Australia

If your genes are human Entrez Gene IDs, then the following code does a GO analysis. To make a reproducible example, I have chosen five genes in the Bcl2 apoptosis pathway. The code does overlap enrichment Fisher exact tests similar to the geneontology website:

> MyEntrezGenes <- c("596","637","5366","10018","27113")
> library(limma)
> g <- goana(MyEntrezGenes)
> topGO(g, truncate=50)
                                                         Term Ont   N DE     P.DE
GO:0001844 protein insertion into mitochondrial membrane i...  BP  30  5 4.74e-15
GO:0051204      protein insertion into mitochondrial membrane  BP  45  5 4.06e-14
GO:0090151 establishment of protein localization to mitoch...  BP  47  5 5.10e-14
GO:0097345      mitochondrial outer membrane permeabilization  BP  54  5 1.05e-13
GO:0001836          release of cytochrome c from mitochondria  BP  59  5 1.67e-13
GO:1902110 positive regulation of mitochondrial membrane p...  BP  60  5 1.82e-13
GO:2001244 positive regulation of intrinsic apoptotic sign...  BP  61  5 1.98e-13
GO:1902686 mitochondrial outer membrane permeabilization i...  BP  62  5 2.15e-13
GO:0051205                    protein insertion into membrane  BP  62  5 2.15e-13
GO:0035794 positive regulation of mitochondrial membrane p...  BP  64  5 2.54e-13
GO:1905710       positive regulation of membrane permeability  BP  66  5 2.97e-13
GO:1902108 regulation of mitochondrial membrane permeabili...  BP  66  5 2.97e-13
GO:0046902 regulation of mitochondrial membrane permeabili...  BP  76  5 6.14e-13
GO:0090559                regulation of membrane permeability  BP  86  5 1.16e-12
GO:0010822 positive regulation of mitochondrion organizati...  BP 116  5 5.33e-12
GO:0008637                    apoptotic mitochondrial changes  BP 124  5 7.49e-12
GO:1900740 positive regulation of protein insertion into m...  BP  26  4 1.02e-11
GO:1900739 regulation of protein insertion into mitochondr...  BP  26  4 1.02e-11
GO:0007006                mitochondrial membrane organization  BP 134  5 1.11e-11
GO:0072655 establishment of protein localization to mitoch...  BP 137  5 1.24e-11

Login before adding your answer.

Traffic: 466 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6