Question: Presence of GO annotations for human mitochondrial genes in
11 months ago, Owen Dando wrote:


It's likely I'm missing something obvious or being dense here, but it appears that no GO annotations are attached to human mitochondrial genes in v3.4.2 of the human annotation package. This is not the case for mouse or rat mitochondrial genes in the corresponding mouse and rat packages, nor does it appear to be true for the underlying "lite" Gene Ontology database, on which I understand the packages' GO mappings are based. Is this correct?

The code below hopefully illustrates the issue. I'm using Bioconductor v3.6 (also tried with v3.5). 

Thanks in advance,

Owen Dando


# Packages providing %>%, filter() etc., toTable(), and the org.*.eg* objects used below
library(dplyr)
library(AnnotationDbi)
library(org.Hs.eg.db)
library(org.Mm.eg.db)
library(org.Rn.eg.db)

# Return the number of genes on chromosome 'chr'
number_of_genes_on_chromosome <- function(chr, gene_id_to_chromosome) {
  gene_id_to_chromosome %>%
    toTable() %>%
    filter(chromosome == chr) %>%
    nrow()
}

# Return a table counting the number of GO terms for each gene
# (the final step was lost from the post; 'num_terms' is a reconstructed column name)
gene_id_to_number_of_terms <- function(gene_id_to_go_term) {
  gene_id_to_go_term %>%
    toTable() %>%
    distinct(gene_id, go_id) %>%
    group_by(gene_id) %>%
    summarise(num_terms = n())
}

# Return the number of genes on chromosome 'chr' annotated with at least one GO term
number_of_genes_on_chromosome_with_annotation <- function(
  chr, gene_id_to_go_term, gene_id_to_chromosome) {

  gene_id_to_chromosome %>%
    toTable() %>%
    left_join(gene_id_to_go_term %>% gene_id_to_number_of_terms(), by = "gene_id") %>%
    # Genes without any GO term get NA from the left join (reconstructed condition)
    filter(chromosome == chr & !is.na(num_terms)) %>%
    nrow()
}

# Return the percentage of genes on chromosome 'chr' annotated with at least one GO term
percentage_of_genes_with_annotations <- function(chr, gene_id_to_go_term, gene_id_to_chromosome) {
  100 *
    number_of_genes_on_chromosome_with_annotation(chr, gene_id_to_go_term, gene_id_to_chromosome) /
    number_of_genes_on_chromosome(chr, gene_id_to_chromosome)
}

# Then for human mitochondrial genes there are none with GO annotations...

percentage_of_genes_with_annotations("MT", org.Hs.egGO2ALLEGS, org.Hs.egCHR)

> 0

# But for mouse and rat mitochondrial genes, at least some have annotations...
percentage_of_genes_with_annotations("MT", org.Mm.egGO2ALLEGS, org.Mm.egCHR)

> 100

percentage_of_genes_with_annotations("MT", org.Rn.egGO2ALLEGS, org.Rn.egCHR)

> 35.13514

# Print session info
sessionInfo()

> R version 3.4.2 (2017-09-28)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Ubuntu 16.04.3 LTS

> Matrix products: default
> BLAS: /usr/lib/libblas/
> LAPACK: /usr/lib/lapack/

> locale:
> [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8       
> [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                  LC_ADDRESS=C              

> attached base packages:
> [1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

> other attached packages:
> [1] bindrcpp_0.2         magrittr_1.5  
> [5]   AnnotationDbi_1.40.0 IRanges_2.12.0       S4Vectors_0.16.0    
> [9] Biobase_2.38.0       BiocGenerics_0.24.0  dplyr_0.7.4         

> loaded via a namespace (and not attached):
> [1] Rcpp_0.12.13     bindr_0.1        bit_1.1-12       R6_2.2.2         rlang_0.1.2     
> [6] blob_1.1.0       tools_3.4.2      DBI_0.7          bit64_0.9-7      assertthat_0.2.0
> [11] digest_0.6.12    tibble_1.3.4     memoise_1.1.0    glue_1.2.0       RSQLite_2.0     
> [16] compiler_3.4.2   pkgconfig_2.0.1
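
For readers who want to check the logic without the annotation packages installed, the percentage calculation above can be sketched in base R on plain data frames. This is only an illustrative sketch: the function name `percent_annotated_by_chromosome` is hypothetical, and the two toy tables use invented IDs that merely mimic the `gene_id`, `chromosome` and `go_id` columns used in the code above.

```r
# Toy tables shaped like the toTable() output used in the post.
# All gene and GO IDs here are invented for illustration.
gene_chr <- data.frame(
  gene_id    = c("g1", "g2", "g3", "g4", "g5"),
  chromosome = c("MT", "MT", "1", "1", "1"),
  stringsAsFactors = FALSE
)
gene_go <- data.frame(
  gene_id = c("g3", "g3", "g4"),
  go_id   = c("GO:0000001", "GO:0000002", "GO:0000001"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: percentage of genes with at least one GO term,
# computed for every chromosome at once.
percent_annotated_by_chromosome <- function(gene_chr, gene_go) {
  # TRUE for each gene that appears at least once in the GO table
  annotated <- gene_chr$gene_id %in% gene_go$gene_id
  # Group the TRUE/FALSE flags by chromosome and take the mean of each group
  by_chr <- split(annotated, gene_chr$chromosome)
  vapply(by_chr, function(x) 100 * mean(x), numeric(1))
}

pct <- percent_annotated_by_chromosome(gene_chr, gene_go)
print(pct)
```

With the real packages loaded, the two data frames would presumably come from `toTable()` on the chromosome and GO mapping objects, as in the original functions.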




11 months ago, James W. MacDonald (United States) wrote:

The package is simply a repackaging of what we can get from NCBI; in this case the gene2go.gz file we get from their FTP site. We use the mappings in that file to map Entrez Gene IDs to GO terms. We can look in the gene2go file to see what they provide:

> library(org.Hs.eg.db)
> z <- unlist(as.list(org.Hs.egCHR))
> egids <- names(z)[z %in% "MT"]
> library(org.Mm.eg.db)
> zz <- unlist(as.list(org.Mm.egCHR))
> megids <- names(zz)[zz %in% "MT"]
> length(system(paste("awk '{if($2 ~", paste0("/", paste0("^", egids, "$", collapse = "|"), "/"), ") print $0}' gene2go"), intern = TRUE))
[1] 0
> length(system(paste("awk '{if($2 ~", paste0("/", paste0("^", megids, "$", collapse = "|"), "/"), ") print $0}' gene2go"), intern = TRUE))
[1] 271

So NCBI isn't providing us with any mappings of Entrez Gene IDs to GO terms for human mitochondrial genes, but they are for mouse (and presumably rat).
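
The same check can also be sketched without shelling out to awk, by reading the table into R and matching Entrez Gene IDs exactly. This is a minimal sketch under assumptions: the helper `count_gene2go_rows` is hypothetical, the toy rows are invented rather than real NCBI records, and the column header is assumed to follow the usual gene2go layout with GeneID as the second column (the same column the awk command tests as `$2`).

```r
# Toy stand-in for the uncompressed, tab-separated gene2go file.
# The rows are invented for illustration; they are not real NCBI records.
gene2go_text <- paste(
  "#tax_id\tGeneID\tGO_ID\tEvidence\tQualifier\tGO_term\tPubMed\tCategory",
  "10090\t1001\tGO:0000001\tIEA\t-\tterm A\t-\tProcess",
  "10090\t1001\tGO:0000002\tIEA\t-\tterm B\t-\tFunction",
  "10090\t1002\tGO:0000001\tIEA\t-\tterm A\t-\tProcess",
  "9606\t2001\tGO:0000003\tIEA\t-\tterm C\t-\tComponent",
  sep = "\n"
)

# check.names = FALSE keeps the "#tax_id" header as written;
# quote = "" stops quote characters in free-text columns from being interpreted.
gene2go <- read.delim(text = gene2go_text, quote = "",
                      stringsAsFactors = FALSE, check.names = FALSE)

# Hypothetical helper: number of gene2go rows whose GeneID is in 'entrez_ids'.
count_gene2go_rows <- function(gene2go, entrez_ids) {
  sum(as.character(gene2go$GeneID) %in% as.character(entrez_ids))
}

# Invented ID sets standing in for 'egids' and 'megids' above
toy_human_mt_ids <- c("3001", "3002")
toy_mouse_mt_ids <- c("1001", "1002")

n_human <- count_gene2go_rows(gene2go, toy_human_mt_ids)
n_mouse <- count_gene2go_rows(gene2go, toy_mouse_mt_ids)
print(c(human = n_human, mouse = n_mouse))
```

For the real file, replacing the `text =` argument with something like `gzfile("gene2go.gz")` should work, assuming the file has been downloaded to the working directory.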



Hi James - many thanks for the quick response and explanation. I am attempting to follow up with NCBI as to why these mappings aren't present in gene2go.gz.

(Reply written 11 months ago by Owen Dando)

Just in case anyone else hits this issue: after following up with NCBI, this was confirmed as a bug in the production of the underlying gene2go.gz data. Apparently it has now been resolved, so the corrected data will presumably percolate up into the next version of the package.

(Reply written 10 months ago by Owen Dando)