Question: Presence of GO annotations for human mitochondrial genes in org.Hs.eg.db?
Owen Dando wrote, 17 days ago:

Hi, 

It's likely I'm missing something obvious here, but there appear to be no GO annotations attached to human mitochondrial genes in org.Hs.eg.db v3.4.2. This is not the case for mouse mitochondrial genes in org.Mm.eg.db or for rat mitochondrial genes in org.Rn.eg.db. Nor does it appear to be true of the underlying "lite" Gene Ontology database (ftp://ftp.geneontology.org/pub/go/godatabase/archive/latest-lite/), on which I understand the GO mappings in org.Hs.eg.db are based. Is this correct?

The code below hopefully illustrates the issue. I'm using Bioconductor v3.6 (also tried with v3.5). 

Thanks in advance,

Owen Dando


library(dplyr)
library(org.Hs.eg.db)
library(org.Mm.eg.db)
library(org.Rn.eg.db)
library(magrittr)

# Return the number of genes on chromosome 'chr'
number_of_genes_on_chromosome <- function(chr, gene_id_to_chromosome) {
  gene_id_to_chromosome %>%
    toTable %>% 
    filter(chromosome == chr) %>% 
    nrow
}

# Return a table counting the number of distinct GO terms for each gene
gene_id_to_number_of_terms <- function(gene_id_to_go_term) {
  gene_id_to_go_term %>% 
    toTable() %>% 
    distinct(gene_id, go_id) %>% 
    group_by(gene_id) %>% 
    summarise(count = n())
}

# Return the number of genes on chromosome 'chr' annotated with at least one GO term
number_of_genes_on_chromosome_with_annotation <- function(
  chr, gene_id_to_go_term, gene_id_to_chromosome) {

  gene_id_to_chromosome %>% 
    toTable %>% 
    left_join(gene_id_to_go_term %>% gene_id_to_number_of_terms(), by = "gene_id") %>% 
    filter(chromosome == chr & !is.na(count)) %>% 
    nrow  
}

# Return the percentage of genes on chromosome 'chr' annotated with at least one GO term
percentage_of_genes_with_annotations <- function(chr, gene_id_to_go_term, gene_id_to_chromosome) {
  100 * 
    number_of_genes_on_chromosome_with_annotation(chr, gene_id_to_go_term, gene_id_to_chromosome) / 
    number_of_genes_on_chromosome(chr, gene_id_to_chromosome)
}

# Then for human mitochondrial genes there are none with GO annotations...

percentage_of_genes_with_annotations("MT", org.Hs.egGO2ALLEGS, org.Hs.egCHR)

> 0

# But for mouse and rat mitochondrial genes, at least some have annotations...
percentage_of_genes_with_annotations("MT", org.Mm.egGO2ALLEGS, org.Mm.egCHR)

> 100

percentage_of_genes_with_annotations("MT", org.Rn.egGO2ALLEGS, org.Rn.egCHR)

> 35.13514

# Print session info
sessionInfo()

> R version 3.4.2 (2017-09-28)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Ubuntu 16.04.3 LTS

> Matrix products: default
> BLAS: /usr/lib/libblas/libblas.so.3.6.0
> LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

> locale:
> [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8       
> [4] LC_COLLATE=en_GB.UTF-8     LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
> [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
> [10] LC_TELEPHONE=C             LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

> attached base packages:
> [1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

> other attached packages:
> [1] bindrcpp_0.2         magrittr_1.5         org.Rn.eg.db_3.4.2   org.Mm.eg.db_3.4.2  
> [5] org.Hs.eg.db_3.4.2   AnnotationDbi_1.40.0 IRanges_2.12.0       S4Vectors_0.16.0    
> [9] Biobase_2.38.0       BiocGenerics_0.24.0  dplyr_0.7.4         

> loaded via a namespace (and not attached):
> [1] Rcpp_0.12.13     bindr_0.1        bit_1.1-12       R6_2.2.2         rlang_0.1.2     
> [6] blob_1.1.0       tools_3.4.2      DBI_0.7          bit64_0.9-7      assertthat_0.2.0
> [11] digest_0.6.12    tibble_1.3.4     memoise_1.1.0    glue_1.2.0       RSQLite_2.0     
> [16] compiler_3.4.2   pkgconfig_2.0.1


James W. MacDonald (United States) wrote, 16 days ago:

The org.Hs.eg.db package is simply a repackaging of what we can get from NCBI; in this case the gene2go.gz file we get from their FTP site. We use the mappings in that file to map Entrez Gene IDs to GO terms. We can look in the gene2go file to see what they provide:

> library(org.Hs.eg.db)
> z <- unlist(as.list(org.Hs.egCHR))
> egids <- names(z)[z %in% "MT"]
> library(org.Mm.eg.db)
> zz <- unlist(as.list(org.Mm.egCHR))
> megids <- names(zz)[zz %in% "MT"]
> length(system(paste("awk '{if($2 ~", paste0("/", paste0("^", egids, "$", collapse = "|"), "/"), ") print $0}' gene2go"), intern = TRUE))
[1] 0
> length(system(paste("awk '{if($2 ~", paste0("/", paste0("^", megids, "$", collapse = "|"), "/"), ") print $0}' gene2go"), intern = TRUE))
[1] 271

So NCBI isn't providing us with any mappings of Entrez Gene IDs to GO terms for human mitochondrial genes, but they are for mouse (and presumably rat).
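
The same check can also be sketched entirely in R, without shelling out to awk. This is only an illustration: it assumes the relevant gene2go columns have already been read into a data frame, and the rows, the ID vectors, and the helper name `count_go_rows` are hypothetical stand-ins, not real NCBI data or part of any package.

```r
# Toy stand-in for a few gene2go rows (illustrative values only, not real NCBI data)
gene2go <- data.frame(
  tax_id = c(10090, 10090, 9606),
  GeneID = c("17716", "17716", "7157"),
  GO_ID  = c("GO:0008137", "GO:0005743", "GO:0006915"),
  stringsAsFactors = FALSE
)

# Hypothetical ID vectors standing in for egids / megids from the session above
human_mt_ids <- c("4535", "4512")
mouse_mt_ids <- c("17716", "17708")

# Count the gene2go rows whose GeneID exactly matches one of the given IDs
count_go_rows <- function(ids, gene2go) {
  sum(gene2go$GeneID %in% ids)
}

count_go_rows(human_mt_ids, gene2go)
count_go_rows(mouse_mt_ids, gene2go)
```

Using `%in%` gives an exact match on the ID, so there is no need to build one large anchored regular expression from all the IDs.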

 

Owen Dando replied, 16 days ago:

Hi James - many thanks for the quick response and explanation. I am attempting to follow up with NCBI as to why these mappings aren't present in gene2go.gz.