Dear Biocore,
We make a strong effort to use current NCBI official gene symbols and names in all our work, and we make much use of the excellent Bioconductor packages org.Mm.eg.db and org.Hs.eg.db for this purpose.
I have recently noticed that org.Mm.eg.db is giving incorrect official names for mitochondrial genes. It is giving human symbols for these genes instead of mouse symbols. For example:
> mappedRkeys(org.Mm.egSYMBOL["17710"])
[1] "COX3"
According to both Entrez Gene
http://www.ncbi.nlm.nih.gov/gene/?term=17710
and MGI
http://www.informatics.jax.org/marker/MGI:102502
the official symbol is "mt-Co3". This has been the official symbol for at least 4 years and probably longer.
The correct name is not even included as an Alias:
> mappedRkeys(revmap(org.Mm.egALIAS2EG)["17710"])
[1] "COX3"
COX3 is actually the symbol for the human ortholog. It should only be an alias for the mouse gene.
The same is true for all the mitochondrial genes. In every case, org.Mm.egSYMBOL gives the human symbol instead of the mouse symbol.
Is this deliberate? If not, can you please fix?
Thanks a lot
Gordon
> sessionInfo()
R version 3.0.1 Patched (2013-07-04 r63183)
Platform: i386-w64-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_Australia.1252
[2] LC_CTYPE=English_Australia.1252
[3] LC_MONETARY=English_Australia.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_Australia.1252

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets
[7] methods   base

other attached packages:
[1] org.Mm.eg.db_2.9.0   org.Hs.eg.db_2.9.0   RSQLite_0.11.4
[4] DBI_0.2-7            AnnotationDbi_1.22.6 Biobase_2.20.0
[7] BiocGenerics_0.6.0   limma_3.17.20

loaded via a namespace (and not attached):
[1] IRanges_1.18.2 stats4_3.0.1
Hi Vincent,
Thanks, that explains it. After reading your reply, I went to the NCBI Gene FAQ and found the following explanation:
"NOTE: To the greatest extent possible, each protein-coding gene in mitochondria has been assigned the same name (symbol) and full description across species. In some instances, this is at variance with the symbol assigned by species-specific nomenclature committees."
This would be fine except that (i) the NCBI Gene web interface disagrees with the NCBI gene_info file and (ii) the official gene symbols supplied by the MGI nomenclature committee have not been included as synonyms in the gene_info file.
Anyway, the bottom line for my lab is that we will treat the gene_info/org.Mm.eg.db symbols as official, and we will have to give the MT genes special treatment when mapping aliases.
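For anyone in the same position, here is a minimal sketch of what that special treatment might look like: translate MGI mitochondrial symbols to the gene_info symbols before doing any alias lookup. The names mt_mgi2ncbi and translate_mt_symbols are hypothetical, and the lookup table holds only the mt-Co3/COX3 pair discussed in this thread; the remaining mitochondrial genes would have to be added by hand after checking each one against gene_info.

```r
# Hypothetical lookup table: MGI mitochondrial symbol -> gene_info symbol.
# Only the mt-Co3 / COX3 pair is confirmed in this thread (Entrez Gene 17710);
# extend it by hand after checking each gene against gene_info.
mt_mgi2ncbi <- c("mt-Co3" = "COX3")

# Replace any symbol found in the lookup table with its gene_info
# counterpart and leave every other symbol untouched.
translate_mt_symbols <- function(symbols, lookup = mt_mgi2ncbi) {
  symbols <- as.character(symbols)
  hit <- match(symbols, names(lookup))   # NA where there is no table entry
  found <- !is.na(hit)
  symbols[found] <- unname(lookup[hit[found]])
  symbols
}

translate_mt_symbols(c("mt-Co3", "Trp53"))
# [1] "COX3"  "Trp53"
```

The translated symbols can then be passed to the usual alias lookup (for example through org.Mm.egALIAS2EG), since COX3 is the symbol that org.Mm.eg.db carries for this gene.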
Regards
Gordon
As usual, the two of you have sorted things out pretty precisely.
Vince is exactly right about what happened, and Gordon found out exactly why when he noticed that NCBI is deliberately renaming all mitochondrial symbols (when they can).
I can't say whether I necessarily agree with NCBI's decisions here, but if I changed these annotations to better match our current expectations, then someone else would doubtless wonder why I was altering them from the source material... So I am afraid that I probably have to leave them as they are.
Marc