org.Mm.eg.db gives wrong symbol for MT genes
@gordon-smyth
WEHI, Melbourne, Australia

Dear Biocore,

We make a strong effort to use current NCBI official gene symbols and names in all our work, and we make much use of the excellent Bioconductor packages org.Mm.eg.db and org.Hs.eg.db for this purpose.

I have recently noticed that org.Mm.eg.db is giving incorrect official names for mitochondrial genes.  It is giving human symbols for these genes instead of mouse symbols.  For example

> mappedRkeys(org.Mm.egSYMBOL["17710"])
[1] "COX3"

According to both Entrez Gene

   http://www.ncbi.nlm.nih.gov/gene/?term=17710

and MGI

   http://www.informatics.jax.org/marker/MGI:102502

the official symbol is "mt-Co3".  This has been the official symbol for at least 4 years and probably longer.

The correct name is not even included as an Alias:

> mappedRkeys(revmap(org.Mm.egALIAS2EG)["17710"])
[1] "COX3"

COX3 is actually the symbol for the human ortholog. It should only be an alias for the mouse gene.

The same is true for all the mitochondrial genes. In all cases, org.Mm.egSYMBOL is giving the human symbol instead of the mouse symbol.

Is this deliberate? If not, can you please fix?

Thanks a lot
Gordon

> sessionInfo()

R version 3.0.1 Patched (2013-07-04 r63183)
Platform: i386-w64-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_Australia.1252
[2] LC_CTYPE=English_Australia.1252
[3] LC_MONETARY=English_Australia.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_Australia.1252

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets
[7] methods   base

other attached packages:
[1] org.Mm.eg.db_2.9.0   org.Hs.eg.db_2.9.0   RSQLite_0.11.4
[4] DBI_0.2-7            AnnotationDbi_1.22.6 Biobase_2.20.0
[7] BiocGenerics_0.6.0   limma_3.17.20

loaded via a namespace (and not attached):
[1] IRanges_1.18.2 stats4_3.0.1
@vincent-j-carey-jr-4
United States
Gordon, more definitive answers will likely come from the annotation core members, but here is what I understand about this. The mappings are completely dependent on NCBI content. Working with

   ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/GENE_INFO/Mammalia/Mus_musculus.gene_info.gz

the header is

#Format: tax_id GeneID Symbol LocusTag Synonyms dbXrefs chromosome map_location description type_of_gene Symbol_from_nomenclature_authority Full_name_from_nomenclature_authority Nomenclature_status Other_designations Modification_date (tab is used as a separator, pound sign - start of a comment)

and, with some context, the record for 17710 is

> x[c(1,3516),]
     tax_id GeneID Symbol LocusTag             Synonyms
1     10090  11287    Pzp        - A1m|A2m|AI893533|MAM
3516  10090  17710   COX3        -                    -
                                                          dbXrefs chromosome
1    MGI:87854|Ensembl:ENSMUSG00000030359|Vega:OTTMUSG00000022212          6
3516                                                   MGI:102502         MT
           map_location                      description   type_of_gene
1    6 F1-G3|6 63.02 cM           pregnancy zone protein protein-coding
3516                  - cytochrome c oxidase subunit III protein-coding
     Symbol_from_nomenclature_authority   Full_name_from_nomenclature_authority
1                                   Pzp                  pregnancy zone protein
3516                             mt-Co3 cytochrome c oxidase III, mitochondrial
     Nomenclature_status                                    Other_designations
1                      O alpha 1 macroglobulin|alpha-2-M|alpha-2-macroglobulin
3516                   O                                                     -
     Modification_date  X
1             20130804 NA
3516          20130804 NA

I would conjecture that the solution needs to come from NCBI: they may have neglected to deal properly with the MT genes in this case, as the following computation suggests.
The symbols for which field "Symbol" does not agree with field "Symbol_from_nomenclature_authority" are

> xsn[xs!=xsn]
 [1] "mt-Atp6" "mt-Atp8" "mt-Co1"  "mt-Co2"  "mt-Co3"  "mt-Cytb" "mt-Nd1"
 [8] "mt-Nd2"  "mt-Nd3"  "mt-Nd4"  "mt-Nd4l" "mt-Nd5"  "mt-Nd6"  "mt-Rnr1"
[15] "mt-Rnr2" "mt-Ta"   "mt-Tc"   "mt-Td"   "mt-Te"   "mt-Tf"   "mt-Tg"
[22] "mt-Th"   "mt-Ti"   "mt-Tk"   "mt-Tl1"  "mt-Tl2"  "mt-Tm"   "mt-Tn"
[29] "mt-Tp"   "mt-Tq"   "mt-Tr"   "mt-Ts1"  "mt-Ts2"  "mt-Tt"   "mt-Tv"
[36] "mt-Tw"   "mt-Ty"
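For readers who want to repeat this check, here is a minimal R sketch of the Symbol versus Symbol_from_nomenclature_authority comparison. It assumes the tab-separated 15-column layout given by the "#Format" header quoted in this thread (check the header of your own copy of the file, which may differ). To stay runnable offline it embeds the two records shown in the output instead of downloading the file. `read_gene_info()` is a hypothetical helper, and skipping records whose authority symbol is "-" is an extra safeguard added here, not part of the original computation.

```r
# Sketch only: column names assumed from the 2013 "#Format" header quoted above.
gene_info_cols <- c("tax_id", "GeneID", "Symbol", "LocusTag", "Synonyms",
                    "dbXrefs", "chromosome", "map_location", "description",
                    "type_of_gene", "Symbol_from_nomenclature_authority",
                    "Full_name_from_nomenclature_authority",
                    "Nomenclature_status", "Other_designations",
                    "Modification_date")

# Hypothetical helper: read gene_info text (file= or text=), skipping "#" lines.
# quote = "" stops apostrophes in descriptions from being treated as quotes.
read_gene_info <- function(...) {
  read.delim(..., header = FALSE, comment.char = "#", quote = "",
             col.names = gene_info_cols, colClasses = "character")
}

# The two records printed above, re-encoded as tab-separated lines.
example_lines <- c(
  paste("10090", "11287", "Pzp", "-", "A1m|A2m|AI893533|MAM",
        "MGI:87854|Ensembl:ENSMUSG00000030359|Vega:OTTMUSG00000022212",
        "6", "6 F1-G3|6 63.02 cM", "pregnancy zone protein", "protein-coding",
        "Pzp", "pregnancy zone protein", "O",
        "alpha 1 macroglobulin|alpha-2-M|alpha-2-macroglobulin", "20130804",
        sep = "\t"),
  paste("10090", "17710", "COX3", "-", "-", "MGI:102502", "MT", "-",
        "cytochrome c oxidase subunit III", "protein-coding", "mt-Co3",
        "cytochrome c oxidase III, mitochondrial", "O", "-", "20130804",
        sep = "\t"))

gi <- read_gene_info(text = example_lines)

xs  <- gi$Symbol                              # NCBI symbol (what org.Mm.egSYMBOL reports)
xsn <- gi$Symbol_from_nomenclature_authority  # MGI symbol

# Records where the two symbols disagree; "-" (no authority symbol) is excluded.
mismatch <- xsn != "-" & xs != xsn
gi[mismatch, c("GeneID", "Symbol", "Symbol_from_nomenclature_authority")]
xsn[mismatch]  # "mt-Co3" for the two example records
```

With the real file, replace `text = example_lines` with the path to Mus_musculus.gene_info.gz; read.delim() reads gzip-compressed files directly.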

Hi Vincent,

Thanks, that explains it. After reading your reply, I went to the NCBI Gene FAQ and found the following explanation:

"NOTE: To the greatest extent possible, each protein-coding gene in mitochondria has been assigned the same name (symbol) and full description across species. In some instances, this is at variance with the symbol assigned by species-specific nomenclature committees."

This would be fine except that (i) the NCBI Gene web interface disagrees with the NCBI gene_info file and (ii) the official gene symbols supplied by the MGI nomenclature committee have not been included as synonyms in the gene_info file.

Anyway, the bottom line for my lab is that we will treat the gene_info/org.Mm.eg.db symbols as official, and we will have to give the MT genes special treatment when mapping aliases.

Regards
Gordon
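One way to give the MT genes the special treatment Gordon describes when mapping aliases is sketched below in R. This is a minimal sketch under stated assumptions, not the approach Gordon's lab actually adopted: `alias_table()` is a hypothetical helper, and the two input records are copied from the gene_info output quoted earlier in the thread. The idea is to keep NCBI's Symbol and Synonyms as aliases and also add the nomenclature-authority (MGI) symbol whenever it differs, so that both "COX3" and "mt-Co3" resolve to Entrez Gene ID 17710.

```r
# Sketch only: a few gene_info-style columns for the two records quoted above.
gi <- data.frame(
  GeneID   = c("11287", "17710"),
  Symbol   = c("Pzp", "COX3"),
  Synonyms = c("A1m|A2m|AI893533|MAM", "-"),
  Symbol_from_nomenclature_authority = c("Pzp", "mt-Co3"),
  stringsAsFactors = FALSE)

# Hypothetical helper: one row per (alias, GeneID) pair. Aliases are the NCBI
# Symbol, the "|"-separated Synonyms, and the MGI symbol; "-" and duplicates
# are dropped.
alias_table <- function(gi) {
  rows <- lapply(seq_len(nrow(gi)), function(i) {
    syn  <- strsplit(gi$Synonyms[i], "|", fixed = TRUE)[[1]]
    auth <- gi$Symbol_from_nomenclature_authority[i]
    aliases <- c(gi$Symbol[i], syn, auth)
    aliases <- unique(aliases[aliases != "-"])
    data.frame(alias = aliases, GeneID = gi$GeneID[i],
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

at <- alias_table(gi)

# Both the NCBI (human-style) and MGI symbols now find the mouse gene.
at$GeneID[match(c("mt-Co3", "COX3", "A2m"), at$alias)]
# "17710" "17710" "11287"
```

Note that match() returns only the first hit, so an alias shared by several genes would need to be handled explicitly in real use.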


As usual, the two of you have sorted things out pretty precisely.

Vince is exactly right about what happened, and Gordon found out exactly why when he noticed that NCBI is deliberately renaming all mitochondrial symbols (when they can).

I can't say if I necessarily agree with NCBI's decisions here, but if I changed these annotations to better match our current expectations, then someone else would doubtless wonder why I had altered them from the source material... So I am afraid that I probably have to leave them as they are.


   Marc

