Bug found in org.Hs.eg AnnotationDbi
@john-st-john-5628
I have a gene, for example CDK4 (http://www.ncbi.nlm.nih.gov/gene?term=1019), and I was testing its mapping between Entrez ID and Ensembl ID in R. Its Entrez ID is 1019.

    library(org.Hs.eg.db)

    revmap(org.Hs.egENSEMBL2EG)[['1019']]
    [1] "ENSG00000135446"
    # this appears to be the expected output (see http://www.ncbi.nlm.nih.gov/gene?term=1019)

    org.Hs.egENSEMBL[['11019']]
    [1] "ENSG00000121897"
    # wrong output, this is a different, unrelated gene, LIAS

    org.Hs.egSYMBOL[['11019']]
    [1] "LIAS"
    # expected "CDK4"

Help description for org.Hs.egENSEMBL:
"org.Hs.egENSEMBL is an R object that contains mappings between Entrez Gene identifiers and Ensembl gene accession numbers."

Help description for org.Hs.egENSEMBL2EG (it is the same help message):
"org.Hs.egENSEMBL is an R object that contains mappings between Entrez Gene identifiers and Ensembl gene accession numbers."

Help description for org.Hs.egSYMBOL:
"org.Hs.egSYMBOL is an R object that provides mappings between entrez gene identifiers and gene abbreviations."

There is something wrong here, right? From the help pages it looks like these different annotation maps do the mappings I expect, and should all accept Entrez IDs as input. The fact that one of them works and the others do not is concerning. Is there a more widely used, less buggy mapping between Ensembl and Entrez that I should be using instead of this one?

Thanks,
John
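A minimal sketch (not part of the original thread) of the same check with the Entrez ID defined once in a variable, so the forward map, the reversed `ENSEMBL2EG` map and the symbol lookup are all guaranteed to use the same key. It assumes the org.Hs.eg.db package is installed. The `mapIds()` call is AnnotationDbi's `select()`-style interface, available in newer AnnotationDbi releases, and is shown only as one alternative to the Bimap objects. No output is listed because the Ensembl IDs returned depend on the annotation release.

```r
# Minimal sketch: look up one Entrez Gene ID through several org.Hs.eg.db maps.
# Assumes the org.Hs.eg.db annotation package is installed.
suppressPackageStartupMessages(library(org.Hs.eg.db))

entrez_id <- "1019"  # CDK4; define the key once so every lookup uses the same ID

# Entrez -> Ensembl, via the forward map and via the reversed ENSEMBL2EG map
ens_forward <- org.Hs.egENSEMBL[[entrez_id]]
ens_reverse <- revmap(org.Hs.egENSEMBL2EG)[[entrez_id]]

# Entrez -> gene symbol
symbol <- org.Hs.egSYMBOL[[entrez_id]]

# The same Entrez -> Ensembl lookup through the select()-style interface;
# multiVals = "list" keeps every Ensembl ID mapped to the key
ens_mapids <- mapIds(org.Hs.eg.db, keys = entrez_id, column = "ENSEMBL",
                     keytype = "ENTREZID", multiVals = "list")[[entrez_id]]

print(symbol)
print(ens_forward)
```

With a single key, the forward and reversed maps should agree, and the symbol lookup should report the gene you intended to query.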
@john-st-john-5628
Foot-in-mouth: found the bug. It was my typo; I truncated a 1 in my test.

http://www.ncbi.nlm.nih.gov/gene?term=11019

Looks like *maybe* the Entrez IDs are getting truncated in some of these databases or something, which could cause this problem.

Looks like the output is the same.

On Nov 26, 2012, at 8:18 PM, John St. John <johnthesaintjohn@gmail.com> wrote:
> I have a gene for example, CDK4 http://www.ncbi.nlm.nih.gov/gene?term=1019, and I was testing its mapping between entrez ID and ENSEMBL id in R. It's entrez id is 1019.
> ...
Hi,

I'm not sure, but did you just realize that you used two different Entrez IDs for your test? For example:

    revmap(org.Hs.egENSEMBL2EG)[['1019']]
    org.Hs.egENSEMBL[['11019']]

Or do you think you are seeing a different problem? I'm not sure where you think "the truncating" is happening.

On Monday, November 26, 2012, John St. John wrote:
> Foot-in-mouth, found the bug, my typo, I truncated a 1 in my test.
> ...

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
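A minimal sketch (not from the thread) that looks up both keys side by side, which makes the mix-up visible: '1019' and '11019' are distinct Entrez Gene IDs for different genes. It assumes org.Hs.eg.db is installed and uses `mget()` on the `org.Hs.egSYMBOL` Bimap.

```r
# Minimal sketch: compare the symbols behind the two Entrez IDs used in the test.
# Assumes the org.Hs.eg.db annotation package is installed.
suppressPackageStartupMessages(library(org.Hs.eg.db))

ids <- c("1019", "11019")

# mget() returns a list named by key; unlist() turns it into a named character vector
syms <- unlist(mget(ids, org.Hs.egSYMBOL))

print(syms)
```

The thread itself reports '11019' as LIAS and '1019' as CDK4, so the two lookups were never querying the same gene.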
Nope, you are correct. I have been working too long today.

On Nov 26, 2012, at 8:41 PM, Steve Lianoglou <mailinglist.honeypot@gmail.com> wrote:
> Hi,
>
> I'm not sure, but did you just realize that you used two different entrez id's for your test, eg:
> ...