org.Hs.eg question
Sim, Fraser
Hi,

I'm using the org.Hs.eg.db annotation package to convert Ensembl protein IDs to Entrez Gene IDs. I can find the correct annotation manually on the Ensembl website (EG = 4340), but I don't understand why the annotation package cannot find it.

Here is the code:

> HsENSP
[1] "ENSP00000373017"
> require("org.Hs.eg.db")
> HsEG = as.character(unlist(mget(HsENSP, org.Hs.egENSEMBLPROT2EG, ifnotfound = NA)))
> HsEG
[1] NA

Thanks for any input.

Regards,
Fraser

> sessionInfo()
R version 2.8.1 (2008-12-22)
i386-pc-mingw32

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] tools     stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
 [1] gplots_2.6.0        gdata_2.4.2         gtools_2.5.0
 [4] bioDist_1.14.0      RColorBrewer_1.0-2  GEOquery_2.6.0
 [7] RCurl_0.94-0        rae230a.db_2.2.5    org.Rn.eg.db_2.2.6
[10] hom.Rn.inp.db_2.2.5 org.Hs.eg.db_2.2.6  RSQLite_0.7-1
[13] DBI_0.2-4           AnnotationDbi_1.4.2 Biobase_2.2.1
[16] rcom_2.0-4          rscproxy_1.0-12
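The `NA` above is the `ifnotfound` fallback doing its job, not a failure of the call itself. AnnotationDbi's Bimap objects are meant to support an environment-style interface that includes `mget()`, so base R's `mget()` on a plain environment should behave analogously. The sketch below uses a plain environment as a hypothetical stand-in for `org.Hs.egENSEMBLPROT2EG` (the keys `ENSP_A`, `ENSP_B` and their GeneIDs are made up) to show two things: a missing key comes back as `NA`, and `unlist()` flattens keys that map to several GeneIDs, so the result can stop lining up with the query.

```r
# Hypothetical stand-in for an annotation map: a plain environment whose keys
# are Ensembl protein IDs and whose values are vectors of Entrez Gene IDs.
# The contents are invented purely to illustrate mget() behaviour.
prot2eg <- new.env()
assign("ENSP_A", "1111", envir = prot2eg)
assign("ENSP_B", c("2222", "3333"), envir = prot2eg)

query <- c("ENSP_A", "ENSP_B", "ENSP00000373017")

# ifnotfound = NA returns NA for a missing key instead of raising an error.
hits <- mget(query, envir = prot2eg, ifnotfound = NA)

# unlist() flattens multi-valued entries: 3 queries give 4 values here,
# so positions no longer correspond to the query.
flat <- as.character(unlist(hits))

# Keeping only the first ID per key preserves one output per input.
first <- vapply(hits, function(v) as.character(v[1]), character(1))
```

Keeping the result as a list (or checking `lengths(hits)`) before flattening makes it easier to tell "no mapping" apart from "several mappings".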
Marc Carlson
Hi Sim,

I can explain this, and maybe you can even help me improve things. The mappings from Ensembl protein and transcript IDs to Ensembl gene IDs come from the Ensembl website, while the mappings between Entrez Gene IDs and Ensembl gene IDs currently come from NCBI.

However, NCBI's gene-to-gene mappings do not seem to be as complete as whatever Ensembl is using, and I do not have any explanation from them about why that is. I also don't have a better source for this information (yet), because I have been unable to find it on Ensembl's FTP sites. Something must exist somewhere at Ensembl, since the Ensembl website is presumably based on it, but whatever they use does not seem to be shared with the world (although it would be great to find out that I had simply missed it). If you know of a better source for this information than the one I am currently using, I would be more than happy to consider it. But it has to come from a trustworthy, documentable source (such as NCBI or Ensembl); otherwise there would not be much point in including it. ;)

Marc
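Marc's explanation amounts to composing two lookups, so a gap in either one surfaces as `NA`. A simplified sketch, assuming the package effectively joins an Ensembl protein-to-gene table with an NCBI Ensembl-gene-to-Entrez table (this is a model, not the package's actual build code). The IDs are the ones Fraser reports from biomaRt further down the thread, and each toy table holds only those entries.

```r
# Simplified model, not org.Hs.eg.db's build code.
# Step 1, from Ensembl: protein ID -> Ensembl gene ID.
# ENSP00000373017 belongs to the MOG copy on c6_COX.
prot2ensg <- c(ENSP00000373017 = "ENSG00000206456")

# Step 2, from NCBI: Ensembl gene ID -> Entrez Gene ID.
# Only the chromosome 6 copy of MOG has an entry.
ensg2eg <- c(ENSG00000204655 = "4340")

# Compose the two lookups; a miss at either step yields NA.
map_prot_to_eg <- function(prot_ids) {
  ensg <- unname(prot2ensg[prot_ids])  # NA if the protein is unknown
  unname(ensg2eg[ensg])                # NA if the gene has no NCBI entry
}

map_prot_to_eg("ENSP00000373017")
```

Step 1 succeeds here; the `NA` comes entirely from step 2, which matches Marc's point that the gap lies in the NCBI gene-to-gene data rather than in the query.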
Hi Sim and Marc,

You should forward this query to helpdesk@ensembl.org and someone will be able to help.

Regards,
Rhoda

Rhoda Kinsella Ph.D.
Ensembl Bioinformatician,
European Bioinformatics Institute (EMBL-EBI),
Wellcome Trust Genome Campus,
Hinxton,
Cambridge CB10 1SD, UK.

In reply to: Marc Carlson, 19 Mar 2009, 19:15
Hi Marc,

I tried the biomaRt package to look at the Ensembl data directly:

NCBI GeneID 4340 maps to ENSG00000204655.
Ensembl peptide ENSP00000373017 maps to ENSG00000206456 (no NCBI GeneID associated).

So I think the mapping is working correctly, since no NCBI GeneID is associated with this Ensembl gene. However, both genes have very similar annotations and appear to be the same 'MOG' gene, but only one of them has an NCBI GeneID:

ENSG00000204655 maps to Chromosome 6: 29,732,788-29,748,128
ENSG00000206456 maps to Chromosome c6_COX: 29,768,660-29,784,001

I was using the org.Hs.eg.db package with hom.Rn.inp.db to find rat-human homologs. If I instead use biomaRt to look up the homolog of rat GeneID 24558, it successfully finds ENSG00000204655 (i.e. GeneID 4340). I'll try that route and report my results.

Cheers,
Fraser

In reply to: Marc Carlson, Thursday, March 19, 2009 3:16 PM (Subject: Re: [BioC] org.Hs.eg question)
Hi,

Here are the results:

665 of 898 rat GeneIDs annotated with human GeneIDs using the hom.Rn.inp.db package
752 of 898 using the biomaRt package (slow)
784 of 898 using hom.Rn.inp.db first, then biomaRt (very slow)

The combined approach looks best. It annotates 87% of the rat GeneIDs, which is good enough for my purposes.

Thanks,
Fraser

In reply to: Sim, Fraser, Friday, March 20, 2009 12:07 PM (Subject: Re: [BioC] org.Hs.eg question)
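One reading of the combined approach is "take the hom.Rn.inp.db result, then fill its gaps from biomaRt". A minimal sketch of that fill-the-NAs step, assuming both results are already available as named character vectors keyed by rat GeneID; the IDs `r1`, `h1` and so on are hypothetical placeholders, not real GeneIDs. The last lines recompute the coverage percentages from the counts reported above.

```r
# Fill the NAs of a primary mapping from a fallback mapping.
# Both arguments are named character vectors keyed by rat GeneID, holding a
# human GeneID or NA where no homolog was found.
combine_mappings <- function(primary, fallback) {
  out <- primary
  missing <- is.na(out)
  # Keys absent from the fallback also come back as NA.
  out[missing] <- fallback[names(out)[missing]]
  out
}

# Fraction of query IDs that received a mapping.
coverage <- function(mapped) mean(!is.na(mapped))

# Hypothetical placeholder IDs, not real GeneIDs.
primary  <- c(r1 = "h1", r2 = NA, r3 = NA)
fallback <- c(r2 = "h2")
combined <- combine_mappings(primary, fallback)
coverage(combined)

# Percentages implied by the reported counts (out of 898 rat GeneIDs).
counts <- c(hom.Rn.inp.db = 665, biomaRt = 752, combined = 784)
pct <- round(100 * counts / 898, 1)
pct
```

Running the slow biomaRt query only for the IDs still missing after the fast local package is what keeps the fallback cheap; the recomputed figures (74.1%, 83.7%, 87.3%) agree with the 87% quoted above.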