unable to find known entrezgene with biomaRt
Dick Beyer ★ 1.4k
@dick-beyer-26
Hello,

I am unable to find some Entrez Gene IDs in the ensembl homo sapiens database via biomaRt, even though I can access them via the ensembl web.

library(biomaRt)
mart <- useMart("ensembl", dataset="hsapiens_gene_ensembl")

getBM(attributes=c("entrezgene","hgnc_symbol","ensembl_gene_id"), filters="entrezgene", values=3845, mart=mart)
  entrezgene hgnc_symbol ensembl_gene_id
1       3845        KRAS ENSG00000133703

getBM(attributes=c("entrezgene","hgnc_symbol","ensembl_gene_id"), filters="entrezgene", values=3514, mart=mart)
NULL

The ensembl web interface:

http://www.ensembl.org/Homo_sapiens/geneview?gene=ENSG00000211592

shows Entrez Gene ID 3514 corresponds to ensembl_gene_id ENSG00000211592, IGKC.

I'm curious why my biomaRt session will return good results for some valid Entrez Gene IDs but not for others. I'm not sure what to try next. I'd very much appreciate any help.

sessionInfo()
R version 2.6.1 (2007-11-26)
x86_64-redhat-linux-gnu

locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] tools     stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
 [1] topGO_1.4.0         SparseM_0.75        AnnotationDbi_1.0.6
 [4] RSQLite_0.6-4       DBI_0.2-4           GO_2.0.1
 [7] Biobase_1.16.2      graph_1.16.1        biomaRt_1.12.2
[10] RCurl_0.8-3

loaded via a namespace (and not attached):
[1] cluster_1.11.9  rcompgen_0.1-17 XML_1.93-2

Thanks much,
Dick

*******************************************************************************
Richard P. Beyer, Ph.D.    University of Washington
Tel.: (206) 616 7378       Env. & Occ. Health Sci., Box 354695
Fax: (206) 685 4696        4225 Roosevelt Way NE, # 100
                           Seattle, WA 98105-6099
http://depts.washington.edu/ceeh/ServiceCores/FC5/FC5.html
http://staff.washington.edu/~dbeyer
biomaRt biomaRt • 1.4k views
@james-w-macdonald-5106
United States
Hi Dick,

I'm not sure I understand your question. When I go to the webpage you reference, there is AFAICT no mention of this gene being the same as Entrez Gene 3514 (other than having the same symbol). Nor does Entrez Gene mention that it is the same as Ensembl Gene ENSG00000211592.

A quick look at the location of the gene would imply that it probably is the same, and not two genes that have the same symbol (which is not unique).

Since both the web interface and the programmatic interface agree, this isn't a matter of inconsistencies between the interfaces, so perhaps the question is why do Entrez Gene and Ensembl not reference each other?

If so, this I think is simply due to the fact that you have two different groups that are doing the annotation, and they are not always perfect at referencing each other.

Best,

Jim
Hi Jim,

Thanks for explaining this to me. I had assumed that if the gene was in ensembl, then I could get other bits of info such as Entrez Gene ID and such.

Is there some bioconductor way, similar to biomaRt, to access this Entrez Gene ID? What I am really using the getBM call for is just to get a gene symbol and a gene description given the Entrez Gene ID.

Thanks very much,
Dick
Hi Dick,

For most of the IDs, your mapping from entrezgene to hgnc_symbol should work well in Ensembl and correspond to the Entrez Gene annotation. However, as Jim mentioned, these are two independent groups, and Ensembl updates every two months, so if something changed in Entrez Gene during those two months you'll need to wait until the next Ensembl release for it to be visible.

Checking Entrez Gene for 3514 gives, on the upper right-hand side, "updated 06-Jan-2008". So this entry was updated (or maybe even just added) after the last Ensembl release (December 2007), which probably explains why it's not in Ensembl yet.

Cheers,
Steffen
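One way to spot IDs that fall into this kind of gap is to compare what was queried against what came back. The following is only a sketch: `result` is a hand-built stand-in for the data frame that getBM() would return for a multi-ID query (its single row is copied from the original post), and the names `queried`, `result` and `unmapped` are made up for illustration.

```r
# Entrez Gene IDs submitted to the query (both taken from this thread).
queried <- c(3845, 3514)

# Assumption: stand-in for the getBM() result. In a real session this
# data frame would come from getBM(..., values = queried, mart = mart).
result <- data.frame(
  entrezgene       = 3845,
  hgnc_symbol      = "KRAS",
  ensembl_gene_id  = "ENSG00000133703",
  stringsAsFactors = FALSE
)

# IDs that were asked for but are absent from the result.
unmapped <- setdiff(queried, result$entrezgene)
print(unmapped)
```

Any ID left in `unmapped` is a candidate for checking by hand against Entrez Gene, or for looking up through another annotation source.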
Hi Steffen,

Thanks very much for your help. I am definitely learning here from you and Jim.

Cheers,
Dick
Hi Dick,

What information are you starting with? Do you just need the gene symbol and description?

If you have the Entrez Gene ID it is really simple.

> library(org.Hs.eg.db)
> get("3514", org.Hs.egSYMBOL)
[1] "IGKC"
> get("3514", org.Hs.egGENENAME)
[1] "immunoglobulin kappa constant"

If you have multiple IDs, then of course you need to use mget() and then wrangle the resulting lists into whatever shape you need. An alternative with the sweet new SQLite db format (thanks to the friendly folks in Seattle) is to dump everything out and then subset from there.

> ids <- ls(org.Hs.egSYMBOL)[1:10] ##some random IDs
> thesymbs <- toTable(org.Hs.egSYMBOL) ##dump
> thesymbs[thesymbs[,1] %in% ids,]
   gene_id   symbol
1        1     A1BG
2        2      A2M
3        9     NAT1
4       10     NAT2
5       12 SERPINA3
6       13    AADAC
7       14     AAMP
8       15    AANAT
9       16     AARS
10      18     ABAT

If you have the Ensembl ID I would use biomaRt.

> getBM(c("hgnc_symbol", "description"), "ensembl_gene_id", "ENSG00000211592", mart=mart, output="list")
$hgnc_symbol
$hgnc_symbol$ENSG00000211592
[1] NA


$description
$description$ENSG00000211592
[1] "Immunoglobulin Kappa light chain C gene segment [Source:IMGT/GENE_DB;Acc:IGKC]"

As noted before, the information from the two sources doesn't always agree 100%, which is sorta weird in this case since the description field from Ensembl _does_ contain the gene symbol.

Anyway I hope that helps.

Best,

Jim

--
James W. MacDonald, MS
Biostatistician
UMCCC cDNA and Affymetrix Core
University of Michigan
1500 E Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623
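The mget() route mentioned above, including the "wrangle the resulting lists" step, can be sketched roughly as follows. To keep it runnable without any annotation package installed, `sym_map` is a hypothetical plain environment standing in for org.Hs.egSYMBOL, the two symbols are the ones shown in this thread, and "0" is an invented ID included only to show what a miss looks like.

```r
# Assumption: a plain environment keyed by Entrez Gene ID, used here as a
# stand-in for org.Hs.egSYMBOL so the example has no package dependencies.
sym_map <- new.env()
assign("3514", "IGKC", envir = sym_map)
assign("3845", "KRAS", envir = sym_map)

ids <- c("3845", "3514", "0")  # "0" is a made-up ID with no entry

# mget() returns a named list; ifnotfound = NA yields NA for unknown IDs
# instead of stopping with an error.
hits <- mget(ids, envir = sym_map, ifnotfound = NA)

# Flatten to one row per ID, keeping the first symbol if there are several.
symbols <- data.frame(
  gene_id = names(hits),
  symbol  = vapply(hits, function(x) as.character(x[1]), character(1),
                   USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)
print(symbols)
```

With the annotation package loaded, passing org.Hs.egSYMBOL in place of `sym_map` should behave much the same way, though that has not been checked here against any particular release.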
Hi Jim,

As always, you offer great help. Thanks for taking the time to give me examples as well. That is much appreciated.

I wasn't aware of org.Hs.eg.db. That seems like exactly what I should be using for what I need. I guess I should start paying closer attention to BioC announcements :-)

Cheers,
Dick