annotation - biomaRt - getBM - multiple entrez ID for one ensembl ID
Guest User ★ 13k
@guest-user-4897
Last seen 10.2 years ago
Hello,

I am currently analyzing data from an exon array. After pre-processing with RMA, which gives me an eSet with Ensembl IDs, I would like to annotate the genes with Entrez IDs. I am using the getBM function with the Ensembl gene ID as input and the Entrez gene ID as output. Here is part of the code I am using:

    mart <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
    gene2genomeEx <- getBM(values = ex, filters = "ensembl_gene_id", mart = mart,
                           attributes = c("ensembl_gene_id", "entrezgene", "hgnc_symbol",
                                          "external_gene_id", "external_gene_db",
                                          "description", "chromosome_name", "strand"))

However, for several genes (including a lot of histone genes), I obtain several Entrez IDs for the same Ensembl ID. For example, for:

    ex <- c("ENSG00000215417", "ENSG00000224078", "ENSG00000198366", "ENSG00000196176",
            "ENSG00000166012", "ENSG00000158406", "ENSG00000196787")

I obtain:

       ensembl_gene_id entrezgene hgnc_symbol external_gene_id external_gene_db
    1  ENSG00000158406       8294    HIST1H4H         HIST1H4H      HGNC Symbol
    2  ENSG00000158406       8359    HIST1H4H         HIST1H4H      HGNC Symbol
    3  ENSG00000158406       8360    HIST1H4H         HIST1H4H      HGNC Symbol
    4  ENSG00000158406       8361    HIST1H4H         HIST1H4H      HGNC Symbol
    5  ENSG00000158406       8362    HIST1H4H         HIST1H4H      HGNC Symbol
    6  ENSG00000158406       8363    HIST1H4H         HIST1H4H      HGNC Symbol
    7  ENSG00000158406       8364    HIST1H4H         HIST1H4H      HGNC Symbol
    8  ENSG00000158406       8365    HIST1H4H         HIST1H4H      HGNC Symbol
    9  ENSG00000158406       8366    HIST1H4H         HIST1H4H      HGNC Symbol
    10 ENSG00000158406       8367    HIST1H4H         HIST1H4H      HGNC Symbol
    11 ENSG00000158406       8368    HIST1H4H         HIST1H4H      HGNC Symbol
    12 ENSG00000158406       8370    HIST1H4H         HIST1H4H      HGNC Symbol
    13 ENSG00000158406     121504    HIST1H4H         HIST1H4H      HGNC Symbol
    14 ENSG00000158406     554313    HIST1H4H         HIST1H4H      HGNC Symbol
    15 ENSG00000166012      79101       TAF1D            TAF1D      HGNC Symbol
    16 ENSG00000166012     654320       TAF1D            TAF1D      HGNC Symbol
    17 ENSG00000166012     677792       TAF1D            TAF1D      HGNC Symbol
    18 ENSG00000166012     677805       TAF1D            TAF1D      HGNC Symbol
    19 ENSG00000166012     677822       TAF1D            TAF1D      HGNC Symbol
    20 ENSG00000166012     692063       TAF1D            TAF1D      HGNC Symbol
    21 ENSG00000166012     692072       TAF1D            TAF1D      HGNC Symbol
    22 ENSG00000166012  100302240       TAF1D            TAF1D      HGNC Symbol
    23 ENSG00000196176       8294    HIST1H4A         HIST1H4A      HGNC Symbol
    24 ENSG00000196176       8359    HIST1H4A         HIST1H4A      HGNC Symbol
    25 ENSG00000196176       8360    HIST1H4A         HIST1H4A      HGNC Symbol
    26 ENSG00000196176       8361    HIST1H4A         HIST1H4A      HGNC Symbol
    27 ENSG00000196176       8362    HIST1H4A         HIST1H4A      HGNC Symbol
    28 ENSG00000196176       8363    HIST1H4A         HIST1H4A      HGNC Symbol
    29 ENSG00000196176       8364    HIST1H4A         HIST1H4A      HGNC Symbol
    30 ENSG00000196176       8365    HIST1H4A         HIST1H4A      HGNC Symbol
    31 ENSG00000196176       8366    HIST1H4A         HIST1H4A      HGNC Symbol
    32 ENSG00000196176       8367    HIST1H4A         HIST1H4A      HGNC Symbol
    33 ENSG00000196176       8368    HIST1H4A         HIST1H4A      HGNC Symbol
    34 ENSG00000196176       8370    HIST1H4A         HIST1H4A      HGNC Symbol
    35 ENSG00000196176     121504    HIST1H4A         HIST1H4A      HGNC Symbol
    36 ENSG00000196176     554313    HIST1H4A         HIST1H4A      HGNC Symbol
    37 ENSG00000196787       8329   HIST1H2AG        HIST1H2AG      HGNC Symbol
    38 ENSG00000196787       8330   HIST1H2AG        HIST1H2AG      HGNC Symbol
    39 ENSG00000196787       8332   HIST1H2AG        HIST1H2AG      HGNC Symbol
    40 ENSG00000196787       8336   HIST1H2AG        HIST1H2AG      HGNC Symbol
    41 ENSG00000196787       8969   HIST1H2AG        HIST1H2AG      HGNC Symbol
    42 ENSG00000196787      85235   HIST1H2AG        HIST1H2AG      HGNC Symbol
    43 ENSG00000198366       8350    HIST1H3A         HIST1H3A      HGNC Symbol
    44 ENSG00000198366       8351    HIST1H3A         HIST1H3A      HGNC Symbol
    45 ENSG00000198366       8352    HIST1H3A         HIST1H3A      HGNC Symbol
    46 ENSG00000198366       8353    HIST1H3A         HIST1H3A      HGNC Symbol
    47 ENSG00000198366       8354    HIST1H3A         HIST1H3A      HGNC Symbol
    48 ENSG00000198366       8355    HIST1H3A         HIST1H3A      HGNC Symbol
    49 ENSG00000198366       8356    HIST1H3A         HIST1H3A      HGNC Symbol
    50 ENSG00000198366       8357    HIST1H3A         HIST1H3A      HGNC Symbol
    51 ENSG00000198366       8358    HIST1H3A         HIST1H3A      HGNC Symbol
    52 ENSG00000198366       8968    HIST1H3A         HIST1H3A      HGNC Symbol
    53 ENSG00000215417     406952     MIR17HG          MIR17HG      HGNC Symbol
    54 ENSG00000215417     406953     MIR17HG          MIR17HG      HGNC Symbol
    55 ENSG00000215417     406979     MIR17HG          MIR17HG      HGNC Symbol
    56 ENSG00000215417     406980     MIR17HG          MIR17HG      HGNC Symbol
    57 ENSG00000215417     406982     MIR17HG          MIR17HG      HGNC Symbol
    58 ENSG00000215417     407048     MIR17HG          MIR17HG      HGNC Symbol
    59 ENSG00000215417     407975     MIR17HG          MIR17HG      HGNC Symbol
    60 ENSG00000224078      91380      SNHG14           SNHG14      HGNC Symbol
    61 ENSG00000224078  100033444      SNHG14           SNHG14      HGNC Symbol
    62 ENSG00000224078  100033450      SNHG14           SNHG14      HGNC Symbol
    63 ENSG00000224078  100033802      SNHG14           SNHG14      HGNC Symbol
    64 ENSG00000224078  100033820      SNHG14           SNHG14      HGNC Symbol
    65 ENSG00000224078  100506948      SNHG14           SNHG14      HGNC Symbol

The description, chromosome_name and strand are the same for each Ensembl gene ID.

I checked manually on ensembl.org for the Entrez ID that corresponds to each Ensembl ID, and I found only one Entrez ID for each gene. Does anyone know where this problem comes from? Is it linked to the nature of my request?

Thanks in advance for your help,

Yours sincerely,

Laure Cougnaud

-- output of sessionInfo():

    R version 2.15.1 (2012-06-22)
    Platform: x86_64-unknown-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8
     [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=C                 LC_NAME=C                  LC_ADDRESS=C
    [10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
    [1] biomaRt_2.12.0     affy_1.34.0        Biobase_2.16.0     BiocGenerics_0.2.0 rj_1.1.0-4

    loaded via a namespace (and not attached):
    [1] affyio_1.24.0         BiocInstaller_1.4.7   preprocessCore_1.18.0 RCurl_1.91-1
    [5] rj.gd_1.1.0-1         tools_2.15.1          XML_3.9-4             zlibbioc_1.2.0

--
Sent via the guest posting facility at bioconductor.org.
@james-w-macdonald-5106
Last seen 13 hours ago
United States
Hi Laure,

On 5/3/2013 2:43 AM, Laure Cougnaud [guest] wrote:
> I checked manually for the entrez ID which corresponds to the ensembl ID in ensembl.org, and I found only one entrezID for each gene. Does anyone knows where this problem come from? Is it linked to the nature of my request?

I'm not sure where you are looking, but as an example, for ENSG00000215417, I see 7 EntrezGene genes on the Ensembl site, just like you have here:

http://www.ensembl.org/Homo_sapiens/Gene/Matches?g=ENSG00000215417;r=13:92000074-92006833

In addition:

    > library(org.Hs.eg.db)
    > mget(get(ex[1], revmap(org.Hs.egENSEMBL)), org.Hs.egSYMBOL)
    $`407975`
    [1] "MIR17HG"

    $`406952`
    [1] "MIR17"

    $`406953`
    [1] "MIR18A"

    $`406979`
    [1] "MIR19A"

    $`406980`
    [1] "MIR19B1"

    $`406982`
    [1] "MIR20A"

    $`407048`
    [1] "MIR92A1"

So I don't see anything unexpected here.

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
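
To get an overview of how many Ensembl IDs have this one-to-many mapping, the getBM result can be tabulated per Ensembl ID. The following is only a minimal sketch in base R: the `mapping` data frame is a hypothetical stand-in holding a few rows copied from the output in the question, so that it runs without querying BioMart. With real data, the gene2genomeEx data frame from the question would presumably take its place.

```r
# Hypothetical stand-in for a few rows of gene2genomeEx (values copied from the thread).
mapping <- data.frame(
  ensembl_gene_id = c(rep("ENSG00000215417", 7), rep("ENSG00000196787", 6)),
  entrezgene = c(406952, 406953, 406979, 406980, 406982, 407048, 407975,
                 8329, 8330, 8332, 8336, 8969, 85235),
  stringsAsFactors = FALSE
)

# Number of distinct Entrez IDs reported for each Ensembl gene ID.
n_entrez <- tapply(mapping$entrezgene, mapping$ensembl_gene_id,
                   function(x) length(unique(x)))

# Ensembl IDs that map to more than one Entrez ID.
multi <- names(n_entrez)[n_entrez > 1]

print(n_entrez)
print(multi)
```

Any ID listed in `multi` is one where a join on the Ensembl ID will duplicate expression rows, so it is worth deciding how to handle those before merging annotation into the eSet.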
Hi James,

Thanks for your response. Indeed, this doesn't seem to be an issue with the getBM function, but rather with the mapping between the Ensembl ID and the Entrez IDs.

In my case, I have data from an exon array, so after using RMA with the CDF file 'huex10stv2hsensg', I have one value per ENSG: the summarized value of all probes targeting this region.

I understand that several Entrez IDs seem to map within the location of ENSG00000215417 (Chromosome 13: 92,000,074-92,006,833), but in my case I am interested only in the gene ID corresponding to MIR17HG (407975), because it is the only ID that maps to the whole gene location of ENSG00000215417.

Also, I was a bit confused by the fact that getBM returns several gene IDs but only one symbol (which seems to be the right one for the ENSG), whereas the Entrez IDs correspond to different gene symbols, as you pointed out.

Best,

Laure

----- Original Message -----
From: "James W. MacDonald" <jmacdon@uw.edu>
To: "Laure Cougnaud [guest]" <guest@bioconductor.org>
Cc: bioconductor@r-project.org, "laure cougnaud" <laure.cougnaud@openanalytics.eu>
Sent: Friday, May 3, 2013 3:43:19 PM
Subject: Re: [BioC] annotation - biomaRt - getBM - multiple entrez ID for one ensembl ID

> I'm not sure where you are looking, but as an example, for ENSG00000215417, I see 7 EntrezGene genes on the Ensembl site, just like you have here:
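
One possible way to keep only the Entrez ID whose own symbol agrees with the HGNC symbol reported for the Ensembl gene is sketched below. This is a heuristic under stated assumptions, not something Ensembl or biomaRt does: `mapping` and `entrez_symbol` are hypothetical stand-ins built from the values quoted in this thread (the getBM rows for ENSG00000215417 and the org.Hs.egSYMBOL output), and symbol matching may leave some genes with zero or several rows.

```r
# Hypothetical stand-in for the getBM rows of ENSG00000215417 (values from the thread).
mapping <- data.frame(
  ensembl_gene_id = "ENSG00000215417",
  entrezgene = c("406952", "406953", "406979", "406980",
                 "406982", "407048", "407975"),
  hgnc_symbol = "MIR17HG",
  stringsAsFactors = FALSE
)

# Entrez ID -> symbol lookup, copied from the org.Hs.egSYMBOL output quoted above.
entrez_symbol <- c(
  "407975" = "MIR17HG", "406952" = "MIR17", "406953" = "MIR18A",
  "406979" = "MIR19A", "406980" = "MIR19B1", "406982" = "MIR20A",
  "407048" = "MIR92A1"
)

# Attach the symbol of each Entrez ID, then keep rows where it equals hgnc_symbol.
mapping$entrez_symbol <- unname(entrez_symbol[mapping$entrezgene])
one_to_one <- mapping[!is.na(mapping$entrez_symbol) &
                        mapping$entrez_symbol == mapping$hgnc_symbol, ]

print(one_to_one)
```

For this example the filter leaves the single row for Entrez ID 407975. Whether the same rule gives a sensible answer for the histone genes in the question would need to be checked case by case.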
Hi Laure,

You might want to ask the Ensembl people directly why they choose these mappings and send your email to helpdesk@ensembl.org

Cheers,
Steffen

On Fri, May 3, 2013 at 7:18 AM, Laure Cougnaud <laure.cougnaud@openanalytics.eu> wrote:
> Thanks for your response. Indeed this doesn't seem to be an issue of the
> getBM function, but more about the mapping between the ensembl id and the
> entrez ids.