Guest User
Hello,
I am currently analyzing data from an exon array. After pre-processing
with RMA, I obtain an eSet with Ensembl gene IDs, and I would like to
annotate the genes with Entrez IDs. I am using the getBM function with
the Ensembl gene ID as input and the Entrez gene ID as output. Here is
part of the code I am using:
library(biomaRt)
mart <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
# ex: character vector of Ensembl gene IDs (an example is given below)
gene2genomeEx <- getBM(values = ex, filters = "ensembl_gene_id", mart = mart,
                       attributes = c("ensembl_gene_id", "entrezgene", "hgnc_symbol",
                                      "external_gene_id", "external_gene_db",
                                      "description", "chromosome_name", "strand"))
However, for several genes (many of them histone genes), I obtain
several Entrez IDs for the same Ensembl ID. For example, for
ex <- c("ENSG00000215417", "ENSG00000224078", "ENSG00000198366",
        "ENSG00000196176", "ENSG00000166012", "ENSG00000158406",
        "ENSG00000196787")
I obtain:
   ensembl_gene_id entrezgene hgnc_symbol external_gene_id external_gene_db
1  ENSG00000158406       8294    HIST1H4H         HIST1H4H      HGNC Symbol
2  ENSG00000158406       8359    HIST1H4H         HIST1H4H      HGNC Symbol
3  ENSG00000158406       8360    HIST1H4H         HIST1H4H      HGNC Symbol
4  ENSG00000158406       8361    HIST1H4H         HIST1H4H      HGNC Symbol
5  ENSG00000158406       8362    HIST1H4H         HIST1H4H      HGNC Symbol
6  ENSG00000158406       8363    HIST1H4H         HIST1H4H      HGNC Symbol
7  ENSG00000158406       8364    HIST1H4H         HIST1H4H      HGNC Symbol
8  ENSG00000158406       8365    HIST1H4H         HIST1H4H      HGNC Symbol
9  ENSG00000158406       8366    HIST1H4H         HIST1H4H      HGNC Symbol
10 ENSG00000158406       8367    HIST1H4H         HIST1H4H      HGNC Symbol
11 ENSG00000158406       8368    HIST1H4H         HIST1H4H      HGNC Symbol
12 ENSG00000158406       8370    HIST1H4H         HIST1H4H      HGNC Symbol
13 ENSG00000158406     121504    HIST1H4H         HIST1H4H      HGNC Symbol
14 ENSG00000158406     554313    HIST1H4H         HIST1H4H      HGNC Symbol
15 ENSG00000166012      79101       TAF1D            TAF1D      HGNC Symbol
16 ENSG00000166012     654320       TAF1D            TAF1D      HGNC Symbol
17 ENSG00000166012     677792       TAF1D            TAF1D      HGNC Symbol
18 ENSG00000166012     677805       TAF1D            TAF1D      HGNC Symbol
19 ENSG00000166012     677822       TAF1D            TAF1D      HGNC Symbol
20 ENSG00000166012     692063       TAF1D            TAF1D      HGNC Symbol
21 ENSG00000166012     692072       TAF1D            TAF1D      HGNC Symbol
22 ENSG00000166012  100302240       TAF1D            TAF1D      HGNC Symbol
23 ENSG00000196176       8294    HIST1H4A         HIST1H4A      HGNC Symbol
24 ENSG00000196176       8359    HIST1H4A         HIST1H4A      HGNC Symbol
25 ENSG00000196176       8360    HIST1H4A         HIST1H4A      HGNC Symbol
26 ENSG00000196176       8361    HIST1H4A         HIST1H4A      HGNC Symbol
27 ENSG00000196176       8362    HIST1H4A         HIST1H4A      HGNC Symbol
28 ENSG00000196176       8363    HIST1H4A         HIST1H4A      HGNC Symbol
29 ENSG00000196176       8364    HIST1H4A         HIST1H4A      HGNC Symbol
30 ENSG00000196176       8365    HIST1H4A         HIST1H4A      HGNC Symbol
31 ENSG00000196176       8366    HIST1H4A         HIST1H4A      HGNC Symbol
32 ENSG00000196176       8367    HIST1H4A         HIST1H4A      HGNC Symbol
33 ENSG00000196176       8368    HIST1H4A         HIST1H4A      HGNC Symbol
34 ENSG00000196176       8370    HIST1H4A         HIST1H4A      HGNC Symbol
35 ENSG00000196176     121504    HIST1H4A         HIST1H4A      HGNC Symbol
36 ENSG00000196176     554313    HIST1H4A         HIST1H4A      HGNC Symbol
37 ENSG00000196787       8329   HIST1H2AG        HIST1H2AG      HGNC Symbol
38 ENSG00000196787       8330   HIST1H2AG        HIST1H2AG      HGNC Symbol
39 ENSG00000196787       8332   HIST1H2AG        HIST1H2AG      HGNC Symbol
40 ENSG00000196787       8336   HIST1H2AG        HIST1H2AG      HGNC Symbol
41 ENSG00000196787       8969   HIST1H2AG        HIST1H2AG      HGNC Symbol
42 ENSG00000196787      85235   HIST1H2AG        HIST1H2AG      HGNC Symbol
43 ENSG00000198366       8350    HIST1H3A         HIST1H3A      HGNC Symbol
44 ENSG00000198366       8351    HIST1H3A         HIST1H3A      HGNC Symbol
45 ENSG00000198366       8352    HIST1H3A         HIST1H3A      HGNC Symbol
46 ENSG00000198366       8353    HIST1H3A         HIST1H3A      HGNC Symbol
47 ENSG00000198366       8354    HIST1H3A         HIST1H3A      HGNC Symbol
48 ENSG00000198366       8355    HIST1H3A         HIST1H3A      HGNC Symbol
49 ENSG00000198366       8356    HIST1H3A         HIST1H3A      HGNC Symbol
50 ENSG00000198366       8357    HIST1H3A         HIST1H3A      HGNC Symbol
51 ENSG00000198366       8358    HIST1H3A         HIST1H3A      HGNC Symbol
52 ENSG00000198366       8968    HIST1H3A         HIST1H3A      HGNC Symbol
53 ENSG00000215417     406952     MIR17HG          MIR17HG      HGNC Symbol
54 ENSG00000215417     406953     MIR17HG          MIR17HG      HGNC Symbol
55 ENSG00000215417     406979     MIR17HG          MIR17HG      HGNC Symbol
56 ENSG00000215417     406980     MIR17HG          MIR17HG      HGNC Symbol
57 ENSG00000215417     406982     MIR17HG          MIR17HG      HGNC Symbol
58 ENSG00000215417     407048     MIR17HG          MIR17HG      HGNC Symbol
59 ENSG00000215417     407975     MIR17HG          MIR17HG      HGNC Symbol
60 ENSG00000224078      91380      SNHG14           SNHG14      HGNC Symbol
61 ENSG00000224078  100033444      SNHG14           SNHG14      HGNC Symbol
62 ENSG00000224078  100033450      SNHG14           SNHG14      HGNC Symbol
63 ENSG00000224078  100033802      SNHG14           SNHG14      HGNC Symbol
64 ENSG00000224078  100033820      SNHG14           SNHG14      HGNC Symbol
65 ENSG00000224078  100506948      SNHG14           SNHG14      HGNC Symbol
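To see at a glance which Ensembl IDs are affected, the getBM() result can be summarized per gene. The following is a minimal base-R sketch, assuming a data frame with the ensembl_gene_id and entrezgene columns shown above; `res` is a small hypothetical excerpt of that output (rows 37 to 42 and 60 to 65), typed in by hand rather than fetched with a live biomaRt query:

```r
# Hypothetical excerpt of the getBM() output above, typed in by hand
# (no live biomaRt query is run here).
res <- data.frame(
  ensembl_gene_id = rep(c("ENSG00000196787", "ENSG00000224078"), each = 6),
  entrezgene = c(8329L, 8330L, 8332L, 8336L, 8969L, 85235L,
                 91380L, 100033444L, 100033450L, 100033802L, 100033820L,
                 100506948L),
  stringsAsFactors = FALSE
)

# Number of distinct Entrez IDs returned for each Ensembl gene ID
n_entrez <- vapply(split(res$entrezgene, res$ensembl_gene_id),
                   function(x) length(unique(x)), integer(1))
print(n_entrez)

# Ensembl gene IDs that map to more than one Entrez ID
multi <- names(n_entrez)[n_entrez > 1]
print(multi)
```

Applied to the full gene2genomeEx data frame, the same two steps list every Ensembl ID with a one-to-many mapping.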
The description, chromosome_name and strand columns (not shown) are
identical for all rows sharing the same Ensembl gene ID.
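Because only the entrezgene column varies within an Ensembl ID, one possible way to get a single annotation row per feature is to collapse the Entrez IDs of each gene into one string before matching against the eSet feature names. A minimal base-R sketch of that idea; `res` and `feature_ids` are hypothetical toy stand-ins for the getBM() result and for featureNames() of the eSet, "ENSG_HYPOTHETICAL" is a made-up ID with no annotation, and the ";" separator is an arbitrary choice:

```r
# Hypothetical toy data: `res` mimics the getBM() output (one row per
# Ensembl/Entrez pair); `feature_ids` stands in for featureNames(eset).
res <- data.frame(
  ensembl_gene_id = c("ENSG00000166012", "ENSG00000166012", "ENSG00000166012",
                      "ENSG00000215417", "ENSG00000215417"),
  entrezgene = c(79101L, 654320L, 677792L, 406952L, 406953L),
  stringsAsFactors = FALSE
)
feature_ids <- c("ENSG00000215417", "ENSG00000166012", "ENSG_HYPOTHETICAL")

# Collapse all Entrez IDs of one Ensembl gene into a single ";"-separated string
entrez_by_gene <- vapply(split(res$entrezgene, res$ensembl_gene_id),
                         function(x) paste(sort(unique(x)), collapse = ";"),
                         character(1))

# One row per feature, in feature order; IDs absent from `res` get NA
annot <- data.frame(
  ensembl_gene_id = feature_ids,
  entrezgene = unname(entrez_by_gene[feature_ids]),
  stringsAsFactors = FALSE
)
print(annot)
```

Collapsing keeps every returned ID instead of silently picking one, so the table stays aligned with the eSet rows; whether each of those Entrez IDs is really appropriate for the gene still has to be checked against Ensembl or NCBI.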
I manually checked the Entrez ID corresponding to each Ensembl ID on
ensembl.org, and found only one Entrez ID per gene. Does anyone know
where this problem comes from? Is it related to the way my query is
constructed?
Thanks in advance for your help,
Yours sincerely,
Laure Cougnaud
-- output of sessionInfo():
R version 2.15.1 (2012-06-22)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C                  LC_ADDRESS=C
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] biomaRt_2.12.0     affy_1.34.0        Biobase_2.16.0     BiocGenerics_0.2.0
[5] rj_1.1.0-4

loaded via a namespace (and not attached):
[1] affyio_1.24.0         BiocInstaller_1.4.7   preprocessCore_1.18.0
[4] RCurl_1.91-1          rj.gd_1.1.0-1         tools_2.15.1
[7] XML_3.9-4             zlibbioc_1.2.0
--
Sent via the guest posting facility at bioconductor.org.