Stefan McKinnon Høj-Edwards
Hi Iain,
I also had problems with the org.Xx.eg.db packages and ended up making
a package, AnnotationFuncs, for it. It handles mapping from one
identifier to another (via the central identifier), using the data
packages. I have included an example below.
> library(AnnotationFuncs)
> library(org.Hs.eg.db)
> syms <- c('ACTB', 'TNF', 'TGFB1')
> translate(syms, from=org.Hs.egSYMBOL2EG, to=org.Hs.egENSEMBL)
$ACTB
[1] "ENSG00000075624"
$TNF
[1] "ENSG00000204490" "ENSG00000206439" "ENSG00000223952" "ENSG00000228321"
[5] "ENSG00000228849" "ENSG00000230108" "ENSG00000232810"
$TGFB1
[1] "ENSG00000105329"
Kind regards
Stefan McKinnon Høj-Edwards              Dept. of Molecular Biology and Genetics
Ph.D. Fellow                             Aarhus University
                                         Blichers Allé 20, Postboks 50
                                         DK-8830 Tjele
Tel.: +45 8715 7969                      Tel.: +45 8715 6000
Email: Stefan.Hoj-Edwards at agrsci.dk   Web: www.agrsci.dk
-----Original message-----
Message: 4
Date: Thu, 6 Oct 2011 12:50:41 +0100
From: Iain Gallagher <iaingallagher@btopenworld.com>
To: bioconductor <bioconductor at stat.math.ethz.ch>
Subject: [BioC] mapping through org.Xx.eg.db packages
Message-ID: <1317901841.98063.YahooMailClassic at web86708.mail.ird.yahoo.com>
Content-Type: text/plain; charset="utf-8"
Dear List
I wonder if someone could shed some light on the following.
Given a set of gene symbols I would like to retrieve different
identifiers.
Using the org.Xx.eg.db packages I can go about this by mapping through
the EntrezIDs:
# mapping through eg ids as package is eg id centric
library(org.Hs.eg.db)
syms <- c('ACTB', 'TNF', 'TGFB1')
egID <- unlist(mget(syms, org.Hs.egSYMBOL2EG, ifnotfound=NA))
ensID <- unlist(mget(egID, org.Hs.egENSEMBL, ifnotfound=NA))
> ensID
               60             71241             71242             71243
"ENSG00000075624" "ENSG00000204490" "ENSG00000206439" "ENSG00000223952"
            71244             71245             71246             71247
"ENSG00000228321" "ENSG00000228849" "ENSG00000230108" "ENSG00000232810"
             7040
"ENSG00000105329"
> egID
ACTB TNF TGFB1
"60" "7124" "7040"
Now here I assumed that the names of the ensID object were the
original EntrezIDs mapped from the symbols, but they are not: unlist()
makes the names unique by appending a running number to those
EntrezIDs that have several matches (here 7124 becomes 71241,
71242, etc.).
This has caused me some confusion since each of these names is an
actual Entrez ID - just not one I'm interested in.
The same can happen when mapping from any ID that ends in a numeric
part (eg Ensembl ids).
It is useful to return a mapping showing the original identifier, the
EntrezID mapped through and the required identifier so how could one
reliably do this when mapping through e.g. Entrez IDs as in the method
above (i.e. return the Entrez ID and Ensembl ID in one sweep)?
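One possible way to keep all three identifiers together is to skip the names that unlist() generates and build a long table instead. The sketch below uses toy lists in place of the mget() results (values copied from the output above, with TNF cut to three Ensembl IDs) so that it runs in base R without any annotation package. The helper to_long() and the column names are hypothetical, chosen only for illustration.

```r
# Toy stand-ins for the two mget() results (hypothetical objects)
sym2eg <- list(ACTB = "60", TNF = "7124", TGFB1 = "7040")
eg2ens <- list(
  "60"   = "ENSG00000075624",
  "7124" = c("ENSG00000204490", "ENSG00000206439", "ENSG00000223952"),
  "7040" = "ENSG00000105329"
)

# unlist() makes names unique by appending an index to multi-match elements,
# so "7124" turns into "71241", "71242", "71243"
mangled <- names(unlist(eg2ens))

# Hypothetical helper: flatten a named list into a two-column data frame,
# repeating each key once per mapped value so that no key is altered
to_long <- function(x, key, value) {
  out <- data.frame(rep(names(x), lengths(x)),
                    unlist(x, use.names = FALSE),
                    stringsAsFactors = FALSE)
  names(out) <- c(key, value)
  out
}

# Join symbol -> Entrez with Entrez -> Ensembl on the shared Entrez column
mapping <- merge(to_long(sym2eg, "symbol", "entrez"),
                 to_long(eg2ens, "entrez", "ensembl"),
                 by = "entrez")
print(mapping)
```

With the real packages loaded, the same helper should accept the lists returned by mget(syms, org.Hs.egSYMBOL2EG, ifnotfound=NA) and mget(egID, org.Hs.egENSEMBL, ifnotfound=NA) directly, provided they are not passed through unlist() first.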
I have tried using the SQL approach:
dbCon <- org.Hs.eg_dbconn()
sqlQuery <- 'SELECT * FROM genes, gene_info, ensembl WHERE genes._id =
gene_info._id = ensembl._id;'
result <- dbGetQuery(dbCon, sqlQuery)
where one could filter the 'result' object with the symbols of
interest but this query takes a long time to run. I know little SQL so
that might be an issue!
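A likely reason for the slow query is the chained comparison in the WHERE clause. SQLite reads a = b = c as (a = b) = c, which compares a 0/1 result with an ID rather than requiring all three IDs to be equal, so the three tables are effectively cross-joined. The sketch below shows explicit joins on a toy in-memory database, assuming the DBI and RSQLite packages are installed. The table names follow the query above; the gene_id, symbol and ensembl_id columns and the _id values are assumptions made for illustration and should be checked against the actual schema.

```r
library(DBI)

# Toy in-memory database mimicking the three tables named in the query above
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "genes",
             data.frame(`_id` = 1:3, gene_id = c("60", "7124", "7040"),
                        check.names = FALSE, stringsAsFactors = FALSE))
dbWriteTable(con, "gene_info",
             data.frame(`_id` = 1:3, symbol = c("ACTB", "TNF", "TGFB1"),
                        check.names = FALSE, stringsAsFactors = FALSE))
dbWriteTable(con, "ensembl",
             data.frame(`_id` = c(1, 2, 2, 2, 3),
                        ensembl_id = c("ENSG00000075624", "ENSG00000204490",
                                       "ENSG00000206439", "ENSG00000223952",
                                       "ENSG00000105329"),
                        check.names = FALSE, stringsAsFactors = FALSE))

# Each join states its own equality, and the symbol filter runs in the database
sqlQuery <- "
  SELECT gi.symbol, g.gene_id, e.ensembl_id
  FROM genes AS g
  JOIN gene_info AS gi ON gi._id = g._id
  JOIN ensembl   AS e  ON e._id  = g._id
  WHERE gi.symbol IN ('ACTB', 'TNF', 'TGFB1')"
result <- dbGetQuery(con, sqlQuery)
print(result)

dbDisconnect(con)
```

Against the real annotation database, the same query text could be passed to dbGetQuery() with the connection returned by org.Hs.eg_dbconn(), assuming the column names match.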
Best
iain