org.Bt.eg.db / annotation problem

Iain Gallagher ▴ 930

@iain-gallagher-2532

United Kingdom

Hello List,

Perhaps someone could help me with this. I am annotating some 200 genes with the org.Bt.eg.db package. The identifier I have for the genes is an Ensembl ID (e.g. ENSBTAG00000009012). I am attempting to return the Entrez ID with the following code (where rownames(topGenes$table) is my vector of Ensembl IDs):

    egIds <- unlist(mget(rownames(topGenes$table), org.Bt.egENSEMBL2EG, ifnotfound=NA))

This returns a named vector, but it contains Ensembl IDs that were not in my query.

    setdiff(names(egIds), rownames(topGenes$table))
     [1] "ENSBTAG000000375581" "ENSBTAG000000375582" "ENSBTAG000000312311"
     [4] "ENSBTAG000000312312" "ENSBTAG000000306301" "ENSBTAG000000306302"
     [7] "ENSBTAG000000359951" "ENSBTAG000000359952" "ENSBTAG000000005461"
    [10] "ENSBTAG000000005462" "ENSBTAG000000005041" "ENSBTAG000000005042"
    [13] "ENSBTAG000000307771" "ENSBTAG000000307772" "ENSBTAG000000135691"
    [16] "ENSBTAG000000135692"

Could someone explain why this is happening? The IDs above (i.e. those not in my query) are returned with Entrez IDs.

    egIds[setdiff(names(egIds), rownames(topGenes$table))]
    ENSBTAG000000375581
               "281212"
    etc etc

Thanks

iain

    > sessionInfo()
    R version 2.13.1 (2011-07-08)
    Platform: x86_64-pc-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_GB.utf8       LC_NUMERIC=C
     [3] LC_TIME=en_GB.utf8        LC_COLLATE=en_GB.utf8
     [5] LC_MONETARY=C             LC_MESSAGES=en_GB.utf8
     [7] LC_PAPER=en_GB.utf8       LC_NAME=C
     [9] LC_ADDRESS=C              LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    other attached packages:
    [1] org.Bt.eg.db_2.5.0   RSQLite_0.9-4  DBI_0.2-5
    [4] AnnotationDbi_1.14.1 Biobase_2.10.0 edgeR_2.2.5

    loaded via a namespace (and not attached):
    [1] limma_3.6.6  tools_2.13.1

Updated 12.7 years ago by jason0701 • written 12.7 years ago by Iain Gallagher

jason0701 ▴ 190

@jason0701-3921

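Hi Iain,

I think this is due to multiple matches for some keys. In your example, "ENSBTAG000000375581" and "ENSBTAG000000375582" are likely both "ENSBTAG00000037558": when a key maps to more than one Entrez ID, unlist() appends 1, 2, ... to the element name to keep the names distinct.

Jason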
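The renaming can be reproduced without the annotation package. The following is a minimal sketch, assuming that mget() returned a list with one element per queried key; the list `hits` and the Entrez values in it are made up for illustration and are not taken from org.Bt.eg.db.

```r
# Toy stand-in for the list returned by mget(): one element per queried key.
# The Entrez values are hypothetical placeholders.
hits <- list(
  ENSBTAG00000009012 = "111",            # key with a single match
  ENSBTAG00000037558 = c("222", "333")   # key with two matches
)

# unlist() keeps the name of a length-one element as-is, but appends
# 1, 2, ... to the name of an element that holds several values.
flat <- unlist(hits)
print(names(flat))

# One way to keep the original keys: take only the first match per key.
first_hit <- vapply(hits, function(x) x[1], character(1))

# Another way: keep every match in a long-format table, one row per pair.
long <- data.frame(
  ensembl = rep(names(hits), lengths(hits)),
  entrez  = unlist(hits, use.names = FALSE),
  stringsAsFactors = FALSE
)
print(long)
```

Taking the first match silently drops the other Entrez IDs, so the long-format table is probably the safer choice when the one-to-many mappings matter downstream.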

12.7 years ago • jason0701

Thanks Jason.

Chasing this up, it seems that neither "ENSBTAG000000375581" nor "ENSBTAG000000375582" is in Ensembl. The data was passed to me by others, so I think I'll do some judicious filtering based on the org package.

Best

iain

--- On Tue, 2/8/11, Jason Lu <jasonlu68 at gmail.com> wrote:

> From: Jason Lu <jasonlu68 at gmail.com>
> Subject: Re: [BioC] org.Bt.eg.db / annotation problem
> To: bioconductor at stat.math.ethz.ch
> Date: Tuesday, 2 August, 2011, 15:03
>
> Hi Iain,
>
> I think this is due to multiple matches for some keys.
> In your example, "ENSBTAG000000375581" and
> "ENSBTAG000000375582" are
> likely "ENSBTAG00000037558".
>
> Jason

12.7 years ago • Iain Gallagher
