Question about mget vs. select for annotation package
@christina-chaivorapol-5712
Hi,

I seem to be getting different results depending on if I use select() or mget() with the hgu133plus2.db package for a probe with a 1 probe to many gene mapping. Does anyone know why there is a discrepancy?

> select(hgu133plus2.db, keys="213801_x_at", cols=c("ENTREZID", "SYMBOL"), keytype="PROBEID")
      PROBEID ENTREZID  SYMBOL
1 213801_x_at     3921    RPSA
2 213801_x_at   388524 RPSAP58
3 213801_x_at   574040  SNORA6
4 213801_x_at     6044 SNORA62
5 213801_x_at   653162  RPSAP9
6 213801_x_at   730029 RPSAP19
Warning message:
In .generateExtraRows(tab, keys, jointype) :
  'select' resulted in 1:many mapping between keys and return rows

> mget("213801_x_at", hgu133plus2ENTREZID)
$`213801_x_at`
[1] NA

> sessionInfo()
R version 3.0.0 (2013-04-03)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] hgu133plus2.db_2.9.0 org.Hs.eg.db_2.9.0   RSQLite_0.11.3
[4] DBI_0.2-6            AnnotationDbi_1.22.3 Biobase_2.20.0
[7] BiocGenerics_0.6.0   limma_3.16.2

loaded via a namespace (and not attached):
[1] IRanges_1.18.0 stats4_3.0.0   tools_3.0.0

Thanks,
Christina

--
Christina Chaivorapol, Ph.D.
Genentech, Inc.
Bioinformatics & Computational Biology
hgu133plus2 probe • 2.1k views
@herve-pages-1542
Seattle, WA, United States
Hi Christina,

In AnnotationDbi jargon, a probe that matches multiple genes is called a multiple probe. When using the classic Bimap API, multiple probes are mapped to NA by default, unless you use toggleProbes() on the Bimap object to request the full mapping:

> map <- toggleProbes(hgu133plus2ENTREZID, "all")
> mget("213801_x_at", map)
$`213801_x_at`
[1] "3921"   "388524" "574040" "6044"   "653162" "730029"

Personally, I think that making multiple probes appear as if they are not mapped to any gene is not doing any good. Hopefully at some point this can be reconsidered.

Cheers,
H.

--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
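For readers who want to compare the masked and unmasked views side by side, here is a minimal sketch built on the toggleProbes() call above. It assumes hgu133plus2.db is installed from Bioconductor; the variable names (probe, masked, full, n_multi) are illustrative only, and the actual gene IDs returned depend on the annotation release you have installed, so no output is shown.

```r
# Sketch only: assumes the Bioconductor package hgu133plus2.db is installed.
library(hgu133plus2.db)

probe <- "213801_x_at"

# Default Bimap view: probes that match several genes are hidden,
# so a multiple probe comes back as NA.
masked <- mget(probe, hgu133plus2ENTREZID, ifnotfound = NA)

# "all" exposes every probe, including the multiple probes.
full <- mget(probe, toggleProbes(hgu133plus2ENTREZID, "all"),
             ifnotfound = NA)

# "multiple" keeps only the multiple probes, which makes it easy to
# count how many probes the default view is hiding.
multi_map <- toggleProbes(hgu133plus2ENTREZID, "multiple")
n_multi <- length(mappedkeys(multi_map))

print(masked)
print(full)
print(n_multi)
```

toggleProbes() returns a modified copy of the map, so the original hgu133plus2ENTREZID object keeps its default behaviour for any older code that relies on it.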
Hi Christina,

The basic problem is that the Bimap interface was created to emulate an even older set of environments. For those older platforms, people were initially not interested in keys (probe IDs) that mapped to multiple different things, because those keys were usually probe IDs from microarrays, and a probe on a microarray that maps to multiple targets is probably just a bad probe. So software was written with that limitation in mind, time marched on, and if we changed it now, some of that old code might break.

Later on, when people started to use these Bimaps for things other than microarrays, we kept the multiple-probe limitation for backwards compatibility and provided the toggleProbes() method that Hervé mentioned, so that people could get that data if they cared to.

By the time we wrote the newer select() interface, the world had moved on: we were doing both microarrays and a host of other things such as high-throughput sequencing, and annotation was mostly something people did at the end of an analysis, usually just to decorate a data.frame object. So with select() we were free to always expose all the data for a probe or gene and simply warn the user when they get back more rows than expected. select() was really designed to be a more general annotation tool.

At this time, we hope that most people will use select(), which offers a simpler way to access this data. We still provide the older Bimap interface, mostly for the sake of backwards compatibility.

Marc
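As a rough illustration of the select() route, the sketch below fetches the annotation in long form and then as one list entry per probe. It makes some assumptions: hgu133plus2.db is installed, and the AnnotationDbi release is recent enough that the argument is named columns (the session in this thread used the older name cols) and that mapIds() is available. The variable names are illustrative, and no output is shown because the rows depend on the annotation release.

```r
# Sketch only: assumes a recent hgu133plus2.db / AnnotationDbi installation.
library(hgu133plus2.db)

probe <- "213801_x_at"

# Long format: one row per probe/gene pair. A 1:many mapping yields several
# rows for the same probe (older releases also emit a warning about it).
# The AnnotationDbi:: prefix avoids clashes with other select() functions.
ann <- AnnotationDbi::select(hgu133plus2.db,
                             keys = probe,
                             columns = c("ENTREZID", "SYMBOL"),
                             keytype = "PROBEID")

# One entry per probe: multiVals = "list" keeps every symbol for the probe
# instead of silently taking the first match.
symbols <- mapIds(hgu133plus2.db,
                  keys = probe,
                  column = "SYMBOL",
                  keytype = "PROBEID",
                  multiVals = "list")

print(ann)
print(symbols)
```

Which form is more convenient depends on the goal: the long data.frame merges directly onto a results table, while the list form keeps exactly one element per probe and leaves the decision about how to handle multiple genes to the caller.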