Hi,
I seem to be getting different results depending on whether I use
select() or mget() with the hgu133plus2.db package for a probe with a
one-probe-to-many-genes mapping. Does anyone know why there is a
discrepancy?
> select(hgu133plus2.db, keys="213801_x_at",
+        cols=c("ENTREZID", "SYMBOL"), keytype="PROBEID")
      PROBEID ENTREZID  SYMBOL
1 213801_x_at     3921    RPSA
2 213801_x_at   388524 RPSAP58
3 213801_x_at   574040  SNORA6
4 213801_x_at     6044 SNORA62
5 213801_x_at   653162  RPSAP9
6 213801_x_at   730029 RPSAP19
Warning message:
In .generateExtraRows(tab, keys, jointype) :
  'select' resulted in 1:many mapping between keys and return rows
> mget("213801_x_at", hgu133plus2ENTREZID)
$`213801_x_at`
[1] NA
> sessionInfo()
R version 3.0.0 (2013-04-03)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base
other attached packages:
[1] hgu133plus2.db_2.9.0 org.Hs.eg.db_2.9.0 RSQLite_0.11.3
[4] DBI_0.2-6 AnnotationDbi_1.22.3 Biobase_2.20.0
[7] BiocGenerics_0.6.0 limma_3.16.2
loaded via a namespace (and not attached):
[1] IRanges_1.18.0 stats4_3.0.0 tools_3.0.0
Thanks,
Christina
--
Christina Chaivorapol, Ph.D.
Genentech, Inc.
Bioinformatics & Computational Biology
chrichai@gene.com
Hi Christina,
In AnnotationDbi jargon, a probe that matches multiple genes is called
a multiple probe. When using the classic Bimap API, multiple probes are
mapped to NA by default, unless you use toggleProbes() on the Bimap
object to request the full mapping:
> map <- toggleProbes(hgu133plus2ENTREZID, "all")
> mget("213801_x_at", map)
$`213801_x_at`
[1] "3921" "388524" "574040" "6044" "653162" "730029"
Personally, I think that making multiple probes look as if they are
not mapped to any gene does no good. Hopefully this can be
reconsidered at some point.
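To make the default behaviour concrete, here is a minimal base-R sketch
that emulates it on a toy probe-to-gene table. It is only an
illustration under stated assumptions: the table copies the six Entrez
IDs shown above and adds one made-up single-gene probe, and
lookup_probe() with its mode argument is a hypothetical helper, not an
AnnotationDbi function. The real Bimap machinery works differently
internally.

```r
# Toy probe-to-gene table. The 213801_x_at rows copy the Entrez IDs from the
# output above; "toy_single_at" is a made-up probe that maps to one gene.
probe2gene <- data.frame(
  PROBEID  = c(rep("213801_x_at", 6), "toy_single_at"),
  ENTREZID = c("3921", "388524", "574040", "6044", "653162", "730029",
               "1"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of AnnotationDbi) that mimics the two modes:
#   mode = "single": a probe hitting more than one gene comes back as NA,
#                    like the default Bimap behaviour described above
#   mode = "all":    every gene is returned, like toggleProbes(map, "all")
lookup_probe <- function(probes, table, mode = c("single", "all")) {
  mode <- match.arg(mode)
  genes <- split(table$ENTREZID, table$PROBEID)
  out <- lapply(probes, function(p) {
    hits <- genes[[p]]
    if (is.null(hits)) return(NA_character_)   # unknown probe: NA in this sketch
    if (mode == "single" && length(hits) > 1) return(NA_character_)
    hits
  })
  names(out) <- probes
  out
}

lookup_probe("213801_x_at", probe2gene)          # NA, as with plain mget()
lookup_probe("213801_x_at", probe2gene, "all")   # all six Entrez IDs
lookup_probe("toy_single_at", probe2gene)        # same result in either mode
```

The point of the sketch is that the NA does not mean "no annotation";
in the default mode it can also mean "more than one annotation".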
Cheers,
H.
On 07/02/2013 02:53 PM, Christina Chaivorapol wrote:
> [...]
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
Hi Christina,
The basic problem is that the Bimap interface was created to emulate
an even older set of environments. For those older platforms, people
were initially not very interested in keys (probe IDs) that mapped to
multiple different things, because those keys were usually probe IDs
from microarrays, and a probe on a microarray that maps to multiple
targets is probably just a bad probe. So software was written with
that limitation in mind, time marched on, and if we changed the
default now, some of that old code might break. Later on, when people
started to use these Bimaps for things other than microarrays, we kept
the multiple-probe limitation for backwards compatibility and
provided the toggleProbes() method that Hervé mentioned, so that
people could get that data if they cared to.
By the time we wrote the newer select() interface, the world had moved
on: we were doing both microarrays and a host of other things such as
high-throughput sequencing, and annotation was mostly something people
did at the end of an analysis, usually just to decorate a data.frame
object. So with select() we were free to always expose all the data
for a probe or gene, and simply warn the user when they are getting
back more rows than they might have expected. select() was really
designed to be a more general annotation tool. At this time, we hope
that most people will use select(), which offers a simpler way to
access this data. We still provide the older Bimap interface, mostly
for the sake of backwards compatibility.
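Because select() returns one row per probe-gene pair, decorating a
results table often means collapsing those rows back to one per probe.
A minimal base-R sketch of one way to do that follows. It assumes the
select() output from the original post has been re-entered by hand as
sel, so that it runs without the annotation package. The ";" separator
and the names sel, collapsed and n_genes are illustrative choices, not
anything select() does itself.

```r
# The select() result from the original post, re-entered by hand so the
# example runs without hgu133plus2.db installed.
sel <- data.frame(
  PROBEID  = rep("213801_x_at", 6),
  ENTREZID = c("3921", "388524", "574040", "6044", "653162", "730029"),
  SYMBOL   = c("RPSA", "RPSAP58", "SNORA6", "SNORA62", "RPSAP9", "RPSAP19"),
  stringsAsFactors = FALSE
)

# Count genes per probe, e.g. to flag probes that hit more than one gene.
n_genes <- table(sel$PROBEID)

# Collapse to one row per probe, joining multiple hits with ";".
collapsed <- aggregate(sel[c("ENTREZID", "SYMBOL")],
                       by = sel["PROBEID"],
                       FUN = paste, collapse = ";")
collapsed
```

Whether to collapse, keep only single-gene probes, or keep every row
depends on the downstream analysis; the sketch only shows the
mechanics.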
Marc
On 07/02/2013 03:26 PM, Hervé Pagès wrote:
> [...]