Hi,
I found that one gene of top interest to me, FKBP5, is missing from hugene10sttranscriptcluster.db (v8.7.0) but is present in the older version, v8.2.0. The code is:
> library(hugene10sttranscriptcluster.db)
> mapped_probes <- mappedkeys(hugene10sttranscriptclusterSYMBOL)
> xx <- as.list(hugene10sttranscriptclusterSYMBOL[mapped_probes])
> 'FKBP5' %in% xx
> mapped_probes <- mappedkeys(hugene10sttranscriptclusterENTREZID)
> xx <- as.list(hugene10sttranscriptclusterENTREZID[mapped_probes])
> '2289' %in% xx
> mapped_probes <- mappedkeys(hugene10sttranscriptclusterENSEMBL)
> xx <- as.list(hugene10sttranscriptclusterENSEMBL[mapped_probes])
> "ENSG00000096060" %in% xx
The outputs are all TRUE with v8.2.0 and all FALSE with v8.7.0. I haven't checked in which version the gene first goes missing. FKBP5 is a major protein-coding gene and is unlikely to be missing from RefSeq, GenBank, or Entrez Gene, so I'm wondering if you have any idea what is going on. Thanks!
Mengyuan
Also do note that mappedkeys gives you the keys (probeset IDs), not the things that are mapped. So every time you used mappedkeys, you got back just the probeset IDs, so by definition none of the things you were looking for would be in there.
You should be using select to do queries, not the old BiMap interface. You could use the keys function, however, or query with select directly.
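As a rough sketch of what such queries might look like (not necessarily the exact calls intended above), the snippet below checks whether FKBP5 is among the SYMBOL keys and then asks select for the probesets mapped to it. It assumes the annotation package is installed; the guard and the variable names pkg, have_pkg, and res are only there for illustration.

```r
# Sketch only: runs the queries if the annotation package is installed.
pkg <- "hugene10sttranscriptcluster.db"
have_pkg <- requireNamespace(pkg, quietly = TRUE)

if (have_pkg) {
  library(hugene10sttranscriptcluster.db)

  # Is the symbol one of the SYMBOL keys the package knows about?
  print("FKBP5" %in% keys(hugene10sttranscriptcluster.db, keytype = "SYMBOL"))

  # Which probesets (and other IDs) are mapped to that symbol?
  res <- select(hugene10sttranscriptcluster.db,
                keys = "FKBP5",
                columns = c("PROBEID", "ENTREZID", "ENSEMBL"),
                keytype = "SYMBOL")
  print(res)
}
```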
Thanks for the explanations. The select function does show the outputs, but it cannot explain why the probe ID 8125919 is missing in mappedkeys, and also cannot explain why it exists in mappedkeys if using v.8.2.0.
"Also do note that mappedkeys gives you the keys (probeset IDs), not the things that are mapped." -- Please note that I used "as.list" to obtain a list with mapped gene symbols as values and with probeset ids as keys.
I checked 8125919 in two hugene10sttranscriptcluster.db versions:
In v8.2.0, probe id 8125919 is uniquely mapped to FKBP5:
> select(hugene10sttranscriptcluster.db, "8125919", c("PROBEID","ENTREZID","ENSEMBL","SYMBOL"), "PROBEID")
'select()' returned 1:1 mapping between keys and columns
  PROBEID ENTREZID         ENSEMBL SYMBOL
1 8125919     2289 ENSG00000096060  FKBP5
While in v8.7.0, 8125919 is mapped to two gene symbols:
> select(hugene10sttranscriptcluster.db, "8125919", c("PROBEID","ENTREZID","ENSEMBL","SYMBOL"), "PROBEID")
'select()' returned 1:many mapping between keys and columns
  PROBEID ENTREZID         ENSEMBL    SYMBOL
1 8125919     2289 ENSG00000096060     FKBP5
2 8125919   285847            <NA> LOC285847
So my guess is that because this probe ID is now mapped to more than one gene symbol and Entrez ID, it was excluded from mappedkeys in v8.7.0 of hugene10sttranscriptcluster.db.
One reason I suggested not using the old BiMap interface is that its default is to return NA for any multi-mapping probeset, the argument being that we can't say for sure what such a probeset is measuring. With the newer database-style interface we just return all the data, including the probesets that have one-to-many mappings, and let the end user sort it out.
In addition, what you are doing still doesn't make sense - if you convert a BiMap object to a list, the names of the list are still the probeset IDs. So you are looking for the symbol and Entrez Gene ID in the set of probeset IDs, rather than in the list members. For example
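A small base-R sketch of the distinction between list names and list members, using a toy list in place of the real as.list() result. The list here is hypothetical (one entry reuses the IDs from this thread, the other is made up), so it only illustrates the lookup logic, not the actual package contents.

```r
# Toy stand-in for as.list(<BiMap>): names are probeset IDs,
# members are the mapped gene symbols. "hypothetical_id" and
# "GENE_A" are invented for illustration.
xx <- list("8125919" = c("FKBP5", "LOC285847"),
           "hypothetical_id" = "GENE_A")

# Searching the names looks only at probeset IDs.
in_names <- "FKBP5" %in% names(xx)

# Searching the flattened members looks at the symbols.
in_values <- "FKBP5" %in% unlist(xx)

# Recover which probeset IDs carry the symbol.
hits <- names(xx)[vapply(xx, function(s) "FKBP5" %in% s, logical(1))]

print(in_names)
print(in_values)
print(hits)
```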
If you want the multi-mapping probes, you need to use toggleProbes first. Alternatively, query through the newer interface, which returns all mappings.
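A hedged sketch of both routes, assuming the annotation package is installed: toggleProbes with "all" exposes multi-mapping probesets in the BiMap, and mapIds with multiVals = "list" returns every symbol for a probeset through the newer interface. The guard and the variable names pkg, have_pkg, and all_map are illustrative only, and the original example may have used different calls.

```r
# Sketch only: runs if the annotation package is installed.
pkg <- "hugene10sttranscriptcluster.db"
have_pkg <- requireNamespace(pkg, quietly = TRUE)

if (have_pkg) {
  library(hugene10sttranscriptcluster.db)

  # BiMap route: expose probesets with one-to-many mappings as well.
  all_map <- toggleProbes(hugene10sttranscriptclusterSYMBOL, "all")
  print("8125919" %in% mappedkeys(all_map))
  print(as.list(all_map["8125919"]))

  # Newer interface: keep every symbol mapped to the probeset.
  print(mapIds(hugene10sttranscriptcluster.db,
               keys = "8125919",
               column = "SYMBOL",
               keytype = "PROBEID",
               multiVals = "list"))
}
```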
Also do note that we are simply re-packaging information that we get from Affy. If they update their annotation file to say that a given probeset measures something completely different, then our annotation packages will reflect that. We don't do any vetting of their (or anybody's) annotation, and are simply in the business of putting those data in a format that we think is simpler for our end users to utilize.
This explanation really helps. Thanks!