Trouble getting probe ID info from mouse4302.db annotation package with a specific RefSeq ID.
koreebay • 0

I would like to find the Probe IDs from a list of RefSeq IDs of interest. I've been able to do this by:

> library("annotate")
> library("mouse4302.db")
> as.list(revmap(mouse4302REFSEQ)["NM_023403"])
[1] "1416181_at" "1436749_at" "1444035_at"

However, I came across a problem when one of the RefSeq IDs, 'NM_001166535' returned 'NA'.

> as.list(revmap(mouse4302REFSEQ)["NM_001166535"])
[1] NA

According to BioGPS, NM_001166535 is associated with the Probe ID 1416184_s_at. Looking up this probe with select() on mouse4302.db gives me:

> select(mouse4302.db, keys='1416184_s_at', columns=c("REFSEQ"), keytype="PROBEID")
'select()' returned 1:many mapping between keys and columns
        PROBEID       REFSEQ
1  1416184_s_at NM_001025427
2  1416184_s_at NM_001039356
3  1416184_s_at NM_001166535
4  1416184_s_at NM_001166536
5  1416184_s_at NM_001166537
6  1416184_s_at NM_001166539
7  1416184_s_at NM_001166540
8  1416184_s_at NM_001166541
9  1416184_s_at NM_001166542
10 1416184_s_at NM_001166543
11 1416184_s_at NM_001166544
12 1416184_s_at NM_001166545
13 1416184_s_at NM_001166546
14 1416184_s_at    NM_016660
15 1416184_s_at NP_001020598
16 1416184_s_at NP_001034445
17 1416184_s_at NP_001160007
18 1416184_s_at NP_001160008
19 1416184_s_at NP_001160009
20 1416184_s_at NP_001160011
21 1416184_s_at NP_001160012
22 1416184_s_at NP_001160013
23 1416184_s_at NP_001160014
24 1416184_s_at NP_001160015
25 1416184_s_at NP_001160016
26 1416184_s_at NP_001160017
27 1416184_s_at NP_001160018
28 1416184_s_at    NP_057869
29 1416184_s_at NM_001166476
30 1416184_s_at NM_001166477
31 1416184_s_at NP_001159948
32 1416184_s_at NP_001159949

As you can see in row 3, NM_001166535 is in the data, so why did as.list(revmap(mouse4302REFSEQ)["NM_001166535"]) return 'NA'?

Looking up the Probe ID in mouse4302REFSEQ also returns 'NA'.

> mouse4302REFSEQ$"1416184_s_at"
[1] NA

What am I doing wrong? And is there a better alternative for getting Probe IDs from a RefSeq ID?

 attached packages:
 [1] mouse4302.db_3.2.2   RSQLite_1.0.0        DBI_0.3.1            annotate_1.48.0     
 [6] XML_3.98-1.3         AnnotationDbi_1.32.0 IRanges_2.4.1        S4Vectors_0.8.1      Biobase_2.30.0      
[11] BiocGenerics_0.16.1 

You are using the BiMap interface, which is old now and has been superseded. One of the problems (IMO) with the BiMap interface is that by default, probes that map to more than one gene are hidden, and you just get an NA returned for them. You could get around this by using toggleProbes() to expose all probes, but not many people know about that.
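For anyone who wants to stay with the BiMap interface, here is a minimal sketch of the toggleProbes() workaround. It assumes mouse4302.db is installed and that the reverse map keeps the "all" probe filter; the variable names (refseq_all, probes_by_refseq) are only illustrative.

```r
library(mouse4302.db)

# Default BiMap view: probes that map to more than one gene are hidden,
# so this lookup comes back as NA (as in the question).
mouse4302REFSEQ[["1416184_s_at"]]

# Expose every probe, including the multi-mapping ones.
refseq_all <- toggleProbes(mouse4302REFSEQ, "all")
refseq_all[["1416184_s_at"]]

# Reverse direction: RefSeq ID -> probe IDs, built from the unfiltered map.
probes_by_refseq <- as.list(revmap(refseq_all))
probes_by_refseq[["NM_001166535"]]
```

This is mostly useful for legacy code; for new code, select() or mapIds() is the simpler route.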

The 'modern' way to get annotation data is to use select() or mapIds(). The former will return all of the one-to-many mappings, and can be used to return multiple annotations at once (e.g., you could get the RefSeq, Entrez Gene, HUGO Symbol, etc., in a single call). The latter gives you more control over the return object, including how to deal with one-to-many mappings, but only works with one annotation at a time. You have seen what select() returns. Here are the choices for mapIds():

> mapIds(mouse4302.db, "1416184_s_at", "REFSEQ","PROBEID")
> mapIds(mouse4302.db, "1416184_s_at", "REFSEQ","PROBEID", multiVals = "list")
 [1] "NM_001025427" "NM_001039356" "NM_001166535" "NM_001166536" "NM_001166537"
 [6] "NM_001166539" "NM_001166540" "NM_001166541" "NM_001166542" "NM_001166543"
[11] "NM_001166544" "NM_001166545" "NM_001166546" "NM_016660"    "NP_001020598"
[16] "NP_001034445" "NP_001160007" "NP_001160008" "NP_001160009" "NP_001160011"
[21] "NP_001160012" "NP_001160013" "NP_001160014" "NP_001160015" "NP_001160016"
[26] "NP_001160017" "NP_001160018" "NP_057869"    "NM_001166476" "NM_001166477"
[31] "NP_001159948" "NP_001159949"
> mapIds(mouse4302.db, "1416184_s_at", "REFSEQ","PROBEID", multiVals = "filter")
> mapIds(mouse4302.db, "1416184_s_at", "REFSEQ","PROBEID", multiVals = "asNA")
> mapIds(mouse4302.db, "1416184_s_at", "REFSEQ","PROBEID", multiVals = "CharacterList")
CharacterList of length 1
[["1416184_s_at"]] NM_001025427 NM_001039356 ... NP_001159948 NP_001159949

Each of the return types has its uses, depending on how you plan to deal with the multi-mapped data.
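Since the original question was about going from RefSeq IDs to Probe IDs, the same two functions can be pointed in that direction by using "REFSEQ" as the keytype. The following is a sketch rather than tested output; refseq_ids, probe_table and probe_list are illustrative names, and the two IDs are the ones from the question.

```r
library(mouse4302.db)

# RefSeq IDs of interest (taken from the question).
refseq_ids <- c("NM_023403", "NM_001166535")

# Long-format table: one row per RefSeq/probe pair.
# AnnotationDbi::select is spelled out in case another package masks select().
probe_table <- AnnotationDbi::select(mouse4302.db,
                                     keys = refseq_ids,
                                     columns = "PROBEID",
                                     keytype = "REFSEQ")

# Named list: one element per RefSeq ID, holding all matching probe IDs.
probe_list <- mapIds(mouse4302.db,
                     keys = refseq_ids,
                     column = "PROBEID",
                     keytype = "REFSEQ",
                     multiVals = "list")
probe_list[["NM_001166535"]]
```

The table form is convenient for merging with other data, while the list form is handy when you want to loop over the RefSeq IDs one at a time.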


Thanks! This answered my question, and I was able to learn something new.

