The annaffy package was developed years ago and uses an outmoded way of extracting data. You are better off using either select or mapIds. Do note that you get so many NA values because, in the past, any probeset that mapped to more than one gene ID would return NA (the thinking being that it's not clear what such a probeset measures), whereas select now returns all such values:
> select(hugene10sttranscriptcluster.db, c("7896740","7896742"), "SYMBOL")
'select()' returned 1:many mapping between keys and columns
PROBEID SYMBOL
1 7896740 OR4F4
2 7896740 OR4F17
3 7896740 OR4F5
4 7896742 LINC00266-1
5 7896742 PCMTD2
6 7896742 LINC01881
Compare that to what aafSymbol does under the hood:
> mget( c("7896740","7896742"), hugene10sttranscriptclusterSYMBOL)
$`7896740`
[1] NA
$`7896742`
[1] NA
And mapIds allows you to control how the multi-mapping is handled:
> mapIds(hugene10sttranscriptcluster.db, c("7896740","7896742"), "SYMBOL", "PROBEID")
'select()' returned 1:many mapping between keys and columns
7896740 7896742
"OR4F4" "LINC00266-1"
> mapIds(hugene10sttranscriptcluster.db, c("7896740","7896742"), "SYMBOL", "PROBEID", multiVals = "list")
'select()' returned 1:many mapping between keys and columns
$`7896740`
[1] "OR4F4" "OR4F17" "OR4F5"
$`7896742`
[1] "LINC00266-1" "PCMTD2" "LINC01881"
> mapIds(hugene10sttranscriptcluster.db, c("7896740","7896742"), "SYMBOL", "PROBEID", multiVals = "CharacterList")
'select()' returned 1:many mapping between keys and columns
CharacterList of length 2
[["7896740"]] OR4F4 OR4F17 OR4F5
[["7896742"]] LINC00266-1 PCMTD2 LINC01881
> mapIds(hugene10sttranscriptcluster.db, c("7896740","7896742"), "SYMBOL", "PROBEID", multiVals = "asNA")
'select()' returned 1:many mapping between keys and columns
7896740 7896742
NA NA
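Beyond the multiVals settings shown above, mapIds also accepts "filter" and a user-supplied function. The following is only a minimal sketch, not something taken from the session above: the variable names (probes, collapsed, unique_only) and the choice of ";" as a separator are my own assumptions, and it presumes hugene10sttranscriptcluster.db is installed.

```r
library(AnnotationDbi)
library(hugene10sttranscriptcluster.db)

# The same two multi-mapped probesets used in the examples above
probes <- c("7896740", "7896742")

# Pass a function to multiVals: it receives all values for one key
# and must return a single value. Here every symbol is kept, joined
# into one string (the ";" separator is an arbitrary choice).
collapsed <- mapIds(hugene10sttranscriptcluster.db,
                    keys = probes,
                    column = "SYMBOL",
                    keytype = "PROBEID",
                    multiVals = function(x) paste(x, collapse = ";"))

# multiVals = "filter" drops every key with more than one match,
# so the result can be shorter than the input vector of keys.
unique_only <- mapIds(hugene10sttranscriptcluster.db,
                      keys = probes,
                      column = "SYMBOL",
                      keytype = "PROBEID",
                      multiVals = "filter")
```

Collapsing keeps one entry per probeset without silently picking the first symbol, which may be handy when annotating a results table; "filter" is closer in spirit to the old behaviour, except that ambiguous probesets are removed rather than set to NA.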