Question

Extract gene ID

lalchungnungabt17

Hi,

I have a microarray expression data set that was annotated with pd.hugene.1.1.st.v1, and I want to extract the gene IDs from it. Below is the script I used; it gives me many NA gene IDs. I just want to confirm whether this is the correct way of doing it. Thank you in advance!

> library(Biobase)
> library(annaffy)
> load("geneCore.RData")  ## this file was annotated with pd.hugene.1.1.st.v1
> ID <- rownames(exprs(geneCore))
> symbols <- aafSymbol(ID, "hugene11sttranscriptcluster.db")

Tags: pd.hugene.1.1.st.v1, hugene11sttranscriptcluster.db

Updated 6.1 years ago by James W. MacDonald • written 6.1 years ago by lalchungnungabt17

score 1 · Answer 1 · 2019-02-11

The annaffy package was developed years ago and uses an outmoded way of extracting data. You are better off using either select or mapIds (the examples below use hugene10sttranscriptcluster.db, but the calls are the same for hugene11sttranscriptcluster.db). Do note that the reason you get so many NA values is that, in the past, any probeset that mapped to more than one gene ID returned NA (the thinking being that it is not clear what such a probeset measures), whereas select now returns all such values:

> select(hugene10sttranscriptcluster.db, c("7896740","7896742"), "SYMBOL")
'select()' returned 1:many mapping between keys and columns
  PROBEID      SYMBOL
1 7896740       OR4F4
2 7896740      OR4F17
3 7896740       OR4F5
4 7896742 LINC00266-1
5 7896742      PCMTD2
6 7896742   LINC01881

Compare this with what aafSymbol does under the hood:

> mget( c("7896740","7896742"), hugene10sttranscriptclusterSYMBOL)
$`7896740`
[1] NA

$`7896742`
[1] NA
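To make the difference concrete, the rule that produces those NAs can be mimicked in plain base R on the six rows that select() printed above. This is only an illustrative sketch, not the actual annaffy or AnnotationDbi code: the data frame is typed in by hand from that output, and the names res, by_id, old_rule and first_rule are made up for the example.

```r
## The 1:many select() result shown above, typed in by hand
res <- data.frame(
  PROBEID = c("7896740", "7896740", "7896740", "7896742", "7896742", "7896742"),
  SYMBOL  = c("OR4F4", "OR4F17", "OR4F5", "LINC00266-1", "PCMTD2", "LINC01881"),
  stringsAsFactors = FALSE
)

## Group the symbols by probeset, keeping probesets in their original order
by_id <- split(res$SYMBOL, factor(res$PROBEID, levels = unique(res$PROBEID)))

## Old-style rule: NA unless the probeset maps to exactly one symbol
old_rule <- vapply(by_id, function(x) if (length(x) == 1) x else NA_character_,
                   character(1))

## "first"-style rule: keep the first symbol returned for each probeset
first_rule <- vapply(by_id, function(x) x[[1]], character(1))

old_rule
first_rule
```

old_rule is NA for both probesets because each maps to three symbols, while first_rule keeps OR4F4 and LINC00266-1, the same values the default mapIds call returns.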

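mapIds, on the other hand, lets you control how multiple mappings are handled through its multiVals argument; the default, "first", keeps the first value: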

> mapIds(hugene10sttranscriptcluster.db, c("7896740","7896742"), "SYMBOL", "PROBEID")
'select()' returned 1:many mapping between keys and columns
      7896740       7896742 
      "OR4F4" "LINC00266-1" 
> mapIds(hugene10sttranscriptcluster.db, c("7896740","7896742"), "SYMBOL", "PROBEID", multiVals = "list")
'select()' returned 1:many mapping between keys and columns
$`7896740`
[1] "OR4F4"  "OR4F17" "OR4F5" 

$`7896742`
[1] "LINC00266-1" "PCMTD2"      "LINC01881"  

> mapIds(hugene10sttranscriptcluster.db, c("7896740","7896742"), "SYMBOL", "PROBEID", multiVals = "CharacterList")
'select()' returned 1:many mapping between keys and columns
CharacterList of length 2
[["7896740"]] OR4F4 OR4F17 OR4F5
[["7896742"]] LINC00266-1 PCMTD2 LINC01881

> mapIds(hugene10sttranscriptcluster.db, c("7896740","7896742"), "SYMBOL", "PROBEID", multiVals = "asNA")
'select()' returned 1:many mapping between keys and columns
7896740 7896742 
     NA      NA
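Applied to the original question, a minimal sketch might look like the following. It assumes the Bioconductor packages AnnotationDbi and hugene11sttranscriptcluster.db are installed. To keep it standalone it annotates every transcript-cluster ID in the package; with your own data you would use ids <- rownames(exprs(geneCore)) instead. The object names (sym_asNA, sym_first, annot) are illustrative only.

```r
library(AnnotationDbi)
library(hugene11sttranscriptcluster.db)

## With your own data: ids <- rownames(exprs(geneCore))
## Here every PROBEID in the package is used so the sketch runs on its own.
ids <- keys(hugene11sttranscriptcluster.db, keytype = "PROBEID")

## Old annaffy-style behaviour: IDs with more than one SYMBOL become NA
sym_asNA <- mapIds(hugene11sttranscriptcluster.db, keys = ids,
                   column = "SYMBOL", keytype = "PROBEID", multiVals = "asNA")

## Default behaviour: keep the first SYMBOL returned for each ID
sym_first <- mapIds(hugene11sttranscriptcluster.db, keys = ids,
                    column = "SYMBOL", keytype = "PROBEID", multiVals = "first")

## NA counts under each rule; the difference is the number of multi-mapped IDs
c(asNA = sum(is.na(sym_asNA)), first = sum(is.na(sym_first)))

## One row per probeset, aligned to ids by name
annot <- data.frame(PROBEID = ids, SYMBOL = unname(sym_first[ids]),
                    stringsAsFactors = FALSE)
head(annot)
```

If you would rather keep every symbol, multiVals = "list" or "CharacterList" (as in the calls above) returns all of them, at the cost of no longer having a single value per probeset.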