Extract gene ID
@lalchungnungabt17-16075

Hi,

I have a microarray expression data set that was annotated with pd.hugene.1.1.st.v1, and I want to extract the gene IDs from it. Below is the script I used; it returns NA for many of the gene IDs. I just want to confirm whether this is the correct way of doing it. Thank you in advance!

> library(Biobase)   ## for exprs()
> library(annaffy)   ## for aafSymbol()
> load("geneCore.RData") ## this file was annotated with pd.hugene.1.1.st.v1
> ID <- rownames(exprs(geneCore))
> symbols <- aafSymbol(ID, "hugene11sttranscriptcluster.db")
@james-w-macdonald-5106

The annaffy package was developed years ago and uses an outmoded way of extracting data. You are better off using either select or mapIds. The reason you get so many NA values is that, in the past, any probeset that mapped to more than one gene ID would return NA (the thinking was that it's not clear what that probeset measures), whereas select now returns all such values:

> select(hugene10sttranscriptcluster.db, c("7896740","7896742"), "SYMBOL")
'select()' returned 1:many mapping between keys and columns
  PROBEID      SYMBOL
1 7896740       OR4F4
2 7896740      OR4F17
3 7896740       OR4F5
4 7896742 LINC00266-1
5 7896742      PCMTD2
6 7896742   LINC01881

Compare that to what aafSymbol does under the hood:

> mget( c("7896740","7896742"), hugene10sttranscriptclusterSYMBOL)
$`7896740`
[1] NA

$`7896742`
[1] NA

And mapIds allows you to control the multi-mapping:

> mapIds(hugene10sttranscriptcluster.db, c("7896740","7896742"), "SYMBOL", "PROBEID")
'select()' returned 1:many mapping between keys and columns
      7896740       7896742 
      "OR4F4" "LINC00266-1" 
> mapIds(hugene10sttranscriptcluster.db, c("7896740","7896742"), "SYMBOL", "PROBEID", multiVals = "list")
'select()' returned 1:many mapping between keys and columns
$`7896740`
[1] "OR4F4"  "OR4F17" "OR4F5" 

$`7896742`
[1] "LINC00266-1" "PCMTD2"      "LINC01881"  

> mapIds(hugene10sttranscriptcluster.db, c("7896740","7896742"), "SYMBOL", "PROBEID", multiVals = "CharacterList")
'select()' returned 1:many mapping between keys and columns
CharacterList of length 2
[["7896740"]] OR4F4 OR4F17 OR4F5
[["7896742"]] LINC00266-1 PCMTD2 LINC01881

> mapIds(hugene10sttranscriptcluster.db, c("7896740","7896742"), "SYMBOL", "PROBEID", multiVals = "asNA")
'select()' returned 1:many mapping between keys and columns
7896740 7896742 
     NA      NA 
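
If you would rather keep every symbol but still have one value per probeset, one option is to collapse the list output yourself. This is only a sketch in base R: the list is typed in by hand from the multiVals = "list" output above so it runs without the annotation package, and the names sym_list and collapsed as well as the ";" separator are arbitrary choices of mine.

```r
## Multi-mapped symbols as returned by multiVals = "list" above,
## typed in by hand so this runs without the annotation package
sym_list <- list(
  "7896740" = c("OR4F4", "OR4F17", "OR4F5"),
  "7896742" = c("LINC00266-1", "PCMTD2", "LINC01881")
)

## Paste the symbols for each probeset into one string;
## a probeset with no symbol at all (only NA) stays NA
collapsed <- vapply(sym_list, function(x) {
  if (all(is.na(x))) NA_character_ else paste(x, collapse = ";")
}, character(1))

collapsed
```

With the data in the question you would presumably build the list with mapIds(hugene11sttranscriptcluster.db, ID, "SYMBOL", "PROBEID", multiVals = "list") instead, assuming that is the right annotation package for your array.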