Question: Extract gene ID
9 weeks ago, lalchungnungabt17 wrote:

Hi,

I have a microarray expression data set which was annotated with pd.hugene.1.1.st.v1, and I want to extract the gene IDs from it. Below is the script I used, and it gives many NA gene IDs. I just want to confirm whether this is the correct way of doing it. Thank you in advance!

> load("geneCore.RData") ## this file was annotated with pd.hugene.1.1.st.v1;


> ID <- rownames(exprs(geneCore))

> symbols <- aafSymbol(ID, "hugene11sttranscriptcluster.db")
Answer: Extract gene ID
9 weeks ago, James W. MacDonald (United States) wrote:

The annaffy package was developed years ago and uses an outmoded way of extracting data. You are better off using either select or mapIds. Do note that the reason you get so many NA values is that, in the past, any probeset that mapped to more than one gene ID would return NA (the thinking being that it's not clear what such a probeset measures), whereas select now returns all such values:

> select(hugene10sttranscriptcluster.db, c("7896740","7896742"), "SYMBOL")
'select()' returned 1:many mapping between keys and columns
  PROBEID      SYMBOL
1 7896740       OR4F4
2 7896740      OR4F17
3 7896740       OR4F5
4 7896742 LINC00266-1
5 7896742      PCMTD2
6 7896742   LINC01881

Compare that to what aafSymbol does under the hood:

> mget( c("7896740","7896742"), hugene10sttranscriptclusterSYMBOL)
$`7896740`
[1] NA

$`7896742`
[1] NA

And mapIds lets you control how multi-mapped probesets are handled, via the multiVals argument (the default returns the first match):

> mapIds(hugene10sttranscriptcluster.db, c("7896740","7896742"), "SYMBOL", "PROBEID")
'select()' returned 1:many mapping between keys and columns
      7896740       7896742 
      "OR4F4" "LINC00266-1" 
> mapIds(hugene10sttranscriptcluster.db, c("7896740","7896742"), "SYMBOL", "PROBEID", multiVals = "list")
'select()' returned 1:many mapping between keys and columns
$`7896740`
[1] "OR4F4"  "OR4F17" "OR4F5" 

$`7896742`
[1] "LINC00266-1" "PCMTD2"      "LINC01881"  

> mapIds(hugene10sttranscriptcluster.db, c("7896740","7896742"), "SYMBOL", "PROBEID", multiVals = "CharacterList")
'select()' returned 1:many mapping between keys and columns
CharacterList of length 2
[["7896740"]] OR4F4 OR4F17 OR4F5
[["7896742"]] LINC00266-1 PCMTD2 LINC01881

> mapIds(hugene10sttranscriptcluster.db, c("7896740","7896742"), "SYMBOL", "PROBEID", multiVals = "asNA")
'select()' returned 1:many mapping between keys and columns
7896740 7896742 
     NA      NA 
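
If you would rather keep one row per probeset but still see every symbol, the select() output can be collapsed afterwards. The following is only a minimal base-R sketch: it assumes the select() result is an ordinary data frame with PROBEID and SYMBOL columns, as printed above, and rebuilds that data frame by hand so it runs without any annotation package. The names res, by_probe, collapsed and first_hit are illustrative, not part of any package.

```r
# Rebuild the select() result shown above as a plain data frame
# (in a real session, 'res' would come straight from select()).
res <- data.frame(
  PROBEID = c("7896740", "7896740", "7896740",
              "7896742", "7896742", "7896742"),
  SYMBOL  = c("OR4F4", "OR4F17", "OR4F5",
              "LINC00266-1", "PCMTD2", "LINC01881"),
  stringsAsFactors = FALSE
)

# Group the symbols by probeset, keeping probesets in first-seen order
by_probe <- split(res$SYMBOL,
                  factor(res$PROBEID, levels = unique(res$PROBEID)))

# One string per probeset, with all symbols joined by ";"
collapsed <- vapply(by_probe, paste, character(1), collapse = ";")

# Alternatively keep only the first symbol per probeset, which is what
# the default mapIds call above returned for these two probesets
first_hit <- vapply(by_probe, function(x) x[1], character(1))

print(collapsed)
print(first_hit)
```

Whether to collapse, keep the first hit, or drop multi-mapped probesets altogether is a judgment call that depends on how you intend to interpret those probesets downstream.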