Using hugene10sttranscriptcluster.db for annotation
2
0
Entering edit mode
Khawaja • 0
@khawaja-7772
Last seen 6.1 years ago
India

I am trying to annotate the data (with gene name and gene symbol) from [HuGene-1_0-st] Affymetrix Human Gene 1.0 ST Array [transcript (gene) version] platform using hugene10sttranscriptcluster.db.  I use the following script to get the gene symbol.

x = hugene10sttranscriptclusterALIAS2PROBE
mapped_probes <- mappedkeys(x)
xx <- as.list(hugene10sttranscriptclusterALIAS2PROBE)

but it gives me many names for a probe identifier, I am not sure which one to keep. I want to get a unique gene symbol for probe identifier. Help appreciated.

 

bioconductor hugene10sttranscriptcluster.db annotation • 1.6k views
ADD COMMENT
2
Entering edit mode
@agaz-hussain-wani-7620
Last seen 6.1 years ago
India

"I want to get a unique gene symbol for probe identifier"

You are not doing it right. For the above script, the package vignette says like this:

"Each gene symbol is mapped to a named vector of manufacturer identifiers. The name represents the gene symbol and the vector contains all manufacturer identifiers that are found for that symbol.An NA is reported for any gene symbol that cannot be mapped to any manufacturer identifiers. This mapping includes ALL gene symbols including those which are  already listed in the SYMBOL map. The SYMBOL map is meant to only list official gene symbols, while the ALIAS maps are meant to store all used symbols".

And you are interested in something else, the script of which is shown below:

x <- hugene10sttranscriptclusterSYMBOL
# Get the probe identifiers that are mapped to a gene symbol
mapped_probes <- mappedkeys(x)
# Convert to a list
xx <- as.list(x[mapped_probes])

and the saying in the package vignette goes like this:

"Each manufacturer identifier is mapped to an abbreviation for the corresponding gene. An NA is reported if there is no  known abbreviation for a given gene. Symbols typically consist of 3 letters that define either a single gene (ABC) or multiple genes (ABC1, ABC2, ABC3). Gene symbols can be used as key words to query public databases such as Entrez Gene".

 

 

ADD COMMENT
0
Entering edit mode

Thank you :)

ADD REPLY
1
Entering edit mode
@james-w-macdonald-5106
Last seen 5 hours ago
United States

While you can use as.list to get the values, it's really not the way to go. You end up relying on old code that doesn't really do what you might think it does. Plus, why would you want a list in the first place?

As an example:

> z <- as.list(hugene10sttranscriptclusterSYMBOL[keys(hugene10sttranscriptcluster.db)])
> table(sapply(z, is.na))

FALSE  TRUE
19985 13312

So using as.list returns 13312 probesets that appear not to have a HUGO symbol appended. But that's not quite right:

> zz <- mapIds(hugene10sttranscriptcluster.db, keys(hugene10sttranscriptcluster.db), "SYMBOL", "PROBEID")
'select()' returned 1:many mapping between keys and columns

> sum(is.na(zz))
[1] 10992

There are actually a bit over 2000 probeids that do have a HUGO symbol, for which as.list is returning NA. This is because those probeids have multiple symbols!

> ind <- sapply(z, is.na) & !is.na(zz)
> sum(ind)
[1] 2320
> head(select(hugene10sttranscriptcluster.db, names(zz)[ind], "SYMBOL"))
'select()' returned 1:many mapping between keys and columns
  PROBEID      SYMBOL
1 7896740       OR4F4
2 7896740      OR4F17
3 7896740       OR4F5
4 7896742 LINC00266-1
5 7896742      PCMTD2
6 7896742   LOC728323

The old style functions return NA for any probeset ID that maps to more than one HUGO symbol (or whatever else you are mapping to), whereas mapIds and select do not. So it's a better idea to use the more current query methods.

In addition, assuming your data are in an ExpressionSet, you can use annotateEset from my affycoretools package to do this in one line. Say your ExpressionSet is called 'eset':

library(affycoretools)

eset <- annotateEset(eset, hugene10sttranscriptcluster.db)

And then model fitting tools like limma will return fully annotated topTable results.

ADD COMMENT

Login before adding your answer.

Traffic: 787 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6