Choosing most accurate gene symbol from multiple gene symbols
Jo ▴ 30
@jo-8608
Last seen 8.3 years ago

Hi,

I am trying to map transcript IDs of the HuGene-1_0-st-v1 Affy chip to their corresponding gene symbols. When I do this, I often get multiple gene symbols for a single transcript ID.

Example: 8029669 --> CT47A1 /// CT47A6 /// CT47A11 /// CT47A7 /// CT47A2 /// CT47A8 /// CT47A9 /// CT47A3 /// CT47A4 /// CT47B1 /// CT47A12 /// CT47A5 /// CT47A10

What would be the best way to choose one of the gene symbols? Also, are there any Bioconductor packages that will map transcript IDs of the HuGene-1_0-st-v1 chip to gene symbols?

Thank you.


Can you show the code you used that returned these results?

@james-w-macdonald-5106
Last seen 15 hours ago
United States

As Steve notes, the example you use doesn't seem to jibe with reality, so it would be interesting to know where you got those data. Regardless, we supply an annotation package for these arrays that you can use, called hugene10sttranscriptcluster.db.

> library(hugene10sttranscriptcluster.db)
Loading required package: org.Hs.eg.db
> select(hugene10sttranscriptcluster.db, "8029669", c("SYMBOL","GENENAME"))
  PROBEID SYMBOL GENENAME
1 8029669   <NA>     <NA>
> select(hugene10sttranscriptcluster.db, "8029669", c("SYMBOL","GENENAME","ENSEMBL","REFSEQ"))
  PROBEID SYMBOL GENENAME ENSEMBL REFSEQ
1 8029669   <NA>     <NA>    <NA>   <NA>
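
For IDs that do map to more than one symbol, one option is the multiVals argument of mapIds() in AnnotationDbi. The following is only a sketch, assuming the annotation package is installed; the extra IDs are simply the first few keys in the package and are there for illustration, and the variable names are made up for this example.

```r
library(hugene10sttranscriptcluster.db)

# Illustrative IDs: the one from the question plus a few keys from the package.
ids <- unique(c("8029669",
                head(keys(hugene10sttranscriptcluster.db, keytype = "PROBEID"), 5)))

# Keep every symbol per ID (a named list), so one-to-many cases stay visible.
all_syms <- mapIds(hugene10sttranscriptcluster.db, keys = ids,
                   column = "SYMBOL", keytype = "PROBEID", multiVals = "list")

# Or force exactly one symbol per ID by taking the first hit.
# Note that "first" is an arbitrary choice, not a "most accurate" one.
first_sym <- mapIds(hugene10sttranscriptcluster.db, keys = ids,
                    column = "SYMBOL", keytype = "PROBEID", multiVals = "first")

# IDs that map to more than one symbol and may deserve a closer look.
ambiguous <- names(all_syms)[lengths(all_syms) > 1]
```

Other multiVals settings such as "filter" (drop ambiguous IDs) or "asNA" (set them to NA) may be safer than "first" when a one-to-many mapping should not be resolved arbitrarily.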

If you want more information, see the vignettes for AnnotationDbi. But since the annotation package says there are no matching annotations, let's look at the Affy CSV annotation file.

grep 8029669 HuGene-1_0-st-v1.na35.hg19.transcript.csv
"8029669","8029669","chr19","+","45842445","45842639","23","---","NONHSAT066772 // NONCODE // Non-coding transcript identified by NONCODE: Linc // chr19 // 100 // 100 // 23 // 23 // 0 /// AF041410 // GenBank // Homo sapiens malignancy-associated protein mRNA, partial cds. // chr19 // 52 // 91 // 11 // 21 // 0","AF041410 // O43616","AF041410 // Hs.348346 // ---","---","---","---","---","---","1","main"

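The long quoted field in that output is easier to read once it is split up. Below is a small base R sketch, assuming the usual Affy convention that entries are separated by " /// " and sub-fields within an entry by " // "; parse_assignment is a hypothetical helper name, and only the first two sub-fields (accession and source) are extracted.

```r
# Split an Affy annotation field into one row per assignment.
# Assumes entries are separated by " /// " and sub-fields by " // ".
parse_assignment <- function(x) {
  if (is.na(x) || x == "---") return(NULL)   # "---" means no assignment
  entries <- strsplit(x, " /// ", fixed = TRUE)[[1]]
  fields  <- strsplit(entries, " // ", fixed = TRUE)
  data.frame(accession = vapply(fields, `[`, character(1), 1),
             source    = vapply(fields, `[`, character(1), 2),
             stringsAsFactors = FALSE)
}

# The mRNA assignment field for 8029669, copied from the grep output above.
mrna <- paste0(
  "NONHSAT066772 // NONCODE // Non-coding transcript identified by NONCODE: Linc",
  " // chr19 // 100 // 100 // 23 // 23 // 0 /// ",
  "AF041410 // GenBank // Homo sapiens malignancy-associated protein mRNA, partial cds.",
  " // chr19 // 52 // 91 // 11 // 21 // 0")

hits <- parse_assignment(mrna)   # two rows: one NONCODE, one GenBank
none <- parse_assignment("---")  # NULL, as for the empty gene assignment field
```
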
So you shouldn't expect a HUGO gene symbol for this transcript cluster: according to the Affy annotation it is a lincRNA, or possibly something else, such as a partial mRNA (the GenBank record is described as a partial cds). Please note that the annotation packages we supply are based primarily on Entrez Gene, and are also primarily directed towards translated content. Affy has been adding more untranslated content to the Gene ST arrays, and our infrastructure has not kept pace. There have been some discussions about adding untranslated content to the annotation packages, but it is not clear how that could be done in a reasonable manner. The current state of annotation for most untranslated content is also quite preliminary, so the upside of including it is probably fairly limited.
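
On the original question of what to do with a " /// "-separated list of symbols, here is one possible base R sketch. It uses the string from the question as input; split_symbols is a hypothetical helper, and taking the first symbol is an arbitrary convention rather than a way of finding the "most accurate" one.

```r
# Symbol string as given in the question, keyed by transcript cluster ID.
x <- c("8029669" = paste("CT47A1", "CT47A6", "CT47A11", "CT47A7", "CT47A2",
                         "CT47A8", "CT47A9", "CT47A3", "CT47A4", "CT47B1",
                         "CT47A12", "CT47A5", "CT47A10", sep = " /// "))

# Split one string on "///" and drop surrounding whitespace and duplicates.
split_symbols <- function(s) {
  unique(trimws(strsplit(s, "///", fixed = TRUE)[[1]]))
}

syms <- lapply(x, split_symbols)

# How many distinct symbols each ID carries; values > 1 flag ambiguous IDs.
n_symbols <- lengths(syms)

# One symbol per ID, chosen arbitrarily as the first listed.
first_symbol <- vapply(syms, `[`, character(1), 1)
```

When an ID carries many symbols from one gene family, as here, it may be more honest to keep the whole list or to flag the ID as ambiguous than to report a single symbol.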


Awesome, thanks.
