Question

Choosing most accurate gene symbol from multiple gene symbols

Jo ▴ 30

@jo-8608

Hi,

I am trying to map transcript IDs from the Affymetrix HuGene-1_0-st-v1 chip to their corresponding gene symbols. When I do this, I often get multiple gene symbols for a single transcript ID.

For example: 8029669 --> CT47A1 /// CT47A6 /// CT47A11 /// CT47A7 /// CT47A2 /// CT47A8 /// CT47A9 /// CT47A3 /// CT47A4 /// CT47B1 /// CT47A12 /// CT47A5 /// CT47A10

What would be the best way to choose one of the gene symbols? Also, are there any Bioconductor packages that will map transcript IDs from the HuGene-1_0-st-v1 chip to gene symbols?

Thank you.

transcript_id gene symbol • 1.5k views

ADD COMMENT • link updated 8.7 years ago by James W. MacDonald 65k • written 8.7 years ago by Jo ▴ 30

Can you show the code you used that returned these results?

ADD REPLY • link 8.7 years ago Steve Lianoglou ★ 13k

score 0 · Answer 1 · 2015-08-24

As Steve notes, the example you use doesn't seem to jibe with reality, so it would be interesting to know where you got those data. Regardless, we supply an annotation package for these arrays, hugene10sttranscriptcluster.db, that you can use.

> library(hugene10sttranscriptcluster.db)
Loading required package: org.Hs.eg.db
> select(hugene10sttranscriptcluster.db, "8029669", c("SYMBOL","GENENAME"))
  PROBEID SYMBOL GENENAME
1 8029669   <NA>     <NA>
> select(hugene10sttranscriptcluster.db, "8029669", c("SYMBOL","GENENAME","ENSEMBL","REFSEQ"))
  PROBEID SYMBOL GENENAME ENSEMBL REFSEQ
1 8029669   <NA>     <NA>    <NA>   <NA>
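
For IDs that do map to several symbols, a minimal sketch of how the one-to-many case could be handled with mapIds() from AnnotationDbi is shown below. The choice of multiVals is an assumption about what you want, not a recommendation: "first" keeps an arbitrary single symbol, "list" keeps all of them, and "asNA" discards ambiguous IDs. For 8029669 itself, the select() output above suggests you should expect NA either way.

```r
library(AnnotationDbi)
library(hugene10sttranscriptcluster.db)

ids <- c("8029669")  # transcript cluster ID from the question

# Keep exactly one symbol per ID; "first" simply takes the first match,
# which is an arbitrary (not biologically informed) choice
first_symbol <- mapIds(hugene10sttranscriptcluster.db, keys = ids,
                       column = "SYMBOL", keytype = "PROBEID",
                       multiVals = "first")

# Keep every symbol per ID as a list, so nothing is silently discarded
all_symbols <- mapIds(hugene10sttranscriptcluster.db, keys = ids,
                      column = "SYMBOL", keytype = "PROBEID",
                      multiVals = "list")

# Set IDs that map to more than one symbol to NA
unique_only <- mapIds(hugene10sttranscriptcluster.db, keys = ids,
                      column = "SYMBOL", keytype = "PROBEID",
                      multiVals = "asNA")
```

Each call returns an object named by the input IDs, so the results can be lined up against your expression data by name.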

If you want more information, see the vignettes for AnnotationDbi. But since the annotation package reports no matching annotations for this ID, let's look at the Affy CSV annotation file.

grep 8029669 HuGene-1_0-st-v1.na35.hg19.transcript.csv
"8029669","8029669","chr19","+","45842445","45842639","23","---","NONHSAT066772 // NONCODE // Non-coding transcript identified by NONCODE: Linc // chr19 // 100 // 100 // 23 // 23 // 0 /// AF041410 // GenBank // Homo sapiens malignancy-associated protein mRNA, partial cds. // chr19 // 52 // 91 // 11 // 21 // 0","AF041410 // O43616","AF041410 // Hs.348346 // ---","---","---","---","---","---","1","main"

So you shouldn't expect a HUGO gene symbol for this transcript cluster: the annotation file describes it as a lincRNA (the NONCODE record), or possibly something else, such as a partial mRNA (the GenBank record AF041410 is listed as a partial cds). Please note that the annotation packages we supply are based primarily on Entrez Gene and are directed primarily towards translated content. Affymetrix has been adding more untranslated content to the Gene ST arrays, and our infrastructure has not kept pace. There have been some discussions about adding untranslated content to the annotation packages, but it is not clear how that could be done in a reasonable manner. The current state of annotation for most untranslated content is also quite preliminary, so the benefit of including it is probably fairly limited.
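
The Affymetrix annotation fields pack several records into one string, with " /// " between records and " // " between the sub-fields of each record. A minimal base-R sketch for unpacking such a field follows; split_affy_field is a hypothetical helper name, and the input is the assignment field copied from the CSV row above.

```r
# Hypothetical helper: split an Affymetrix-style annotation field into records,
# and each record into its sub-fields
split_affy_field <- function(x) {
  records <- strsplit(x, " /// ", fixed = TRUE)[[1]]
  lapply(records, function(r) trimws(strsplit(r, " // ", fixed = TRUE)[[1]]))
}

# Assignment field for 8029669, copied from the CSV row shown above
field <- paste0(
  "NONHSAT066772 // NONCODE // Non-coding transcript identified by NONCODE: Linc",
  " // chr19 // 100 // 100 // 23 // 23 // 0",
  " /// AF041410 // GenBank // Homo sapiens malignancy-associated protein mRNA,",
  " partial cds. // chr19 // 52 // 91 // 11 // 21 // 0"
)

records <- split_affy_field(field)

# The first sub-field is the accession, the second is the source database
accessions <- vapply(records, function(r) r[1], character(1))
sources <- vapply(records, function(r) r[2], character(1))
print(data.frame(accession = accessions, source = sources))
```

The same " /// " split applies to a multi-symbol string such as the one in the question, which makes it easy to see every candidate before deciding whether picking a single symbol is justified.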