Question: Choosing most accurate gene symbol from multiple gene symbols
4.3 years ago, Jo wrote:

Hi,

I am trying to map transcript IDs from the Affymetrix HuGene-1_0-st-v1 chip to their corresponding gene symbols. When I do this, I often get multiple gene symbols for a single transcript ID.

Example: 8029669 --> CT47A1 /// CT47A6 /// CT47A11 /// CT47A7 /// CT47A2 /// CT47A8 /// CT47A9 /// CT47A3 /// CT47A4 /// CT47B1 /// CT47A12 /// CT47A5 /// CT47A10

What would be the best way to choose one of the gene symbols? Also, are there any Bioconductor packages that will map transcript IDs from the HuGene-1_0-st-v1 chip to gene symbols?

Thank you.


Can you show the code you used that returned these results?

Reply written 4.3 years ago by Steve Lianoglou
Answer: Choosing most accurate gene symbol from multiple gene symbols
4.3 years ago, James W. MacDonald (United States) wrote:

As Steve notes, the example you use doesn't seem to jibe with reality, so it would be interesting to know where you got those data. Regardless, we supply an annotation package for these arrays that you can use, called hugene10sttranscriptcluster.db.

> library(hugene10sttranscriptcluster.db)
Loading required package: org.Hs.eg.db
> select(hugene10sttranscriptcluster.db, "8029669", c("SYMBOL","GENENAME"))
  PROBEID SYMBOL GENENAME
1 8029669   <NA>     <NA>
> select(hugene10sttranscriptcluster.db, "8029669", c("SYMBOL","GENENAME","ENSEMBL","REFSEQ"))
  PROBEID SYMBOL GENENAME ENSEMBL REFSEQ
1 8029669   <NA>     <NA>    <NA>   <NA>
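
When a transcript cluster does map to more than one symbol, the `multiVals` argument of `mapIds()` from AnnotationDbi controls what comes back. The following is only a sketch: the key set (the ID from this thread plus the first few keys in the package) is chosen purely for illustration, and keeping the "first" symbol is a convenience rather than a statement about which symbol is most accurate.

```r
library(hugene10sttranscriptcluster.db)

# Illustrative keys: the ID discussed here plus a few taken from the package
ids <- unique(c("8029669", head(keys(hugene10sttranscriptcluster.db), 5)))

# One symbol per ID: keeps the first match, NA where nothing maps
first_sym <- mapIds(hugene10sttranscriptcluster.db, keys = ids,
                    column = "SYMBOL", keytype = "PROBEID",
                    multiVals = "first")

# Every symbol per ID, returned as a named list
all_sym <- mapIds(hugene10sttranscriptcluster.db, keys = ids,
                  column = "SYMBOL", keytype = "PROBEID",
                  multiVals = "list")

# IDs that map to more than one symbol and so need a closer look
multi_ids <- names(all_sym)[lengths(all_sym) > 1]
```

Using `multiVals = "filter"` instead would drop the ambiguous IDs altogether, which may be the safer choice when a one-to-one mapping is required downstream.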

If you want more information, see the vignettes for AnnotationDbi. But since the annotation package says there are no matching annotations, let's look at the Affy CSV annotation file.

grep 8029669 HuGene-1_0-st-v1.na35.hg19.transcript.csv
"8029669","8029669","chr19","+","45842445","45842639","23","---","NONHSAT066772 // NONCODE // Non-coding transcript identified by NONCODE: Linc // chr19 // 100 // 100 // 23 // 23 // 0 /// AF041410 // GenBank // Homo sapiens malignancy-associated protein mRNA, partial cds. // chr19 // 52 // 91 // 11 // 21 // 0","AF041410 // O43616","AF041410 // Hs.348346 // ---","---","---","---","---","---","1","main"

So you shouldn't expect a HUGO gene symbol for this transcript, as it appears to be a lincRNA, or possibly something else, such as a partial coding sequence (the GenBank entry AF041410 is annotated as "mRNA, partial cds"). Please note that the annotation packages we supply are based primarily on Entrez Gene, and are also primarily directed towards translated content. Affy has been adding more untranslated content to the Gene ST arrays, and our infrastructure has not kept pace. There have been some discussions about adding untranslated content to the annotation packages, but it is not clear how that would be done in a reasonable manner, and the current state of affairs for most untranslated content is quite preliminary, so the upside of including it is probably fairly limited.
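
On the original question of several symbols joined by " /// ": one option is to split the field and keep every symbol, rather than committing to a single one up front. Below is a minimal base R sketch, assuming the " /// " separator shown in the question and that "---" marks an empty field as in the Affy CSV line above. The function name `split_symbols` is hypothetical, and the input is just the first three symbols from the question.

```r
# Split an Affy-style " /// "-delimited field into its individual entries.
# split_symbols is a hypothetical helper, not part of any package.
split_symbols <- function(x) {
  parts <- strsplit(x, "///", fixed = TRUE)[[1]]
  parts <- trimws(parts)
  # Drop empty pieces and the "---" placeholder, then de-duplicate
  unique(parts[nzchar(parts) & parts != "---"])
}

# First three symbols from the example in the question
raw <- "CT47A1 /// CT47A6 /// CT47A11"
syms <- split_symbols(raw)

# More than one symbol means the ID is ambiguous
is_ambiguous <- length(syms) > 1
```

Whether to keep all symbols, keep the first, or discard ambiguous IDs depends on the downstream analysis; the sketch only makes the ambiguity explicit.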


Awesome thanks. 

Reply written 4.3 years ago by Jo