Question

HuGene2.0 array annotation

sylvia ▴ 10

@sylvia-5630

Hi,

I have a quick question regarding the annotation package for the HuGene 2.0 array. With pd.hugene.2.0.st_3.10.0 (R_3.1.3, oligo_1.30.0) I was able to find annotation for the gene PTEN. After updating to pd.hugene.2.0.st_3.14.1 (R_3.2.1, oligo_1.32.0), I get NA for geneassignment. Output from both versions is below, the older package first:

> features <- pData(featureData(dataset$eset))
> features <- features[features$category == 'main',c('transcriptclusterid','seqname','start','stop','geneassignment','mrnaassignment', 'unigene')];
> x <- features[features$transcriptclusterid == '16707030',]
> str(x)
'data.frame':   1 obs. of 7 variables:
$ transcriptclusterid: int 16707030
$ seqname            : chr "chr10"
$ start              : int 89622870
$ stop               : int 89731687
$ geneassignment     : chr "ENST00000371953 // PTEN // phosphatase and tensin homolog // 10q23.3 // 5728 /// NM_000314 // PTEN // phosphatase and tensin ho"| __truncated__
$ mrnaassignment     : chr "ENST00000371953 // ENSEMBL // cdna:known chromosome:GRCh37:10:89622870:89731687:1 gene:ENSG00000171862 gene_biotype:protein_cod"| __truncated__
$ unigene            : chr "ENST00000371953 // Hs.500466 // adipose tissue| adrenal gland| bladder| blood| bone| brain| cervix| connective tissue| ear| emb"| __truncated__

-------------------------------------------------------------------------------------------------------------------------------------

> features <- pData(featureData(dataset$eset))
> features <- features[features$category == 'main',c('transcriptclusterid','seqname','start','stop','geneassignment','mrnaassignment', 'unigene')];
> x <- features[features$transcriptclusterid == '16707030',]
> str(x)
'data.frame':   1 obs. of 7 variables:
$ transcriptclusterid: int 16707030
$ seqname            : chr "chr10"
$ start              : int 89622870
$ stop               : int 89731687
$ geneassignment     : chr NA
$ mrnaassignment     : chr NA
$ unigene            : chr NA

Thanks,

Sylvia
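For readers unfamiliar with the geneassignment format: judging from the output above, each value holds several records separated by " /// ", and each record has fields separated by " // " (accession, symbol, description, cytoband, Entrez Gene ID). The following is only a sketch of how such a string could be split on the command line. The function name `parse_gene_assignment` is made up for illustration, and the sample value is the complete na33.2 string quoted in the answer further down.

```shell
#!/bin/sh
# Complete geneassignment value for 16707030 as quoted in this thread (na33.2).
ga='ENST00000371953 // PTEN // phosphatase and tensin homolog // 10q23.3 // 5728 /// NM_000314 // PTEN // phosphatase and tensin homolog // 10q23.3 // 5728'

# Hypothetical helper: split on " /// " into records, then on " // " into
# fields, and print accession, symbol and Entrez Gene ID (tab-separated).
# Values without five fields (e.g. "---") produce no output.
parse_gene_assignment() {
  printf '%s\n' "$1" | awk '{
    n = split($0, rec, " /// ")
    for (i = 1; i <= n; i++) {
      m = split(rec[i], fld, " // ")
      if (m >= 5) print fld[1] "\t" fld[2] "\t" fld[5]
    }
  }'
}

parse_gene_assignment "$ga"
```

For the value above this prints two lines, one per accession, each with the symbol PTEN and Entrez Gene ID 5728.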

annotation oligo pd.hugene.2.0.st • 1.7k views

ADD COMMENT • link 8.5 years ago sylvia ▴ 10

score 1 · Accepted Answer · 2015-11-10

James W. MacDonald 65k

@james-w-macdonald-5106

The annotation data that we supply is basically just a re-packaging of the Affymetrix csv annotation files. In the case of the pd.hugene.2.0.st package it is a direct read-in of that file, whereas in the hugene20sttranscriptcluster.db package we take Affy's mapping of probeset ID -> Entrez Gene ID and then do all the further mappings from the Gene ID. But in the end the results are pretty much the same.

Affymetrix does update their annotation csv files, and we use the most recent version. At this point we are using the na35 versions, which, unlike na33.2, no longer say that 16707030 measures PTEN:

grep 16707030 HuGene-2_0-st-v1.na33.2.hg19.transcript.csv | cut -d, -f 2,3,5,6,8
"16707030","chr10","89622870","89731687","ENST00000371953 // PTEN // phosphatase and tensin homolog // 10q23.3 // 5728 /// NM_000314 // PTEN // phosphatase and tensin homolog // 10q23.3 // 5728"
grep 16707030 HuGene-2_0-st-v1.na35.hg19.transcript.csv | cut -d, -f 2,3,5,6,8
"16707030","chr10","89622870","89731687","---"
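If you want to repeat this comparison for other IDs, something like the sketch below might help. It is not how the packages are built, just a convenience. It runs on two toy five-column files made from the two output lines above (the real Affymetrix files have more columns and a header), and the helper name `gene_assignment` is made up. Matching the quoted ID avoids accidental hits on coordinates or longer IDs, and splitting on `","` rather than a bare comma tolerates commas inside a quoted field.

```shell
#!/bin/sh
# Toy stand-ins for two annotation releases, built from the rows shown in
# this thread. Real files have more columns; these keep only the five printed.
tmp=$(mktemp -d)
cat > "$tmp/na33.csv" <<'EOF'
"16707030","chr10","89622870","89731687","ENST00000371953 // PTEN // phosphatase and tensin homolog // 10q23.3 // 5728"
EOF
cat > "$tmp/na35.csv" <<'EOF'
"16707030","chr10","89622870","89731687","---"
EOF

# Hypothetical helper: print the last field of the row containing a quoted ID.
# $1 = csv file, $2 = transcript cluster ID
gene_assignment() {
  grep -F "\"$2\"" "$1" | awk -F'","' '{ gsub(/"/, "", $NF); print $NF }'
}

for release in na33 na35; do
  printf '%s: %s\n' "$release" "$(gene_assignment "$tmp/$release.csv" 16707030)"
done
rm -rf "$tmp"
```

On the toy files this prints the PTEN record for na33 and `---` for na35.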

Anyway, if you were to go to netaffx, get the sequence that they say that transcript interrogates, and blat that at UCSC, it doesn't look like it overlaps PTEN (the sequence is the black bar labeled 'Your Sequence from Blat Search').

ADD COMMENT • link 8.5 years ago James W. MacDonald 65k

Hi James,

Got it! Thanks for the quick reply! I think I'll use hugene20sttranscriptcluster.db instead then.

Best,

Sylvia

ADD REPLY • link 8.5 years ago sylvia ▴ 10

Hi James,

Sorry, but I just have another clarifying question. I was wondering where you obtained the sequence for blat. I went to netaffx and looked up the sequence for transcriptclusterID 16707030 using:

https://www.affymetrix.com/analysis/netaffx/exon/wtgene_transcript.affx?pk=712:16707030

and I found that the sequence overlapped with PTEN:

http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&position=chr10%3A89622870-89731687&hgsid=452878227_H7OFAdKA7suL8UyYDlBvVf3Owpkf

Would you mind correcting me if this was not the approach that you took? Sorry for the question. Thank you in advance for the clarification.

Best,
Sylvia

ADD REPLY • link 8.5 years ago sylvia ▴ 10

Hi Sylvia,

You're right - I just copied the first bit of the transcript from Affy. If you click on the link to get the whole transcript and then blat, it does indeed cover PTEN.

Like I said before, we are just passing on what we get from Affy. A while back a co-worker of mine who was working with Exon ST arrays noticed that the na34 build from Affy had far fewer annotations than an earlier build he had used. So we contacted Affy and they told us that they knew the na34 build was bad, and had been working on an update, which is the na35 build. It may well be that na35 isn't particularly good either, but 'good' depends on your frame of reference. If they are getting 95% of the annotations right, then it might be good for most people. But if the gene you care about is messed up, then obviously it's not good enough for you.
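A rough way to see how many annotations a given build has lost is to count the rows whose gene assignment is `---`. The sketch below is only an illustration under a few assumptions: fields are quoted and comma-separated as in the grep output earlier in this thread, comment lines start with `#`, and you pass the column number of the gene assignment yourself (the toy file has two columns, and its second row is invented). The helper name `count_unannotated` is made up.

```shell
#!/bin/sh
# Toy file: the first row reuses the na35 value from this thread, the second
# row is invented purely for illustration.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
# toy comment line
"16707030","---"
"00000000","ACC0 // GENE0 // made-up example record // --- // 0"
EOF

# Hypothetical helper: count rows whose chosen column equals "---".
# $1 = csv file, $2 = 1-based column number of the gene assignment
count_unannotated() {
  awk -F'","' -v col="$2" '
    /^#/ { next }                      # skip comment lines
    {
      gsub(/"/, "", $col)              # strip leftover quotes from the field
      total++
      if ($col == "---") na++
    }
    END { printf "%d of %d rows unannotated\n", na, total }
  ' "$1"
}

count_unannotated "$tmp" 2
rm -f "$tmp"
```

Running the same count on two builds of the same annotation file would give a quick sense of whether one of them is missing far more annotations than the other. A header row, if present, is counted as an annotated row, so subtract one from the total in that case.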

Unfortunately we don't have enough people to fully annotate the arrays (we supply annotation packages for way too many arrays for that to happen), so we have to rely on the manufacturers to get it right.

ADD REPLY • link 8.5 years ago James W. MacDonald 65k

Hi James,

Thanks for the clarification! Yeah, I checked the whole transcript and found that it overlaps PTEN. I'll contact Affy as well to see what they think about this.

Best,

Sylvia

ADD REPLY • link 8.5 years ago sylvia ▴ 10