Multiple locations for a probeset in hgu133plus2CHRLOC vs. UCSC PSL data
rgentleman (@rgentleman-7725), United States
To follow up slightly:

On Tue, Nov 18, 2008 at 9:57 AM, Marc Carlson <mcarlson@fhcrc.org> wrote:
> Hi Peter,
>
> I think that your confusion is coming from the fact that these are the
> chromosome start locations for the genes and not the probes. According
> to Affy, that probe is supposed to be measuring that gene and we took
> their word for that. We then gave you the start positions for
> transcripts of that gene according to UCSC. We don't currently provide
> the data for where the probe aligns to the genome or which
> transcripts in the genome the probe might stick to.

You can easily find all the genomic regions a probe matches using
Biostrings; I believe this is one of the examples in the vignette.
Finding all transcripts is harder (at least in the sense that we have not
yet developed a pipeline for it). You would need to download all the
transcript sequences from somewhere (RefSeq?) and then modify the example
in the Biostrings vignette to do the matching. These are not particularly
large or hard problems: a few hours would deal with the first, maybe a
day or two with the second.

best wishes
  Robert

>
> Marc
>
> Bazeley, Peter wrote:
> > Hello,
> >
> > R version: 2.8.0
> >
> > I just installed the hgu133plus2.db package, and am looking at the
> > hgu133plus2CHRLOC environment. I've noticed that some of the probeset
> > entries (e.g. "201268_at") have multiple locations compared to Affy's
> > annotation file. I'd like to figure out if these multiple locations are
> > current, in which case it is some sort of overlapping/repeating
> > duplication. For example:
> >
> > > as.list(hgu133plus2CHRLOC)$'201268_at'
> >       17       17       17       17
> > 46598879 46597889 46598637 46599081
> >
> > indicates that the probeset maps to 4 locations.
> > Compare this to the alignment info in Affy's annotation file (from 7/8/08,
> > http://www.affymetrix.com/Auth/analysis/downloads/na26/ivt/HG-U133_Plus_2.na26.annot.csv.zip):
> >
> > chr12:119204403-119205041 (+) // 91.49 // q24.31 ///
> > chr17:46598810-46604103 (+) // 96.87 // q21.33
> >
> > which suggests one location on chromosome 17 (I'm ignoring chromosome 12
> > for now). This is an "_at" probeset, so it should map uniquely to a
> > sequence, according to Affy's "Data Analysis Fundamentals" document (and
> > speaking to a rep).
> >
> > From the information provided by "?hgu133plus2CHRLOC", I downloaded
> >
> > ftp://hgdownload.cse.ucsc.edu/goldenPath/currentGenomes/Homo_sapiens/database/affyU133Plus2.txt.gz
> >
> > from UCSC to see how this occurred, but it is not clear. Actually, the
> > file
> >
> > http://www.affymetrix.com/Auth/analysis/downloads/psl/HG-U133_Plus_2.link.psl.zip
> >
> > from Affy's support page has the same alignment info. Here's the
> > relevant PSL info:
> >
> > Target sequence name: chr17
> > Alignment start position in target: 46598810
> > Alignment end position in target: 46604103
> > Number of blocks in the alignment (a block contains no gaps): 5
> > Comma-separated list of sizes of each block: 47,130,102,113,257,
> > Comma-separated list of starting positions of each block in target:
> > 46598810,46599186,46600601,46602296,46603846,
> >
> > The second location provided by CHRLOC (46597889) occurs before the
> > start of the alignment in the PSL info, so perhaps this one CHRLOC
> > location corresponds to the PSL alignment? The mappings were obtained
> > from UCSC on 2006-Apr14, so perhaps additional alignments existed at the
> > time, which have since been removed.
> >
> > Thank you for any help. Hopefully I'm just missing something obvious
> > (well, non-obvious for me).
> >
> > Peter Bazeley
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor@stat.math.ethz.ch
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives:
> > http://news.gmane.org/gmane.science.biology.informatics.conductor

--
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem@fhcrc.org
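Peter's PSL fields can be checked directly against the four CHRLOC values. Below is a minimal base-R sketch that uses only the numbers quoted in the thread; it assumes both sources use the same coordinate origin (PSL target starts are 0-based) and ignores any off-by-one difference. `parse_psl_list()` and the other names are illustrative, not functions from any package.

```r
# Minimal sketch: compare the hgu133plus2CHRLOC starts for 201268_at with the
# PSL alignment quoted in the thread. Assumes both use the same coordinate
# origin (PSL tStarts are 0-based); possible off-by-one differences are ignored.

# PSL fields for 201268_at on chr17, copied from the thread
t_start     <- 46598810L
t_end       <- 46604103L
block_sizes <- "47,130,102,113,257,"
t_starts    <- "46598810,46599186,46600601,46602296,46603846,"

# PSL stores per-block values as comma-separated lists with a trailing comma
parse_psl_list <- function(x) {
  as.integer(strsplit(sub(",$", "", x), ",", fixed = TRUE)[[1]])
}

sizes  <- parse_psl_list(block_sizes)
starts <- parse_psl_list(t_starts)

# Each aligned block covers the half-open interval [start, start + size)
blocks <- data.frame(start = starts, end = starts + sizes)
stopifnot(blocks$end[nrow(blocks)] == t_end)  # last block should finish at tEnd

# The four start positions returned by hgu133plus2CHRLOC for 201268_at
chrloc <- c(46598879L, 46597889L, 46598637L, 46599081L)

# Inside the overall alignment span [tStart, tEnd)? Inside a gap-free block?
in_span  <- chrloc >= t_start & chrloc < t_end
in_block <- vapply(chrloc,
                   function(p) any(p >= blocks$start & p < blocks$end),
                   logical(1))

print(blocks)
print(data.frame(chrloc, in_span, in_block))
```

With these inputs, `in_span` is TRUE, FALSE, FALSE, TRUE and `in_block` is FALSE for all four positions: 46597889 and 46598637 lie upstream of the alignment start, and the other two fall in the gap between the first and second blocks. None of the CHRLOC values coincides with an aligned block, which fits Marc's explanation that they are transcript start positions for the gene rather than probe alignment coordinates.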
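Robert's suggestion comes down to looking for each probe sequence, and its reverse complement, in a set of target sequences: chromosomes for the genomic question, transcript sequences (e.g. from RefSeq) for the transcript question. The sketch below illustrates only that idea, in base R on made-up sequences; it is not the Biostrings vignette example and would be far too slow for a whole genome. For real data, Biostrings' `matchPattern()` or `vmatchPattern()` (both accept a `max.mismatch` argument) run against a BSgenome package or a `DNAStringSet` of transcripts would be the appropriate tools, and the 25-mer probe sequences could come from a probe package such as hgu133plus2probe. The helper names and toy sequences are hypothetical.

```r
# Minimal base-R sketch of probe matching on both strands. Toy stand-in for the
# Biostrings approach (matchPattern()/vmatchPattern()); exact matches only.
# revcomp() and find_probe_hits() are hypothetical helpers, not package functions.

# Reverse complement of an uppercase DNA string (A, C, G, T, N)
revcomp <- function(s) {
  paste(rev(strsplit(chartr("ACGTN", "TGCAN", s), "")[[1]]), collapse = "")
}

# Report every exact (possibly overlapping) hit of `probe` on either strand of
# each named sequence in `seqs` (chromosomes or transcripts); 1-based coordinates
find_probe_hits <- function(probe, seqs) {
  hits <- list()
  for (nm in names(seqs)) {
    for (strand in c("+", "-")) {
      pat <- if (strand == "+") probe else revcomp(probe)
      # A zero-width lookahead lets gregexpr() report overlapping occurrences
      pos <- gregexpr(paste0("(?=", pat, ")"), seqs[[nm]], perl = TRUE)[[1]]
      if (pos[1] != -1) {
        start <- as.integer(pos)
        hits[[length(hits) + 1]] <- data.frame(
          seqname = nm, strand = strand,
          start = start, end = start + nchar(pat) - 1L,
          stringsAsFactors = FALSE
        )
      }
    }
  }
  if (length(hits) == 0) {
    return(data.frame(seqname = character(), strand = character(),
                      start = integer(), end = integer(),
                      stringsAsFactors = FALSE))
  }
  do.call(rbind, hits)
}

# Hypothetical toy inputs: a 7-mer "probe" and three short named sequences
toy_seqs <- c(chrA = "CCGATTACAGGGATTACATT",
              txB  = "AATGTAATCAA",
              txC  = "CCCCCCCC")
print(find_probe_hits("GATTACA", toy_seqs))
```

For the toy inputs this reports two plus-strand hits on `chrA` (positions 3 to 9 and 12 to 18) and one minus-strand hit on `txB` (positions 3 to 9). Searching transcripts instead of chromosomes only changes what goes into `seqs`.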
Tags: Alignment, Annotation, Cancer, hgu133plus2, probe, affy, Biostrings