probe alignment clustering
Stan Smiley ▴ 80
@stan-smiley-567
I'm validating the annotation from Affymetrix by positioning each 25-mer probe on the genome (just mouse and human so far). I now have the positioning data and am using it to check the Affymetrix-annotated alignments.

My challenge is to settle on the best approach in BioC/R for finding 'consensus' sequences in the genome that best match the alignments I've come up with. I'm thinking of some clustering package, but I'm not sure which one is most appropriate. I searched the BioC archives but couldn't find anything on this subject.

Any direction on this would be greatly appreciated!

Thanks,
Stan Smiley
Annotation affy PROcess
@sean-davis-490
On Apr 5, 2005, at 1:05 PM, Stan Smiley wrote:

> I'm in the process of validating the annotation from Affymetrix, and
> positioning each 25mer probe on the genome (just mouse and human so far).
> I'm in a position now to use this positioning data to validate the affy
> annotated alignments, which I'm doing now.

Stan,

My first question would be: have you looked at the annotation done by EnsEMBL (or any of several other groups)? Presumably, yes.

Second, are you going to be using this for expression or "genome localization"? Keep in mind that aligning to the genome is NOT in the same space as aligning to transcripts. There are, of course, probes that align to the genome but do not hit transcripts, and probes that align to transcripts but will not align to the genome (those that cross exon boundaries). So a consensus genomic sequence will not necessarily be very useful by itself; you will have to re-align it to transcripts to determine what you are measuring.

An alternative method is to align the individual probes and look for overlap with annotated regions (exons). You can do this fairly easily in the UCSC genome browser: make a custom track of your data and then get the genes that overlap with your annotation. Then you can see whether the probes in each probeset generally hit the genes that Affymetrix says they do. You could also do this in R by downloading the table of interest from UCSC, parsing it, and writing a function to look for overlaps.

> My challenge now is to settle on the best approach in BioC/R to find
> 'consensus' sequences in the genome that best match the alignments I've
> come up with.

Do you really want consensus genomic sequence?

> I'm thinking some clustering package, but not sure which one is most
> appropriate.

Any should work in theory, but you will still have to decide at what "distance" to cut the tree, and that is hard because some genes are small, some are large, some are close to each other, and some are far apart. As an example, try to think of a clustering method that can accurately define a "gene" from both the HOXA region (short, closely spaced genes) AND the NF1 gene (a long, complicated gene).

I know I didn't give you a direct answer, and you have a hard problem in many senses. In any case, I hope this helps a bit.

Sean
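The "download the table, parse it, and write a function to look for overlaps" route might look roughly like the sketch below. It is written in Python only to keep it short; the same logic carries over to R. Everything named here is an assumption for illustration: the function names, the tuple layout, and the toy coordinates are made up, and intervals are taken to be 0-based and half-open, as in UCSC tables.

```python
from collections import defaultdict

def build_exon_index(exons):
    """Group exons by chromosome, sorted by start.

    exons: iterable of (chrom, start, end, gene), 0-based half-open.
    """
    index = defaultdict(list)
    for chrom, start, end, gene in exons:
        index[chrom].append((start, end, gene))
    for chrom in index:
        index[chrom].sort()
    return index

def genes_hit(index, chrom, start, end):
    """Return the set of genes with an exon overlapping [start, end)."""
    hits = set()
    for ex_start, ex_end, gene in index.get(chrom, []):
        if ex_start >= end:
            break  # exons are sorted by start, so none after this can overlap
        if ex_end > start:
            hits.add(gene)
    return hits

def probeset_gene_counts(index, probes):
    """Count, per probeset, how many probes overlap an exon of each gene.

    probes: iterable of (probeset, chrom, start, end).
    Probesets whose probes hit nothing are kept with an empty dict.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for probeset, chrom, start, end in probes:
        gene_counts = counts[probeset]  # registers the probeset even with no hits
        for gene in genes_hit(index, chrom, start, end):
            gene_counts[gene] += 1
    return {ps: dict(genes) for ps, genes in counts.items()}

# Toy data with invented coordinates (not real annotation).
toy_exons = [
    ("chr1", 100, 200, "geneA"),
    ("chr1", 300, 400, "geneA"),
    ("chr1", 1000, 1100, "geneB"),
]
toy_probes = [
    ("ps1", "chr1", 150, 175),    # inside the first geneA exon
    ("ps1", "chr1", 390, 415),    # straddles the end of the second geneA exon
    ("ps1", "chr1", 500, 525),    # intergenic, hits nothing
    ("ps2", "chr1", 1050, 1075),  # inside the geneB exon
]
toy_counts = probeset_gene_counts(build_exon_index(toy_exons), toy_probes)
```

Comparing the per-probeset counts with the gene the vendor annotation assigns would then flag probesets where most probes land elsewhere or nowhere. The linear scan per probe is fine for a sketch; for whole-genome tables an interval tree or a dedicated range library would be a more sensible choice.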