Question: Mapping microarray probes to the genome using findOverlaps
8.3 years ago, Ravi Karra wrote:
Hello,

I am still trying to map the probes on the NimbleGen Zebrafish 12x135K Expression array to the Zv9 version of the zebrafish genome available from Ensembl! I am very reluctantly pursuing an alignment approach to annotation, because the original annotation provided with the array is quite outdated.

I performed a gapped alignment of the individual probe sequences (60-mers) from the array with TopHat and loaded the results into Bioconductor as a GappedAlignments object. I have also made a TranscriptDb object from the Ensembl Zv9 annotation. Next, I plan to use findOverlaps for the annotation. What is the best level at which to compute the overlaps: by exon, by CDS, or by transcript? I am a little concerned that transcriptsByOverlaps might be too broad and map reads to multiple genes (for example, where transcripts in the genome have overlapping genomic coordinates). By contrast, exonsByOverlaps and cdsByOverlaps might be too restrictive and miss information in the UTRs. My gut feeling is that annotating by CDS is the "in-between" approach. What is the recommended approach for RNA-seq? As you can tell, I am quite new to this!

Thanks in advance for your help,
Ravi
Modified 8.3 years ago by Sean Davis; written 8.3 years ago by Ravi Karra.
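To make the exon/CDS/transcript trade-off concrete, here is a toy pure-Python illustration of overlap-based annotation at each level. It is not the Bioconductor API. The coordinates, gene names, and probes are invented, strand is ignored, and the nested loop is brute force. On real data, `findOverlaps` performs this kind of interval test efficiently. One fact worth keeping in mind: in a TranscriptDb, exon ranges include UTR sequence, while CDS ranges do not.

```python
from typing import NamedTuple

class Feature(NamedTuple):
    chrom: str
    start: int  # 1-based, closed interval (the GRanges convention)
    end: int
    gene_id: str

def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    # Two closed intervals overlap unless one ends before the other starts.
    return a_start <= b_end and b_start <= a_end

def genes_hit(probe_blocks, features):
    """Return the gene IDs whose features overlap any aligned block of one probe.

    probe_blocks is a list of (chrom, start, end) tuples. A gapped (spliced)
    alignment contributes one block per aligned segment, so the skipped
    intronic stretch is never tested for overlap.
    """
    hits = set()
    for chrom, start, end in probe_blocks:
        for f in features:
            if f.chrom == chrom and overlaps(start, end, f.start, f.end):
                hits.add(f.gene_id)
    return hits

# Hypothetical toy annotation: geneB is nested inside an intron of geneA.
transcripts = [Feature("chr1", 1000, 5000, "geneA"), Feature("chr1", 2000, 3000, "geneB")]
exons = [Feature("chr1", 1000, 1200, "geneA"), Feature("chr1", 4000, 5000, "geneA"),
         Feature("chr1", 2000, 2300, "geneB"), Feature("chr1", 2800, 3000, "geneB")]
cds = [Feature("chr1", 1050, 1200, "geneA"), Feature("chr1", 4000, 4500, "geneA"),
       Feature("chr1", 2100, 2300, "geneB"), Feature("chr1", 2800, 2900, "geneB")]

# Hypothetical 60-mer probe alignments.
probes = {
    "probe_3utr": [("chr1", 4700, 4759)],                           # geneA 3' UTR
    "probe_nested": [("chr1", 2150, 2209)],                         # geneB coding exon
    "probe_spliced": [("chr1", 1171, 1200), ("chr1", 4000, 4029)],  # spans a geneA junction
}

for name, blocks in probes.items():
    print(name,
          "transcript:", sorted(genes_hit(blocks, transcripts)),
          "exon:", sorted(genes_hit(blocks, exons)),
          "cds:", sorted(genes_hit(blocks, cds)))
```

In this toy, the transcript level assigns the nested-gene probe to both genes, because geneA's transcript span includes its intron. The CDS level drops the 3' UTR probe entirely. The exon level avoids both problems here. Probes that still hit more than one gene at the exon level could be flagged as ambiguous rather than forced onto a single gene.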
Answer: Mapping microarray probes to the genome using findOverlaps
8.3 years ago, Sean Davis (United States) wrote:
On Sat, Feb 26, 2011 at 5:29 PM, Ravi Karra <ravi.karra@gmail.com> wrote:
> Next, I plan to use findOverlaps for the annotation. What is the best way to get the overlaps (by exon, cds, or by transcript)?

Hi, Ravi. Sorry to answer a question with more questions, but why not just map the probes against Ensembl transcripts or RefSeq? What is the advantage of mapping to the genome and then going back to the transcripts?

Sean
Written 8.3 years ago by Sean Davis.
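Mapping directly against transcript sequences turns annotation into a lookup. Each hit already names a transcript, so collapsing isoforms to genes only requires a transcript-to-gene table. In practice, that table might be exported from Ensembl. Here, all IDs are hypothetical. The following is a minimal Python sketch under those assumptions:

```python
from collections import defaultdict

def annotate_from_transcript_hits(hits, tx2gene):
    """Collapse probe->transcript alignment hits to probe->gene calls.

    hits: iterable of (probe_id, transcript_id) pairs from an aligner run
          against transcript sequences.
    tx2gene: dict mapping transcript_id -> gene_id (hypothetical table).
    Returns (unique, ambiguous): unique maps probe -> its single gene;
    ambiguous maps probe -> sorted list of the several genes it hit.
    """
    genes = defaultdict(set)
    for probe, tx in hits:
        genes[probe].add(tx2gene[tx])
    unique = {p: next(iter(g)) for p, g in genes.items() if len(g) == 1}
    ambiguous = {p: sorted(g) for p, g in genes.items() if len(g) > 1}
    return unique, ambiguous

# Hypothetical IDs for illustration only.
tx2gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}
hits = [("p1", "tx1"), ("p1", "tx2"),   # two isoforms of one gene: still unique
        ("p2", "tx1"), ("p2", "tx3")]   # two different genes: ambiguous
unique, ambiguous = annotate_from_transcript_hits(hits, tx2gene)
print(unique)     # {'p1': 'geneA'}
print(ambiguous)  # {'p2': ['geneA', 'geneB']}
```

A probe that hits several isoforms of the same gene still receives a unique gene call. With genome overlaps, by contrast, nested or overlapping genes have to be untangled by coordinates.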
Hi Sean,

I could do that, but I am not sure how. The annotated zebrafish genome is from the Tubingen strain, and some of the probes on the array are from the EK and AB strains, so I need to allow for SNPs in the alignment. I originally tried to align the probes to the Ensembl transcripts using matchPDict, but when I allowed 2 mismatches per probe (max.mismatch = 2), the job never finished on my computer! TopHat was much faster (8 minutes), and it tolerates a few mismatched nucleotides by default. Do you have any suggestions for another way to align the array probes to the Ensembl transcripts?

Thanks,
Ravi

On Feb 26, 2011, at 5:45 PM, Sean Davis wrote:
> Hi, Ravi. Sorry to answer a question with more questions, but why not just map the probes against Ensembl Transcripts or refseq? What is the advantage of mapping to the genome and then going back to the transcripts?
Written 8.3 years ago by Ravi Karra.
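A brute-force mismatch-tolerant search compares every probe against every window of every transcript, which explains why it can be slow. The pure-Python sketch below contrasts that naive scan with pigeonhole seeding, a common seed-and-verify strategy in fast aligners. Neither function is matchPDict's or TopHat's actual implementation, and whether seeding explains the timings reported above is an assumption. The idea is simple: with at most 2 mismatches, splitting a 60-mer into 3 pieces guarantees that at least one piece matches exactly, so only positions with an exact seed hit need a full comparison.

```python
import random

def hamming(a: str, b: str) -> int:
    # Number of positions at which two equal-length sequences differ.
    return sum(x != y for x, y in zip(a, b))

def naive_search(probe: str, target: str, max_mm: int = 2) -> list:
    # Compare the probe against every window: len(target) * len(probe) work.
    n = len(probe)
    return [i for i in range(len(target) - n + 1)
            if hamming(probe, target[i:i + n]) <= max_mm]

def seeded_search(probe: str, target: str, max_mm: int = 2) -> list:
    # Pigeonhole: cut the probe into max_mm + 1 disjoint pieces. Any occurrence
    # with at most max_mm mismatches leaves at least one piece mismatch-free,
    # so only positions where some piece matches exactly are verified.
    n = len(probe)
    k = max_mm + 1
    hits = set()
    for j in range(k):
        s, e = j * n // k, (j + 1) * n // k
        seed = probe[s:e]
        pos = target.find(seed)
        while pos != -1:
            start = pos - s  # where the whole probe would begin
            if 0 <= start <= len(target) - n and \
                    hamming(probe, target[start:start + n]) <= max_mm:
                hits.add(start)
            pos = target.find(seed, pos + 1)
    return sorted(hits)

# Hypothetical data: a random 60-mer probe planted in a random "transcript"
# with two substitutions standing in for strain SNPs.
rng = random.Random(0)
probe = "".join(rng.choice("ACGT") for _ in range(60))
variant = list(probe)
for i in (10, 45):
    variant[i] = "A" if variant[i] != "A" else "C"  # force a different base
variant = "".join(variant)
target = ("".join(rng.choice("ACGT") for _ in range(500)) + variant
          + "".join(rng.choice("ACGT") for _ in range(500)))

print(seeded_search(probe, target))  # includes position 500, where the variant was planted
```

Both functions return the same positions. The seeded version performs fast exact substring searches and verifies only the few candidate windows, rather than scoring every window.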
On Sat, Feb 26, 2011 at 5:56 PM, Ravi Karra <ravi.karra@gmail.com> wrote:
> Do you have a suggestion(s) on another way to align the array to the Ensembl Transcripts?

Hi, Ravi. You could try BLAT, BLAST, GMAP, or SSAHA, for example. I have used BLAT and GMAP successfully to annotate probes. Against transcripts, BLAT or GMAP should run in seconds to minutes.

Sean
Written 8.3 years ago by Sean Davis.
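When probes are aligned to transcript sequences with BLAT, the hits are reported in PSL format. The Python sketch below assumes a header-less PSL file, as produced by BLAT's `-noHead` option. For each probe, it keeps only the targets that are aligned over the full probe length, without gaps, and with at most 2 mismatches, mirroring the `max.mismatch = 2` tolerance discussed earlier in the thread. The thresholds and the inline example records are illustrative assumptions, not recommendations from this thread.

```python
import csv
import io
from collections import defaultdict

# Column order of the PSL format (21 tab-separated fields per alignment).
PSL_FIELDS = ["matches", "misMatches", "repMatches", "nCount",
              "qNumInsert", "qBaseInsert", "tNumInsert", "tBaseInsert",
              "strand", "qName", "qSize", "qStart", "qEnd",
              "tName", "tSize", "tStart", "tEnd",
              "blockCount", "blockSizes", "qStarts", "tStarts"]
INT_FIELDS = {"matches", "misMatches", "repMatches", "nCount", "qNumInsert",
              "qBaseInsert", "tNumInsert", "tBaseInsert", "qSize", "qStart",
              "qEnd", "tSize", "tStart", "tEnd", "blockCount"}

def read_psl(handle):
    """Yield one dict per alignment from a header-less PSL stream."""
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != len(PSL_FIELDS):
            continue  # skip blank or malformed lines
        rec = dict(zip(PSL_FIELDS, row))
        for key in INT_FIELDS:
            rec[key] = int(rec[key])
        yield rec

def probe_targets(handle, max_mismatch=2):
    """Map each probe to the targets it aligns to full-length, ungapped,
    with at most max_mismatch mismatches (illustrative thresholds)."""
    targets = defaultdict(set)
    for rec in read_psl(handle):
        full_length = rec["qStart"] == 0 and rec["qEnd"] == rec["qSize"]
        ungapped = rec["qNumInsert"] == 0 and rec["tNumInsert"] == 0
        if full_length and ungapped and rec["misMatches"] <= max_mismatch:
            targets[rec["qName"]].add(rec["tName"])
    return dict(targets)

# Hypothetical PSL records: p1 hits tx1 with 2 mismatches and tx9 with 5;
# p2 aligns over only 40 of its 60 bases.
example = "\n".join("\t".join(map(str, row)) for row in [
    [58, 2, 0, 0, 0, 0, 0, 0, "+", "p1", 60, 0, 60, "tx1", 1500, 100, 160, 1, "60,", "0,", "100,"],
    [55, 5, 0, 0, 0, 0, 0, 0, "+", "p1", 60, 0, 60, "tx9", 900, 10, 70, 1, "60,", "0,", "10,"],
    [40, 0, 0, 0, 0, 0, 0, 0, "+", "p2", 60, 0, 40, "tx3", 800, 5, 45, 1, "40,", "0,", "5,"],
])
print(probe_targets(io.StringIO(example)))  # {'p1': {'tx1'}}
```

The resulting probe-to-transcript sets can then be collapsed to genes with a transcript-to-gene table. Any probe left with several genes can be reported as ambiguous.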