Question: Solexa fastq files and BLAT
Daniel Brewer wrote (10.0 years ago):
Hi,

I have got hold of some Solexa results in FASTQ format from some cancer mRNA samples, and I would like to analyse these to look at a number of things. I would like to be able to list the sequences that occur most often and then BLAT them against the human genome to see whether each sequence does occur and, if so, whether it is associated with a known transcript. Further down the line I would like to do some comparisons between normal and tumour tissue.

From looking around, it seems that ShortRead (in the development version) can be used to read the files into Biostrings objects, and then BSgenome can be used to perform some sort of BLAT. Am I on the right lines here?

Can anyone add to what packages I should be looking at and what approaches or techniques I should be using?

Thanks
Dan

--
Daniel Brewer, Ph.D.
Institute of Cancer Research
Molecular Carcinogenesis
Email: daniel.brewer at icr.ac.uk
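The first step Dan describes, listing the sequences that occur most often, can be sketched in base R. This is a minimal sketch and makes several assumptions. The file is an uncompressed FASTQ file with exactly four lines per record, with no wrapped sequence or quality lines. The function name `top_sequences` and the example reads are hypothetical. For real data, the thread's suggestion of ShortRead's `readFastq()` together with the 'top' component returned by `tables()` is the more robust route.

```r
# Minimal sketch (hypothetical helper): tabulate the most frequent read
# sequences in a FASTQ file. Assumes 4 lines per record
# (header, sequence, '+', qualities) and no line wrapping.
top_sequences <- function(path, n = 10L) {
  lines <- readLines(path)
  if (length(lines) == 0L) return(integer(0))
  if (length(lines) %% 4L != 0L) stop("expected 4 lines per FASTQ record")
  seqs <- lines[seq(2L, length(lines), by = 4L)]  # 2nd line of each record
  counts <- table(seqs)
  vals <- as.integer(counts)
  nms <- names(counts)
  # Most frequent first; ties broken alphabetically for a stable order
  o <- order(-vals, nms)
  head(setNames(vals[o], nms[o]), n)
}

# Usage with a tiny made-up file (hypothetical reads, illustration only)
fq <- tempfile(fileext = ".fastq")
writeLines(c(
  "@r1", "ACGTACGT", "+", "IIIIIIII",
  "@r2", "TTTTCCCC", "+", "IIIIIIII",
  "@r3", "ACGTACGT", "+", "IIIIIIII",
  "@r4", "GGGGAAAA", "+", "IIIIIIII"
), fq)
res <- top_sequences(fq, n = 2L)
print(res)
```

On raw reads, the top entries are often adapter sequences or PCR artifacts rather than biology. Inspect them before sending anything to BLAT.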
Tags: cancer, shortread, biostrings
Answer: Solexa fastq files and BLAT
Martin Morgan (United States) wrote (10.0 years ago):
Hi Daniel --

"Sean Davis" <sdavis2 at mail.nih.gov> writes:

> On Tue, Jan 13, 2009 at 6:35 AM, Daniel Brewer <daniel.brewer at icr.ac.uk> wrote:
>
>> Hi,
>>
>> I have got hold of some solexa results in fastq format from some cancer
>> mRNA samples and I would like to analyse these to look at a number of
>> things. I would like to be able to list the sequences that occur most
>> often and then BLAT them against the human genome to see whether that
>> sequence does occur and if so, is associated with a known transcript.

'readFastq' and the 'top' component of the return value of 'tables' in ShortRead would allow you to read in the fastq file(s) and tabulate the most common occurrences. If these are relatively raw reads, you'll likely be disappointed to find that the most common are Solexa adapter sequences and other artifacts. Also, if the sample prep involved a PCR step, you'll likely see PCR artifacts (e.g., differential amplification).

If you have access to the _export.txt files, then 'qa' and 'report' can provide a useful overview of your data and its limitations. The relatively high-level code used to generate the report, in the file at system.file("template", "qa_solexa.Rnw", package="ShortRead"), might suggest ways to explore your data (view it in a text browser and look for the Sweave 'chunks' between <<>>= and @). The various vignettes are also worth a look.

>> Further down the line I would like to do some comparisons between normal
>> and tumour tissue.
>>
>> From looking around it seems that ShortRead (in the development version)
>> can be used to read in the files into Biostrings objects and then
>> BSgenome can be used to perform some sort of BLAT. Am I on the right
>> lines here?
>>
>> Can anyone add to what packages I should be looking at and what
>> approaches or techniques I should be using.
The IRanges package provides very useful tools at a slightly more abstract level. Current favorites include the Rle class, which is returned by, for instance, the 'coverage' function, and the manipulation of IRanges themselves. The rtracklayer package provides a way to expose results as tracks in genome browsers. Biobase::matchpt and the org.* packages can be useful, too.

> The Bio-sig-seq list is perhaps the best place to ask for more details.

Yes, it would be good to follow up on the bioc-sig-sequencing group. See http://bioconductor.org/docs/mailList.html

Martin

--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
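Martin's point about 'coverage' returning an Rle can be illustrated without IRanges. The sketch below computes per-base coverage from hypothetical read start positions and widths using a difference array. It then compresses the result with base R's `rle()`, which is the same run-length idea behind IRanges' Rle objects. It is not IRanges' implementation. It assumes 1-based, inclusive coordinates and requires every read to lie within the reference length. `coverage_rle` is a hypothetical helper name.

```r
# Sketch (hypothetical helper): per-base coverage from read starts and
# widths, then run-length encoded with base R's rle().
# Assumes 1-based inclusive coordinates and reads lying within 1..len.
coverage_rle <- function(start, width, len) {
  end <- start + width - 1L
  if (any(start < 1L | end > len)) stop("reads must lie within 1..len")
  # Difference array: +1 where a read starts, -1 just past where it ends
  delta <- tabulate(start, nbins = len + 1L) -
    tabulate(end + 1L, nbins = len + 1L)
  # Running sum gives the number of reads covering each base
  cov <- cumsum(delta)[seq_len(len)]
  rle(cov)
}

# Three hypothetical reads on a 15-base reference
r <- coverage_rle(start = c(1L, 3L, 10L), width = c(5L, 5L, 4L), len = 15L)
print(r)
inverse.rle(r)  # recovers the full per-base coverage vector
```

Long stretches of identical coverage collapse into a single run. This is why a run-length encoding is compact for genome-scale coverage, where most positions are zero.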
Answer: Solexa fastq files and BLAT
Sean Davis (United States) wrote (10.0 years ago):
On Tue, Jan 13, 2009 at 6:35 AM, Daniel Brewer <daniel.brewer@icr.ac.uk> wrote:

> Can anyone add to what packages I should be looking at and what
> approaches or techniques I should be using.

The bioc-sig-sequencing list is perhaps the best place to ask for more details.

The ShortRead package combined with the Biostrings package (for doing alignments) is one possibility. It is also possible to do the alignments outside of R using aligners such as Bowtie, MAQ, or ELAND, and ShortRead can read those results directly.

Sean
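To make the "doing alignments" step concrete, here is a toy exact-matching sketch in base R. It is not Biostrings' `matchPattern()` and is no substitute for Bowtie, MAQ or ELAND. It compares every window of the reference with the read, which costs O(n·k) and is only workable for short reference strings. The helper names and the reference string are hypothetical. Minus-strand hits are reported as the plus-strand start of the reverse-complement match.

```r
# Toy exact matcher (hypothetical helper): all, possibly overlapping,
# 1-based start positions where `read` occurs in `reference`. O(n * k).
find_exact <- function(read, reference) {
  k <- nchar(read)
  n <- nchar(reference)
  if (k == 0L || k > n) return(integer(0))
  starts <- seq_len(n - k + 1L)
  # Extract every length-k window of the reference and compare to the read
  windows <- substring(reference, starts, starts + k - 1L)
  which(windows == read)
}

# Reverse complement of an uppercase A/C/G/T string (hypothetical helper)
revcomp <- function(s) {
  paste(rev(strsplit(chartr("ACGT", "TGCA", s), "")[[1]]), collapse = "")
}

# Look for the read on both strands; minus hits are plus-strand coordinates
map_read <- function(read, reference) {
  list(plus  = find_exact(read, reference),
       minus = find_exact(revcomp(read), reference))
}

ref <- "TTACGTACGTGGCATGCAA"   # made-up reference sequence
hits <- map_read("GCAT", ref)
str(hits)
```

Real aligners index the reference once and tolerate mismatches. That is why Sean points to dedicated tools for genome-scale work.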