Solexa fastq files and BLAT
Daniel Brewer
@daniel-brewer-1791
Hi,

I have got hold of some Solexa results in FASTQ format from some cancer mRNA samples, and I would like to analyse these to look at a number of things. I would like to be able to list the sequences that occur most often and then BLAT them against the human genome to see whether each sequence does occur and, if so, whether it is associated with a known transcript. Further down the line I would like to do some comparisons between normal and tumour tissue.

From looking around, it seems that ShortRead (in the development version) can be used to read the files into Biostrings objects, and then BSgenome can be used to perform some sort of BLAT. Am I on the right lines here?

Can anyone add to what packages I should be looking at and what approaches or techniques I should be using?

Thanks,
Dan

--
Daniel Brewer, Ph.D.
Institute of Cancer Research
Molecular Carcinogenesis
Email: daniel.brewer at icr.ac.uk
Cancer Biostrings ShortRead
@martin-morgan-1513
United States
Hi Daniel --

"Sean Davis" <sdavis2 at mail.nih.gov> writes:

> On Tue, Jan 13, 2009 at 6:35 AM, Daniel Brewer <daniel.brewer at icr.ac.uk> wrote:
>
>> I would like to be able to list the sequences that occur most often and then BLAT them against the human genome to see whether that sequence does occur and if so, is associated with a known transcript.

'readFastq' and the 'top' component of the return value of 'tables' in ShortRead would allow you to read in the fastq file(s) and tabulate the most common occurrences. If these are relatively raw reads, you'll likely be disappointed to find that the most common are Solexa adapter sequences and other artifacts. Also, if the sample prep involved a PCR step, you'll likely see PCR artifacts (e.g., differential amplification).

If you have access to the _export.txt files, then 'qa' and 'report' can provide a useful overview of your data and its limitations. The relatively high-level code used in generating the report, in the file at system.file("template", "qa_solexa.Rnw", package="ShortRead"), might suggest ways to explore your data (view it in a text browser and look for the Sweave 'chunks' between <<>>= and @). See also the various vignettes.
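As a rough illustration of what that tabulation does, here is a minimal plain-Python sketch (not ShortRead's implementation, which stays in R via readFastq() and the 'top' element of tables()). It counts identical read sequences in a FASTQ file and returns the most frequent ones. It assumes unwrapped four-line records (@id, sequence, +, qualities); the example reads are made up.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple


def fastq_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence of each FASTQ record.

    Assumes the common unwrapped layout of four lines per record:
    '@id', sequence, '+' (optionally repeating the id), qualities.
    """
    it = iter(lines)
    for header in it:
        header = header.rstrip("\r\n")
        if not header:
            continue  # skip blank lines between records
        if not header.startswith("@"):
            raise ValueError(f"expected a '@' header line, got {header!r}")
        try:
            seq = next(it).rstrip("\r\n")
            plus = next(it)
            next(it)  # quality line: consumed but not needed for counting
        except StopIteration:
            raise ValueError(f"truncated record after {header!r}") from None
        if not plus.startswith("+"):
            raise ValueError(f"expected a '+' line in record {header!r}")
        yield seq


def top_sequences(lines: Iterable[str], n: int = 10) -> List[Tuple[str, int]]:
    """Return the n most frequent read sequences with their counts."""
    return Counter(fastq_sequences(lines)).most_common(n)


if __name__ == "__main__":
    example = """@r1
ACGTACGT
+
IIIIIIII
@r2
ACGTACGT
+
IIIIIIII
@r3
TTTTGGGG
+
IIIIIIII
"""
    print(top_sequences(example.splitlines(), n=2))
    # -> [('ACGTACGT', 2), ('TTTTGGGG', 1)]
```

On a real file, `with open("reads.fastq") as fh: top_sequences(fh, n=20)` (hypothetical file name) would list the 20 most common reads. Because the quality line is consumed positionally rather than inspected, a quality string that happens to start with '@' is not mistaken for a header. With raw reads, expect adapter and PCR artifacts near the top of the list.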
The IRanges package provides very useful tools at a slightly more abstract level (current favorites include the Rle class, which is returned, for instance, by the 'coverage' function, and the manipulation of IRanges themselves). The rtracklayer package provides a way to expose results as tracks in genome browsers. Biobase::matchpt and the org.* packages can be useful, too.

> The Bio-sig-seq list is perhaps the best place to ask for more details.

Yes, it would be good to follow up on the bioc-sig-sequencing group. See http://bioconductor.org/docs/mailList.html

Martin

--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
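To make the 'coverage' / Rle idea concrete, here is a small Python sketch. It is an illustration only, not the IRanges API. It computes per-base read depth from 1-based, inclusive (start, end) ranges, the convention IRanges uses, and then run-length encodes the result into the (values, lengths) pair that an Rle stores. The read ranges are made up.

```python
from itertools import groupby
from typing import List, Sequence, Tuple


def coverage(ranges: Sequence[Tuple[int, int]], length: int) -> List[int]:
    """Per-base depth over positions 1..length for non-empty, 1-based,
    inclusive (start, end) ranges, computed with a difference array."""
    diff = [0] * (length + 1)
    for start, end in ranges:
        if not 1 <= start <= end <= length:
            raise ValueError(f"range ({start}, {end}) outside 1..{length}")
        diff[start - 1] += 1  # depth goes up at the first covered base
        diff[end] -= 1        # and down again just after the last one
    depth, running = [], 0
    for d in diff[:length]:
        running += d
        depth.append(running)
    return depth


def run_length_encode(values: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Collapse consecutive equal values into (run values, run lengths)."""
    runs = [(v, sum(1 for _ in grp)) for v, grp in groupby(values)]
    return [v for v, _ in runs], [n for _, n in runs]


if __name__ == "__main__":
    reads = [(1, 4), (3, 6), (3, 6)]  # made-up aligned reads
    depth = coverage(reads, 8)
    print(depth)                     # -> [1, 1, 3, 3, 2, 2, 0, 0]
    print(run_length_encode(depth))  # -> ([1, 3, 2, 0], [2, 2, 2, 2])
```

Storing runs rather than one value per base is what makes a run-length encoding a good fit for coverage: long stretches of zero or constant depth across a chromosome collapse into single runs.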
@sean-davis-490
United States
On Tue, Jan 13, 2009 at 6:35 AM, Daniel Brewer <daniel.brewer@icr.ac.uk> wrote:

> Can anyone add to what packages I should be looking at and what approaches or techniques I should be using?

The Bio-sig-seq list is perhaps the best place to ask for more details. The ShortRead package combined with the Biostrings package (for doing alignments) is one possibility. It is also possible to do the alignments outside of R using aligners such as Bowtie, MAQ, or ELAND, and the ShortRead package can read those results directly.

Sean
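Within R, ShortRead's readAligned() reads such aligner output. Purely to show what one of these files contains, here is a minimal Python sketch that parses Bowtie 1's default (non-SAM) output and counts alignments per reference sequence. The sketch assumes each line holds eight tab-separated fields, with the mismatch field empty when there are none. The example records are made up, and other aligners (MAQ, ELAND) use different layouts.

```python
from collections import Counter
from typing import Iterable, Iterator, NamedTuple


class BowtieHit(NamedTuple):
    read: str             # name of the aligned read
    strand: str           # '+' forward, '-' reverse
    reference: str        # reference sequence name, e.g. a chromosome
    offset: int           # 0-based leftmost position on the forward strand
    sequence: str         # read sequence (reverse-complemented if '-')
    qualities: str        # ASCII-encoded qualities (reversed if '-')
    other_instances: int  # field 7; see the Bowtie manual for its exact meaning
    mismatches: str       # comma-separated descriptors; empty if none


def parse_bowtie(lines: Iterable[str]) -> Iterator[BowtieHit]:
    """Parse Bowtie 1's default tab-separated output, one alignment per line."""
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue
        fields = line.split("\t")
        if len(fields) < 7:
            raise ValueError(f"line {lineno}: expected 7-8 fields, got {len(fields)}")
        # A trailing empty mismatch field may or may not survive as a tab.
        mismatches = fields[7] if len(fields) > 7 else ""
        yield BowtieHit(fields[0], fields[1], fields[2], int(fields[3]),
                        fields[4], fields[5], int(fields[6]), mismatches)


def hits_per_reference(hits: Iterable[BowtieHit]) -> Counter:
    """Count alignments falling on each reference sequence."""
    return Counter(hit.reference for hit in hits)


if __name__ == "__main__":
    example = [
        "r1\t+\tchr1\t100\tACGT\tIIII\t0\t",
        "r2\t-\tchr2\t5\tTTTT\tIIII\t0\t2:A>G",
        "r3\t+\tchr1\t200\tGGGG\tIIII\t0\t",
    ]
    print(hits_per_reference(parse_bowtie(example)))
    # -> Counter({'chr1': 2, 'chr2': 1})
```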