Hi Daniel --
"Sean Davis" <sdavis2 at mail.nih.gov> writes:
> On Tue, Jan 13, 2009 at 6:35 AM, Daniel Brewer <daniel.brewer at icr.ac.uk> wrote:
>> I have got hold of some solexa results in fastq format from some
>> mRNA samples and I would like to analyse these to look at a number of
>> things. I would like to be able to list the sequences that occur
>> often and then BLAT them against the human genome to see whether
'readFastq' and the 'top' component of the return value of 'tables' in
ShortRead would allow you to read in the fastq file(s) and tabulate
the most common occurrences. If these are relatively raw reads, you'll
likely be disappointed to find that the most common are Solexa adapter
sequences and other artifacts. Also if the sample prep involved a PCR
step then likely you'll see PCR artifacts (e.g., differential
amplification). If you have access to the _export.txt files then 'qa'
and 'report' can provide a useful overview of your data and its
limitations; the relatively high-level code used in generating the
report, visible in the file at
system.file("template", "qa_solexa.Rnw", package="ShortRead")
might be suggestive of ways to explore your data (view in a text
browser, look for Sweave 'chunks' between <<>>= and @). Also the
>> sequence does occur and if so, is associated with a known
>> Further down the line I would like to do some comparisons between
>> and tumour tissue.
>> From looking around it seems that Shortread (in the development
>> can be used to read in the files into BioStrings objects and then
>> BSgenome can be used to perform some sort of BLAT. Am I on the
>> lines here?
>> Can anyone add to what packages I should be looking at and what
>> approaches or techniques I should be using.
The IRanges package provides very useful tools, at a slightly more
abstract level (current favorites include the Rle-class, which is
returned for instance by the 'coverage' function, and the manipulation
of IRanges themselves). The rtracklayer package provides a way to
expose results as tracks in genome browsers. Biobase::matchpt and the
org.* packages can be useful, too.
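
A small illustration of 'coverage' and the Rle it returns, as a sketch
only: the read starts and widths below are invented for the example and
stand in for alignment positions on a single chromosome.

```r
library(IRanges)

# Hypothetical aligned reads: four reads of width 5 (made-up positions)
reads <- IRanges(start = c(1, 3, 3, 10), width = 5)

# Per-position read depth, stored compactly as a run-length encoding (Rle)
cvg <- coverage(reads)
cvg

# The runs themselves: depth values and how many positions each one spans
runValue(cvg)
runLength(cvg)

# Regions covered by at least 2 reads, as views on the Rle
deep <- slice(cvg, lower = 2)
deep
```

The Rle stays small when depth is constant over long stretches, which is
why it is convenient for whole-chromosome coverage.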
> The Bio-sig-seq list is, perhaps the best place to ask for more
Yes, it would be good to follow up to the bioc-sig-sequencing
group. See http://bioconductor.org/docs/mailList.html
> The shortreads package combined with the Biostrings package (for
> alignments) is one possibility. Also, it is possible to do the
> outside of R using algorithms like Bowtie, MAQ, or ELAND and the
> package can read those results directly.
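
As a rough sketch of the 'readFastq' / 'tables' suggestion above: the toy
FASTQ records and the output file name are made up so that the example
runs on its own; substitute your own file. Writing the top sequences to
FASTA is just one possible way to prepare them for BLAT, not something
ShortRead does for you.

```r
library(ShortRead)   # also loads Biostrings

# Toy FASTQ file (invented records) so the sketch is self-contained
fq <- tempfile(fileext = ".fastq")
writeLines(c(
  "@r1", "ACGTACGT", "+", "IIIIIIII",
  "@r2", "ACGTACGT", "+", "IIIIIIII",
  "@r3", "TTTTGGGG", "+", "IIIIIIII"
), fq)

# Read the file and tabulate the most common sequences
reads <- readFastq(fq)
tbl <- tables(reads, n = 10)
top <- tbl$top          # named integer vector: names are sequences, values are counts
top

# One possible next step: save the top sequences as FASTA for BLAT
topSeqs <- DNAStringSet(names(top))
names(topSeqs) <- paste0("seq", seq_along(topSeqs), "_n", top)
fasta <- tempfile(fileext = ".fasta")   # hypothetical output location
writeXStringSet(topSeqs, fasta)
```

With real data, expect adapter sequences near the top of 'top', as noted
above.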
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M2 B169
Phone: (206) 667-2793