Yes, Subread performs read mapping by mapping a set of 16-mers
from each read to the genome, counting the number of mapped 16-mers at
each candidate location, and then choosing the location supported by
the largest number of 16-mers as the mapping location of the read. The
fundamental difference between Subread and other aligners is that it uses
a voting method rather than an extension method to determine the
locations of the reads, which makes it a lot faster and more
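To make the voting idea concrete, here is a minimal Python sketch. It is only an illustration of the description above, not Subread's actual implementation: the function names, the exact-match dictionary index, and the `step` spacing between 16-mers are all assumptions made for the example.

```python
from collections import Counter, defaultdict

K = 16  # 16-mer length, as in the description above


def build_index(genome, k=K):
    """Map every k-mer in the genome to the positions where it occurs."""
    index = defaultdict(list)
    for pos in range(len(genome) - k + 1):
        index[genome[pos:pos + k]].append(pos)
    return index


def vote_for_location(read, index, k=K, step=3):
    """Return (best_start, votes) for a read, or (None, 0) if nothing maps.

    `step` (hypothetical) is the spacing between the k-mers taken from the read.
    """
    votes = Counter()
    for offset in range(0, len(read) - k + 1, step):
        for pos in index.get(read[offset:offset + k], ()):
            # A hit at genome position `pos` implies the read starts at pos - offset.
            votes[pos - offset] += 1
    if not votes:
        return None, 0
    # The candidate start supported by the most k-mers wins the vote.
    return votes.most_common(1)[0]
```

Because each 16-mer votes independently, a mismatch only removes the votes of the 16-mers that overlap it, and the remaining ones can still agree on the correct location.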
Subread has both a C version and an R version. The C version is
available from SourceForge and the R version is included in the Rsubread
package in Bioconductor.
The Rsubread package also includes a function called featureCounts, which
can be used to count the number of reads for each exon or gene. So this
function will be useful for you to look at the differential expression at
both gene level and exon level.
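As a rough picture of what counting reads per exon or per gene means, here is a small Python sketch. It is not featureCounts itself and does not reproduce its rules: the function name, the tuple layout, the single-chromosome coordinates, and the choice to count a read once for every exon it overlaps but only once per gene are all assumptions made for illustration.

```python
from collections import Counter


def count_reads(reads, exons):
    """Count reads overlapping each exon and each gene.

    reads: list of (start, length) on one chromosome (hypothetical layout).
    exons: list of (gene_id, exon_id, start, end), with end exclusive.
    """
    exon_counts = Counter()
    gene_counts = Counter()
    for start, length in reads:
        end = start + length
        genes_hit = set()
        for gene_id, exon_id, ex_start, ex_end in exons:
            # Half-open intervals overlap when each starts before the other ends.
            if start < ex_end and ex_start < end:
                exon_counts[(gene_id, exon_id)] += 1
                genes_hit.add(gene_id)
        # A read spanning two exons of one gene counts once for that gene.
        for gene_id in genes_hit:
            gene_counts[gene_id] += 1
    return exon_counts, gene_counts
```

The two tables it returns are raw counts, at exon level and gene level respectively.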
Another function which might be useful for your data analysis is the
subjunc function, which is designed to discover exon junctions. Subjunc
uses an idea similar to that of Subread. Our preliminary results show
that subjunc outperformed competing junction detectors in terms of
sensitivity and accuracy.
The devel version of the Rsubread package includes a lot of our recent
development for both the Subread aligner and the Subjunc junction detector, so
I would recommend using the devel version if you want to try the
Hope this helps.
> A professor sent me a bunch of raw RNA-seq reads (as FASTQ files)
> to align them, and I couldn't really make heads or tails of the
> I listened to what Phil Green told me at a conference and looked for
> a sensible word-nucleated aligner like he described. It seems that Subread
> works this way?
> I would like BAM files as intermediate output, but my real interest is
> differential exon usage in differentiating cells. Given that the reads I
> have to align are relatively short (36bp, SE), is there an advantage or
> disadvantage in using subread compared to other options? And when
> trimming and aligning, I could choose raw counts, conditional
> normalized counts, or something like RPKM to summarize how often a
> exon seems to have been transcribed. I read this:
> and I see that packages using a Gamma prior for the dispersion of a
> count model benefit from having raw counts. If I am after
> changes in exon usage depending on other sequence features, is it
> to use (say) 'cqn' on the raw counts, then log-transform and work with
> normalized counts?
> Thanks for any suggestions,
> Tim Triche, Jr.
> USC Biostatistics