A professor sent me a bunch of raw RNA-seq reads (as FASTQ files) that I
want to align. I couldn't really make heads or tails of the options, so
I listened to what Phil Green told me at a conference and looked around
for a sensible word-nucleated aligner like the one he described. It
seems that Rsubread works this way?
http://sourceforge.net/projects/subread/
I would like BAM files as intermediate output, but my real interest is
differential exon usage in differentiating cells. Given that the reads I
have to align are relatively short (36bp, SE), is there an advantage or
disadvantage in using subread compared to other options? And when I'm
done trimming and aligning, I could choose raw counts, conditional
quantile normalized counts, or something like RPKM to summarize how
often a given exon seems to have been transcribed. I read this:
http://seqanswers.com/forums/showthread.php?t=586
and I see that packages using a Gamma prior for the dispersion of a
Poisson count model benefit from having raw counts. If I am after
correlated changes in exon usage depending on other sequence features,
is it reasonable to use (say) 'cqn' on the raw counts, then
log-transform and work with those normalized counts?
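
To make the summary options concrete, here is a minimal Python sketch of
turning raw per-exon counts into RPKM and then log-transforming them. The
exon counts and lengths are made up for illustration, the library size is
assumed to be the sum of the counts passed in, and the pseudocount of 1 is
an arbitrary choice. It does not reproduce what 'cqn' does; it only shows
the simplest length and depth scaling.

```python
import math

def rpkm(counts, lengths_bp):
    """Reads per kilobase of exon per million mapped reads.

    Assumption: the library size is the sum of the supplied counts.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("no mapped reads")
    # count / (length in kb) / (library size in millions)
    return [c * 1e9 / (total * length) for c, length in zip(counts, lengths_bp)]

def log2_with_pseudocount(values, pseudocount=1.0):
    """log2 transform; the pseudocount avoids log(0) for unobserved exons."""
    return [math.log2(v + pseudocount) for v in values]

# Hypothetical raw counts and exon lengths (bp) for three exons
exon_counts = [100, 300, 600]
exon_lengths = [200, 1000, 1500]

exon_rpkm = rpkm(exon_counts, exon_lengths)
log_rpkm = log2_with_pseudocount(exon_rpkm)
```

Count-based models generally want the untransformed `exon_counts`, while
values like `log_rpkm` are the kind of thing one might correlate with
other sequence features.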
Thanks for any suggestions,
--
Tim Triche, Jr.
USC Biostatistics