Ballgown objects do contain read counts at the exon level. These are calculated with Tablemaker, the preprocessor we released to parse Cufflinks assemblies, so you can use edgeR, DESeq, voom, or other count-based methods on those counts. Note that these exon counts are not meant for transcript-level analysis. If you want gene counts for your Cufflinks assembly, you can use existing gene-counting functions (e.g. summarizeOverlaps) with your alignments plus the "merged.gtf" file produced by Cufflinks.
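To illustrate what a gene-level count from alignments plus merged.gtf amounts to, here is a toy Python sketch of union-style overlap counting: a read is assigned to a gene only if it overlaps exons of exactly one gene, and is skipped as ambiguous otherwise. It is loosely modeled on the default "Union" mode of summarizeOverlaps but is not its implementation. The interval tuples, the example XLOC gene IDs, and the hypothetical `count_reads_per_gene` helper are assumptions; strand, splicing, and chromosome handling are ignored.

```python
from collections import Counter


def overlaps(a_start, a_end, b_start, b_end):
    """Closed-interval overlap test (1-based inclusive coordinates, as in GTF)."""
    return a_start <= b_end and b_start <= a_end


def count_reads_per_gene(reads, exons):
    """Hypothetical helper: count reads per gene, union-style.

    reads: list of (start, end) alignment spans on a single chromosome.
    exons: list of (gene_id, start, end) exon records, e.g. parsed from merged.gtf.
    Reads overlapping no gene, or exons of more than one gene, are not counted.
    """
    counts = Counter()
    for r_start, r_end in reads:
        hit_genes = {gene for gene, e_start, e_end in exons
                     if overlaps(r_start, r_end, e_start, e_end)}
        if len(hit_genes) == 1:
            counts[hit_genes.pop()] += 1
    return counts


# Illustrative data only: two genes whose exons touch around position 390-400.
exons = [("XLOC_000001", 100, 200), ("XLOC_000001", 300, 400),
         ("XLOC_000002", 390, 500)]
reads = [(150, 180), (310, 350), (395, 420), (600, 650)]
print(count_reads_per_gene(reads, exons))  # Counter({'XLOC_000001': 2})
```

The read at 395-420 touches both genes and is dropped, and the read at 600-650 hits nothing; in R you would let summarizeOverlaps make these decisions for you.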
For transcript-level analysis: you are of course welcome to parse Cuffnorm output, load it into R, and feed it into limma. Running Tablemaker, loading its output with the ballgown() function, and then extracting expression measurements with texpr() is essentially equivalent to that. There are a few advantages to using ballgown:
(1) The ballgown() function is the parser, so you don't have to write it yourself.
(2) The expression measurements are connected to the assembly structure, in efficient GRanges/GRangesList format.
(3) ballgown provides functions for plotting transcript structure/abundances and matching assembled transcripts to annotation.
(4) The linear modeling in ballgown (the stattest() function) specifies the models to compare, performs library-size normalization, and corrects p-values for multiple testing by default. All of this is entirely possible with limma, of course, but we have wrapped it into a single function call. The idea behind stattest() was to provide a drop-in replacement for Cuffdiff, so that users don't need to specify models, normalization, etc. We show in our preprint that these models (log FPKM values fed to ballgown/limma) can accurately detect differential transcript expression.
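If you take the Cuffnorm + limma route, the log-transform and multiple-testing steps that stattest() wraps have to be done by hand. The Python sketch below illustrates them under stated assumptions; it is not ballgown's code. It assumes a tab-delimited Cuffnorm-style FPKM table (first column `tracking_id`, one column per sample), a log2(FPKM + 1) transform (the pseudocount of 1 is an assumption), and Benjamini-Hochberg adjustment, which is what R's `p.adjust(method = "fdr")` computes. The per-transcript test that produces the p-values is left to limma or stattest(); the table contents and p-values below are placeholders.

```python
import csv
import io
import math


def read_fpkm_table(handle):
    """Parse a tab-delimited FPKM table into (sample names, {tracking_id: FPKMs}).

    Assumption: first column is tracking_id, every other column is one sample.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    table = {row[0]: [float(x) for x in row[1:]] for row in reader if row}
    return header[1:], table


def log_fpkm(values, pseudocount=1.0):
    """log2(FPKM + pseudocount); the pseudocount keeps zero FPKMs finite."""
    return [math.log2(v + pseudocount) for v in values]


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    # Walk from the largest p-value down, keeping a running minimum so the
    # adjusted values stay monotone and never exceed 1.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for rank_from_top, i in enumerate(order):
        rank = m - rank_from_top  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Placeholder input standing in for a Cuffnorm isoform FPKM table.
text = "tracking_id\tq1_0\tq2_0\nTCONS_00000001\t0\t3\nTCONS_00000002\t1\t7\n"
samples, table = read_fpkm_table(io.StringIO(text))
log_table = {tid: log_fpkm(v) for tid, v in table.items()}
print(samples, log_table)
print(bh_adjust([0.01, 0.04, 0.03, 0.20]))
```

The log-scale matrix is what you would hand to a linear model; the adjustment step then turns that model's raw p-values into q-values.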
In short, Ballgown's main contribution is the software infrastructure connecting Cufflinks assemblies to R, so users don't have to write their own parsers; the paper also shows that linear modeling of transcript FPKM gives appropriate differential expression results.
Hope this helps!