Ballgown vs (voom,edgeR,DESeq,limma)
Last seen 21 months ago

Hi all, how would you use the ballgown package in conjunction with voom, edgeR, DESeq, or limma?

The bioRxiv paper seems to claim that ballgown bridges the gap between Cufflinks and tools like limma, voom, edgeR, and DESeq.

I don’t understand how voom, edgeR, and DESeq can be used at the gene or transcript level, since these require raw counts, which ballgown/Cufflinks do not return (unlike the estimates from RSEM or Sailfish, for instance). That is, unless one is ready to feed FPKM values to voom(), but that looks incorrect to me.

As for limma or the ballgown model, one has to accept working on the log2(1+FPKM) scale. But then one wonders why one would use ballgown in the first place, when we can just parse the output of Cuffnorm and feed it into limma.



limma rnaseq differential expression voom • 6.5k views
Alyssa Frazee ▴ 210
Last seen 22 months ago
San Francisco, CA, USA

Hi Francois,

Ballgown objects do contain read counts at the exon level. These are calculated with Tablemaker, the preprocessor we released to parse Cufflinks assemblies. So you can use edgeR, DESeq, voom, or other count-based methods on those counts. These exon-level counts are not meant for transcript-level analysis. If you want gene counts for your Cufflinks assembly, you can use existing gene counting functions (e.g. summarizeOverlaps) with alignments + the "merged.gtf" Cufflinks file.

For transcript-level analysis: you are of course welcome to parse Cuffnorm output, load it into R, and feed it into limma. Running Tablemaker, then the "ballgown()" function, and then extracting expression measurements with texpr() is basically equivalent to that. There are a few advantages to using ballgown:
(1) The ballgown() function is the parser, so you don't have to write it yourself.
(2) The expression measurements are connected to the assembly structure, in efficient GRanges/GRangesList format.
(3) ballgown provides functions for plotting transcript structure/abundances and matching assembled transcripts to annotation.
(4) The linear modeling in ballgown (stattest() function) specifies the models to compare, does library size normalization, and adjusts p-values for multiple testing correction by default. This is all totally possible with limma, of course, but we have wrapped this into one function call. The idea behind stattest() was to provide a drop-in replacement for Cuffdiff, whose users don't need to specify models, normalization, etc. We show in our preprint that using these models (log FPKM values fed to ballgown/limma) can accurately detect differential transcript expression.
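
To make two of the steps in point (4) concrete, here is a minimal, standard-library Python sketch of (a) moving FPKM values to the log2(FPKM + 1) scale and (b) adjusting a vector of p-values for multiple testing. It is an illustration only, not ballgown's or limma's code: the function names `log2_fpkm` and `bh_adjust` are hypothetical, the input numbers are made up, and the Benjamini-Hochberg procedure is an assumed choice of adjustment (the post does not say which method stattest() applies). Model fitting and library size normalization are deliberately left out.

```python
import math


def log2_fpkm(fpkm_rows):
    """Return log2(FPKM + 1) for every value; one inner list per transcript."""
    return [[math.log2(value + 1.0) for value in row] for row in fpkm_rows]


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Assumption: BH is used here purely as an example of a multiple-testing
    adjustment; it is not claimed to be what stattest() does internally.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values stay monotone (a smaller p never gets a larger adjusted value).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Made-up FPKM values: 2 transcripts x 3 samples.
    fpkm = [[0.0, 1.0, 3.0], [7.0, 15.0, 31.0]]
    print(log2_fpkm(fpkm))

    # Made-up per-transcript p-values.
    print(bh_adjust([0.01, 0.04, 0.03, 0.005]))
```

In the workflow described above, the transformed values would go into the linear model and the adjusted values would be what gets reported alongside each transcript's p-value.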

Ballgown's contribution is the software infrastructure connecting Cufflinks assemblies to R (so users don't have to write their own parsers; the paper also shows that linear modeling of transcript FPKM gives appropriate DE results). 

Hope this helps!







