Question

Ballgown vs (voom,edgeR,DESeq,limma)

2

François Lefebvre ▴ 50

@francois-lefebvre-4696

Last seen 3.3 years ago

Canada

Hi all, How would you use the ballgown package in conjunction with voom, edgeR, DESeq, limma?

The bioRxiv paper seems to claim that ballgown bridges the gap between Cufflinks and tools like limma, voom, edgeR, and DESeq.

I don’t understand how voom, edgeR and DESeq can be used at the gene or transcript level, since these require raw counts, which ballgown/Cufflinks do not return (unlike estimates from RSEM or Sailfish, for instance). That is, unless one is ready to feed FPKM values to voom(), but that looks incorrect to me.

As for limma or the ballgown model, one has to accept working on the log2(1+FPKM) scale. But then one wonders why use ballgown in the first place, when we can just parse the output of cuffnorm and feed it into limma.

Thanks!

limma rnaseq differential expression voom • 7.2k views

Updated 9.3 years ago by Alyssa Frazee ▴ 210 • written 9.3 years ago by François Lefebvre ▴ 50

0

This is a useful comment on this issue.

http://permalink.gmane.org/gmane.science.biology.informatics.conductor/48283

8.7 years ago • matthew.hindle • 0

score 3 · Answer 1 · 2015-01-23

Hi Francois,

Ballgown objects do contain read counts at the exon level. These are calculated with Tablemaker, the preprocessor we released to parse Cufflinks assemblies. So you can use edgeR, DESeq, voom, or other count-based methods on those exon counts. The counts are not meant for transcript-level analysis. If you want gene counts for your Cufflinks assembly, you can use existing gene counting functions (e.g. summarizeOverlaps) with your alignments plus the "merged.gtf" Cufflinks file.
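
As a rough sketch of the exon-count route, the code below pulls exon read counts out of a ballgown object and passes them to voom/limma. It is not from the original thread: it assumes the small example dataset bundled with the ballgown package stands in for real Tablemaker output, and the two-group design is purely hypothetical.

```r
library(ballgown)
library(edgeR)
library(limma)

# Assumption: the example data shipped with ballgown stands in for
# your own Tablemaker output directory.
data_directory <- system.file("extdata", package = "ballgown")
bg <- ballgown(dataDir = data_directory, samplePattern = "sample", meas = "all")

# Hypothetical two-group design (alternating samples), for illustration only.
n_samples <- length(sampleNames(bg))
group <- factor(rep(c("A", "B"), length.out = n_samples))

# Exon-level read counts ("rcount"), one row per exon, one column per sample.
exon_counts <- eexpr(bg, "rcount")

# Drop exons with no reads in any sample before modeling.
exon_counts <- exon_counts[rowSums(exon_counts) > 0, , drop = FALSE]

# Standard count-based workflow: normalize, voom-transform, fit, moderate.
dge <- DGEList(counts = exon_counts, group = group)
dge <- calcNormFactors(dge)
design <- model.matrix(~ group)
v <- voom(dge, design)
fit <- eBayes(lmFit(v, design))
top_exons <- topTable(fit, coef = 2, number = Inf)
```

The same count matrix could be handed to edgeR or DESeq instead; the point is only that the counts come from eexpr() rather than from a separate counting step.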

For transcript-level analysis: you are of course welcome to parse Cuffnorm output, load it into R, and feed it into limma. Running tablemaker and then the "ballgown()" function, then extracting expression measurements with texpr() is basically equivalent to that. There are a few advantages to using ballgown:
(1) The ballgown() function is the parser, so you don't have to write it yourself.
(2) The expression measurements are connected to the assembly structure, in efficient GRanges/GRangesList format.
(3) ballgown provides functions for plotting transcript structure/abundances and matching assembled transcripts to annotation.
(4) The linear modeling in ballgown (stattest() function) specifies the models to compare, does library size normalization, and adjusts p-values for multiple testing correction by default. This is all totally possible with limma, of course, but we have wrapped this into one function call. The idea behind stattest() was to provide a drop-in replacement for Cuffdiff, whose users don't need to specify models, normalization, etc. We show in our preprint that using these models (log FPKM values fed to ballgown/limma) can accurately detect differential transcript expression.
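
To make the comparison concrete, here is a minimal sketch of both transcript-level routes: the single stattest() call, and the by-hand equivalent of extracting FPKM with texpr() and fitting limma on the log2(FPKM + 1) scale. It again assumes the example data bundled with ballgown, and the group assignment is hypothetical. The two results should not be expected to match exactly, since stattest() also applies its own library size adjustment and the plain limma fit here does not.

```r
library(ballgown)
library(limma)

# Assumption: bundled example data in place of real Tablemaker output.
data_directory <- system.file("extdata", package = "ballgown")
bg <- ballgown(dataDir = data_directory, samplePattern = "sample", meas = "all")

# Hypothetical phenotype table; the first column must hold the sample names.
n_samples <- length(sampleNames(bg))
pData(bg) <- data.frame(
  id = sampleNames(bg),
  group = rep(c(1, 0), length.out = n_samples)
)

# Route 1: one call that handles the model comparison and
# multiple-testing adjustment.
bg_results <- stattest(bg, feature = "transcript", meas = "FPKM",
                       covariate = "group")

# Route 2: extract transcript FPKM and run limma by hand.
fpkm <- texpr(bg, "FPKM")
log_fpkm <- log2(fpkm + 1)
design <- model.matrix(~ group, data = pData(bg))
fit <- eBayes(lmFit(log_fpkm, design))
limma_results <- topTable(fit, coef = 2, number = Inf)
```

Either way, the modeling is done on log-transformed FPKM; the difference is how much of the setup you write yourself.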

Ballgown's contribution is the software infrastructure connecting Cufflinks assemblies to R (so users don't have to write their own parsers; the paper also shows that linear modeling of transcript FPKM gives appropriate DE results).

Hope this helps!