A recent paper in the journal RNA by Schurch et al. reported high false-positive rates for null comparisons of yeast RNA-seq samples using DESeq2. This turned out to be a small mistake (with big consequences) in how the DESeq2 results were being recorded: genes filtered out for low counts (e.g. less than 1 read count across samples) were recorded as highly significant (e.g. p-value = 0).
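For illustration, here is a minimal R sketch (on simulated data, not the yeast dataset or the authors' pipeline) of how DESeq2 reports NA p-values for filtered genes, and how coercing those NAs to 0 turns them into spurious significant calls:

```r
library(DESeq2)

## simulated toy data, not the yeast dataset
dds <- makeExampleDESeqDataSet(n = 1000, m = 6)
dds <- DESeq(dds)
res <- results(dds)

## genes removed by filtering (e.g. all-zero or low counts) get NA adjusted p-values
table(filtered = is.na(res$padj))

## the mistake: treating NA as 0 marks every filtered gene as significant
bad  <- sum(ifelse(is.na(res$padj), 0, res$padj) < 0.05)
## the correct handling: filtered genes are simply not called
good <- sum(res$padj < 0.05, na.rm = TRUE)
c(bad_calls = bad, correct_calls = good)
```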
The authors have been very responsive and quick in helping to correct the paper and figures. Furthermore, I could easily replicate their analysis using their publicly available processed data (htseq-count files) and look over their analysis code.
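For anyone who wants to try the same, here is a rough sketch of loading htseq-count output into DESeq2; the file names, directory, and condition labels are placeholders, not the actual files from the paper's data release:

```r
library(DESeq2)

## placeholder sample table; the real file names come from the authors' data release
sampleTable <- data.frame(
  sampleName = c("WT_1", "WT_2", "WT_3", "Snf2_1", "Snf2_2", "Snf2_3"),
  fileName   = c("WT_1.txt", "WT_2.txt", "WT_3.txt",
                 "Snf2_1.txt", "Snf2_2.txt", "Snf2_3.txt"),
  condition  = factor(rep(c("WT", "Snf2"), each = 3))
)

dds <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,
                                  directory   = "htseq_counts",
                                  design      = ~ condition)
dds <- DESeq(dds)
res <- results(dds)
```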
A corrected version of the paper will appear in the journal May 16. In the meantime, the senior author has written up what happened in a blog post, including the new figures, which show that DESeq2 performs well in their analysis, comparable to other top methods.
I have to say, this is a very compelling dataset for testing the performance of statistical methods: a 3 vs 3 comparison, with the remaining 39 vs 39 samples serving as a gold standard, is ideal for judging whether FDR sets stay within their nominal bounds. I'm planning to wrap the dataset up as a Bioconductor package (or ExperimentHub instance?) for use in courses and workshops, as sketched below.
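As a rough sketch of that evaluation scheme (simulated data and placeholder cutoffs, not the authors' actual benchmark code): call differential expression on a small subsample, then score the calls against the full comparison treated as ground truth.

```r
library(DESeq2)

## simulated stand-in for the yeast data; 10 vs 10 plays the role of the large comparison
set.seed(1)
dds_full <- makeExampleDESeqDataSet(n = 2000, m = 20, betaSD = 1)
res_full <- results(DESeq(dds_full))
truth    <- rownames(res_full)[which(res_full$padj < 0.01)]

## a 3 vs 3 subsample of the same samples
dds_sub <- dds_full[, c(1:3, 11:13)]
res_sub <- results(DESeq(dds_sub))
called  <- rownames(res_sub)[which(res_sub$padj < 0.05)]

## empirical FDR: fraction of 3 vs 3 calls not supported by the full comparison
mean(!called %in% truth)
```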
Updated figures: [the corrected sensitivity and FDR figures are shown in the blog post linked above]
These new figures also help answer the question of why to use DESeq2 over DESeq. For 3 vs 3 and 6 vs 6 comparisons, DESeq2 has much higher sensitivity: roughly 50% higher than DESeq for the 3 vs 3 comparison and roughly 25% higher for the 6 vs 6, while controlling FDR around nominal levels (as we and independent groups have demonstrated).
Mike, you might want to edit this to change the post type to something other than "question".
I chose "Other" when I posted, not sure how to change.
Weird, I just tried editing it, and you're right, setting it to "Other" just leaves it as a question.