Updated DESeq2 performance on highly replicated yeast RNA-seq data
@mikelove

A recent paper in the journal RNA by Schurch et al. reported a high number of false positives for null comparisons of yeast RNA-seq samples using DESeq2. This turned out to be a small mistake (with big consequences) in how the DESeq2 results were recorded: genes filtered out because of low counts (e.g. fewer than 1 read per sample) were recorded as highly significant (e.g. with a p-value of 0).
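The effect of this recording mistake is easy to reproduce in miniature. The sketch below is an illustration only, not the authors' or DESeq2's code. It assumes filtered genes arrive as missing adjusted p-values (represented here as `math.nan`, standing in for the `NA` values DESeq2 reports for genes it filters), and it counts how many genes pass an FDR threshold with and without the NA-as-zero mistake. The helper name `count_calls` and the p-values are hypothetical.

```python
import math

def count_calls(padj, alpha=0.05, na_as_zero=False):
    """Count genes called significant at adjusted p-value threshold alpha.

    padj: adjusted p-values; math.nan marks a gene filtered out for low counts.
    na_as_zero=True mimics the mistake of recording filtered genes as p = 0.
    """
    calls = 0
    for p in padj:
        if math.isnan(p):
            # A filtered gene was never tested, so it should never be a call,
            # unless its missing value is (wrongly) recorded as p = 0.
            if na_as_zero:
                calls += 1
            continue
        if p < alpha:
            calls += 1
    return calls

# Hypothetical null comparison: two tested genes, three filtered low-count genes.
padj = [0.8, 0.4, math.nan, math.nan, math.nan]
print(count_calls(padj))                   # 0
print(count_calls(padj, na_as_zero=True))  # 3
```

In a null comparison every call is a false positive, so in this sketch treating filtered genes as p = 0 inflates the false-positive count by exactly the number of filtered genes.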

The authors have been very responsive and quick in helping to correct the paper and figures. Furthermore, it was easy to try replicating their analysis using their publicly available processed data (htseq-count files) and to look over their analysis code.

A corrected version of the paper will appear in the journal on May 16. In the meantime, the senior author has written up what happened in a blog post, including the new figures, which show that DESeq2 performs well in their analysis, comparable to other top methods.

I have to say, this is a very compelling dataset for testing the performance of statistical methods: evaluating a 3 vs 3 comparison against a gold standard derived from the remaining 39 vs 39 samples is ideal for judging whether the false discovery rate (FDR) of the called gene sets stays within its nominal bound. I'm planning on wrapping the dataset up as a Bioconductor package (or ExperimentHub instance?) for use in courses and workshops.

https://geoffbarton.wordpress.com/2016/04/21/how-many-biological-replicates-are-needed-in-an-rna-seq-experiment-and-which-differential-expression-tool-should-you-use/

Updated figures: (images not reproduced here; they are included in the blog post linked above)
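The evaluation described in this post, a small-sample comparison scored against a gold standard from the held-out samples, can be sketched as a set comparison. This is a minimal illustration under stated assumptions, not the published benchmark code: `evaluate_calls` is a hypothetical helper, the gene IDs are made up, and genes called in the large held-out comparison are treated as the truth, even though that gold standard is itself only an estimate.

```python
def evaluate_calls(calls, gold):
    """Score a small-sample call set against a gold-standard set of DE genes.

    calls: gene IDs called DE in the small (e.g. 3 vs 3) comparison
    gold:  gene IDs called DE using the held-out samples, treated as truth
    Returns (sensitivity, observed FDR).
    """
    calls, gold = set(calls), set(gold)
    true_pos = len(calls & gold)    # called and in the gold standard
    false_pos = len(calls - gold)   # called but not in the gold standard
    sensitivity = true_pos / len(gold) if gold else 0.0
    fdr = false_pos / len(calls) if calls else 0.0
    return sensitivity, fdr

# Hypothetical example: 5 gold-standard genes, 3 calls, 2 of them correct.
gold = {"g1", "g2", "g3", "g4", "g5"}
calls = {"g1", "g2", "g6"}
sens, fdr = evaluate_calls(calls, gold)
print(f"sensitivity={sens:.2f}, FDR={fdr:.2f}")  # sensitivity=0.40, FDR=0.33
```

Averaging the observed FDR over many random 3 vs 3 draws and comparing it with the threshold used to make the calls (for example, an assumed cutoff of padj < 0.1) is one way to check whether a method stays within its nominal bound; averaging sensitivity the same way gives the kind of DESeq2 vs DESeq comparison discussed below.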

These new figures also help answer why one should use DESeq2 rather than DESeq. For 3 vs 3 and 6 vs 6 comparisons, DESeq2 has much higher sensitivity: ~50% higher than DESeq for the 3 vs 3 comparison, and ~25% higher for 6 vs 6, while controlling the FDR near nominal levels (as we and independent groups have demonstrated).

Mike, you might want to edit this to change the post type to something other than "question".

I chose "Other" when I posted; I'm not sure how to change it.

Weird. I just tried editing it, and you're right: setting it to "Other" just leaves it as a question.

phil.chapman (@philchapman-8324)

Mike,

This is a really useful analysis and dataset, but it is just a single dataset. How would you expect the results to differ in a system with more or less biological variability? My work tends to involve treating cancer cell lines with novel compounds, so there is very little biological variability in the system and you can safely get away with low numbers of replicates (n = 3 or 4). If the samples were from drug-treated animal models (or patients), however, there would be much more variability, so I would want bigger groups (n = 6-10).

Thanks

As the dispersion approaches zero, I would expect most of these methods (i.e. all the methods based on the negative binomial distribution) to converge on the performance of a Poisson test.
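This convergence can be checked numerically. Under the negative binomial parameterization used by DESeq2, a count with mean μ and dispersion α has variance μ + αμ², so as α → 0 the variance approaches μ, which is the Poisson variance. The stdlib sketch below (helper names, the mean μ = 50, and the truncation point are illustrative assumptions) computes the total variation distance between NB(μ, α) and Poisson(μ) for shrinking α.

```python
import math

def nb_pmf(k, mu, alpha):
    """NB pmf with mean mu and dispersion alpha (variance mu + alpha * mu**2)."""
    r = 1.0 / alpha                   # size parameter
    log_p = math.log(r / (r + mu))    # log probability of "success"
    log_q = math.log(mu / (r + mu))   # log probability of "failure"
    return math.exp(math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
                    + r * log_p + k * log_q)

def pois_pmf(k, mu):
    """Poisson pmf with mean mu, computed on the log scale."""
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))

def tv_distance(mu, alpha, kmax=500):
    """Total variation distance between NB(mu, alpha) and Poisson(mu), truncated at kmax."""
    return 0.5 * sum(abs(nb_pmf(k, mu, alpha) - pois_pmf(k, mu))
                     for k in range(kmax + 1))

for alpha in (0.5, 0.05, 0.005, 0.0005):
    print(f"alpha={alpha:<7} TV={tv_distance(mu=50, alpha=alpha):.4f}")
```

The distance shrinks steadily as α decreases, which is the behaviour expected when NB-based tests approach a Poisson test in low-variability systems such as the cell-line experiments mentioned above.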

Things will look different with more or less biological variability. For a fine-grained assessment of performance, you could try to come up with simulations that mimic real data but have tunable parameters. Anyway, there are a number of papers that evaluate RNA-seq differential expression methods. I mainly posted to show that a correction was issued for this one, and to show that here DESeq2 has higher sensitivity than DESeq.
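A tunable simulation of the kind suggested here can start from the gamma-Poisson construction of the negative binomial. The sketch below is a hypothetical stdlib-only illustration, not a published simulator. Each gene gets a baseline mean, all genes share one dispersion `alpha` (a simplification; simulators that mimic real data usually estimate gene-wise means and dispersions from a reference dataset), and an optional log2 fold change is applied to the second group. Raising or lowering `alpha` plays the role of more or less biological variability.

```python
import random

def poisson_draw(lam, rng):
    """Poisson draw by counting exponential inter-arrival times within one time unit."""
    if lam <= 0:
        return 0
    count, t = 0, rng.expovariate(lam)
    while t <= 1.0:
        count += 1
        t += rng.expovariate(lam)
    return count

def simulate_counts(means, alpha, n_per_group, log2_fc=None, seed=1):
    """Simulate a two-group count matrix (genes x samples) from a gamma-Poisson (NB) model.

    means:   baseline mean count per gene
    alpha:   dispersion shared by all genes (variance = mu + alpha * mu**2)
    log2_fc: optional per-gene log2 fold change for group B; None gives a null experiment
    """
    rng = random.Random(seed)
    shape = 1.0 / alpha
    rows = []
    for i, mu in enumerate(means):
        fc = 2 ** log2_fc[i] if log2_fc is not None else 1.0
        row = []
        for group_mu in [mu] * n_per_group + [mu * fc] * n_per_group:
            # Gamma with mean group_mu and variance alpha * group_mu**2 ...
            lam = rng.gammavariate(shape, group_mu / shape)
            # ... then Poisson noise on top, giving an NB-distributed count.
            row.append(poisson_draw(lam, rng))
        rows.append(row)
    return rows

# Null 3 vs 3 experiment for three genes with different baseline means.
counts = simulate_counts([10, 100, 1000], alpha=0.1, n_per_group=3)
for row in counts:
    print(row)
```

Setting `log2_fc=None` gives a null experiment like the ones in the paper, so any genes a method calls on such data are false positives. The Poisson sampler counts exponential inter-arrival times, which is simple and exact but slows down for large means; a library sampler would be preferable at scale.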
