Another DESeq2 version, another change in the results. I would like to appeal to the DESeq2 authors and maintainers to please keep the results replicable across versions, at least optionally, by adding arguments to the relevant functions that let the user choose an older way of calculating things. It would be tremendously helpful. In my case, DESeq2 results are the input to multiple downstream analyses, some of which need manual intervention. Having to either re-run everything or start keeping track of multiple versions of the results gets tiring very quickly.
Thanks!
"when you're in a late stage of manuscript writing, with several downstream analyses, want to add one more analysis"
I understand where you are coming from, but I have to disagree and emphasize that the solution is simply not to change software versions within a project if you need the p-values to remain exactly the same. There is no way to both develop a statistical software project and deliver identical p-values across every version. The methods remain very stable across versions (I test against a battery of datasets), but there are many estimation procedures going on. It would not be possible to expose every small change as an argument, or the documentation would become unreadable.
I can only second what Mike says. As you say, there are good use cases where you need exact reproducibility, but then record the versions you used and stick with them. (And Gabe Becker's switchr package might be handy for automating that.)
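For the record-keeping part, even base R gets you most of the way there; a minimal sketch (the file name is just an example):

```r
# Record the exact R and package versions used to produce a set of results,
# stored alongside the results so the analysis can be rerun later with the
# same software.
writeLines(capture.output(sessionInfo()), "sessionInfo_for_these_results.txt")
packageVersion("DESeq2")   # record the DESeq2 version explicitly as well
```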
The change you are referring to ("Small change to fitted mu values...") seems perfectly reasonable in the evolution of this software, and overall everyone is better off with it. Who would want their new smartphone or new car to be exactly backward-compatible with the previous version?
I see your point, but I would still suggest making the changes less frequent: maybe collect improvements until they are substantial, then add an option to run the code the old way. Regarding keeping the R version fixed, sometimes a change of R version is forced on the user, for example when running the analysis on an organization-wide (e.g., school-wide) server.
I just have to disagree with you, because this is exactly what I have done, and the package has in fact been very stable since version 1.4, released in 2014. I know because I run it on many test datasets and see shifts of plus or minus single digits of genes.
You simply can't expect the exact same gene set with different software versions, especially given that p-values are tail probabilities, which are highly sensitive to model parameters. Changes can come from outside DESeq2 as well and would change the p-values for all genes: annotation changes, count changes, or changes in other dependencies.
And it would be bad software design to expose everything to the user as arguments and options.
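To illustrate the tail-probability point above with a toy example (not DESeq2 internals, just base R):

```r
# A small shift in an estimated parameter translates into a large relative
# change in a tail probability.
z1 <- 3.0            # test statistic under one version of an estimation step
z2 <- 3.1            # the same statistic after a tiny upstream change
p1 <- 2 * pnorm(-z1) # two-sided p-value, about 0.0027
p2 <- 2 * pnorm(-z2) # about 0.0019
(p1 - p2) / p1       # roughly a 30% relative change in the p-value
```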
For the pasilla dataset, I can look at the vignette output online to see how the FDR set has changed over time. Here is a table with the number of adjusted p-values less than 0.1 for the past four versions of DESeq2, going back two years to Fall 2014.
This is what I would define as very stable, providing a fixed target for methods comparisons from other groups, while allowing minor improvements to the software and bug fixes. But if one requires the same set of genes for downstream analysis, I would not recommend changing versions.
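For reference, that count comes straight from the standard results workflow; a minimal sketch, assuming `dds` is the DESeqDataSet built from the pasilla counts as in the vignette:

```r
library("DESeq2")
# `dds` is assumed to be the DESeqDataSet built from the pasilla counts,
# as constructed in the vignette
dds <- DESeq(dds)
res <- results(dds)
sum(res$padj < 0.1, na.rm = TRUE)   # number of genes with adjusted p-value < 0.1
```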
Managing software versions under a simple module system (environment modules) can be a great help here, especially on shared research computing systems. This way you can always choose between current and historical versions, whether of R or of most other software.
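If you go that route, it is easy to confirm from within R which versions a loaded module actually gave you (module names and paths are site-specific):

```r
# After loading the desired module (e.g. an older R on a shared cluster),
# check from within R what is actually in use.
R.version.string            # the exact R version
packageVersion("DESeq2")    # the exact DESeq2 version
.libPaths()                 # the library trees packages are loaded from
```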