Hi Michael / DESeq2 developers,
I’m trying to make sense of a comparison of DESeq v1.14 and DESeq v1.22 on the identical data set / condition. The results I found between versions with shrinkage “on” are qualitatively different - 14 hit genes from v1.14, 75 hit genes from v1.22, so I have two questions (see Question1 / Question2 below). All the files, code and results / pictures I refer to below are here: https://drive.google.com/drive/folders/1w1JipKziW0IPhs2RjEe7HSfuF9DUkP7H?usp=sharing
I’ve been running an old version of DESeq2 (v1.14) on my laptop for the last couple of years. I have analysed a data set with it following the default practice (shrinkage “on”, ie betaPrior=TRUE) in the old vignette. I also rank with shrinkage off (ie betaPrior = FALSE). Following the old vignette, the code for both cases looked like this:
packageVersion("DESeq2") #  ‘1.14.1’ deseqds<-DESeqDataSetFromMatrix(countData=test_counts, colData=test_conditions, design= ~ condition) dds_no_shrinkage <- DESeq(deseqds, betaPrior = FALSE) results_no_shrinkage <- as.data.frame(results(dds_no_shrinkage, contrast = c("condition","treatment","control"))) dds_shrinkage <- DESeq(deseqds) results_shrinkage <- as.data.frame(results(dds_shrinkage, contrast = c("condition","treatment","control")))
The results are in “deseq1.14resultsnoshrinkage.txt” and “deseq1.14results_shrinkage.txt"
I then installed a new version of DESeq2 (v.1.22) inside a new docker (rocker) image, and re-analysed the same dataset with the same conditions matrix. For the new version, I switched ‘on’ shrinkage by running the lfcShrink() following the current vignette and using the ’normal’ shrinkage method. I also used the unshrunk results (ie without the lfcShrink function running). This code looked like this:
packageVersion("DESeq2") #  ‘1.22.1’ deseqds<-DESeqDataSetFromMatrix(countData=test_counts, colData=test_conditions, design= ~ condition) dds_before_shinkage <- DESeq(deseqds) results_no_shrinkage <- as.data.frame(results(dds_before_shrinkage, contrast = c("condition","treatment","control")) results_shrinkage <- lfcShrink(dds_before_shinkage, coef="condition_treatment_vs_control",type="normal") results_shrinkage_df <- as.data.frame(results_shrinkage)
The results are in “deseq1.22resultsnoshrinkage.txt” and “deseq1.22results_shrinkage.txt"
I was then able to make comparisons between the results of DESeq2 versions, with and without shrinkage. Here is what I saw:
No shrinkage: deseq2 v1.14 without shrinkage VS v.21 WITHOUT shrinkage. The ranking and padj’s of the genes from deseq2 v1.14 are identical to the genes from deseq2 v1.22. See picture “noshrinkagedeseq1.14deseq1.22rank_comparison.jpg"
With Shrinkage: deseq2 v1.14 with shrinkage VS v1.21 WITH shrinkage. The ranking and padj’s of the genes from deseq v1.14 are significantly changed: in my test dataset for deseq2 v1.22 there are 75 genes with padj < 0.1, compared to 14 genes with padj < 0.1 from deseq2 v 1.14. When comparing the ranks of the top-hit genes there doesn’t appear to be a pattern as to which genes are omitted and which are kept between the two versions. See picture “shrinkagedeseq1.14deseq1.22hitcomparison.jpg”.
So the questions I have are:
Question1. The old default method (with shrinkage) appears to produce far fewer hit genes and a qualitatively different result compared to the new method with shrinkage. What should I make of this? People have made decisions about validation, emphases on papers etc based on these results over the years. If the list of hits has increased like this, then possibly other decisions could have been taken. Have others noticed this kind of behaviour, and what is their response?
Question2. The new lfcShrink method (I’m running v1.22) appears to change the LFC’s without changing the pvalues / padj values produced by DESeq2. This is in contrast to the old DESeq(… betaPrior=FALSE/TRUE) method of switching off shrinkage, which appears to change BOTH LFCs and p-values. Is this expected?
I’m happy to provide/upload any more detail that I’ve inadvertently missed.