I am seeking advice on which LFC shrinkage estimator to use when analysing spike-in normalised RNA-seq data with DESeq2.
We spike in Drosophila cells for internal normalisation of our RNA-seq experiments and use the read counts in Drosophila genes to calculate sizeFactors. We then supply these sizeFactors to the DESeq2 analysis of mouse gene expression, similar to what is described in Taruttis et al., 2017 (https://pubmed.ncbi.nlm.nih.gov/28193148/). Spike-in normalisation allows us to measure global changes in gene expression, i.e. cases where the expression of the majority of genes is affected. Since I recently updated to DESeq2_1.26.0, where apeglm is now the default LFC shrinkage estimator, I wanted to compare the results of DESeq2 analysis of spike-in normalised RNA-seq data with normal and apeglm LFC shrinkage.
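To make the normalisation step concrete, here is a minimal sketch of how size factors can be derived from spike-in counts with the median-of-ratios method and then applied to the genes of interest. It is plain Python for illustration only, not DESeq2 code. The gene names and counts are invented toy values, the function name `size_factors` is hypothetical, and I am assuming that median-of-ratios on the spike-in genes is the relevant calculation here.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors, one per sample.

    counts: dict mapping gene name -> list of raw counts (one per sample).
    Genes with a zero count in any sample are skipped, because their
    geometric mean across samples would be zero.
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for gene_counts in counts.values():
        if any(c <= 0 for c in gene_counts):
            continue
        # log of the geometric mean of this gene across samples
        log_geo_mean = sum(math.log(c) for c in gene_counts) / n_samples
        for j, c in enumerate(gene_counts):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # size factor = exp(median log-ratio) for each sample
    return [math.exp(median(r)) for r in log_ratios]


# Toy spike-in (Drosophila) counts, invented for illustration:
# sample 2 has twice as many spike-in reads as sample 1.
spike_counts = {
    "fly_gene_a": [100, 200],
    "fly_gene_b": [50, 100],
    "fly_gene_c": [10, 20],
}
spike_sf = size_factors(spike_counts)

# Toy mouse gene with identical raw counts in both samples.
mouse_raw = [300, 300]
mouse_norm = [c / sf for c, sf in zip(mouse_raw, spike_sf)]

# log2 fold change of sample 2 vs sample 1 after spike-in normalisation
mouse_lfc = math.log2(mouse_norm[1] / mouse_norm[0])
```

In this toy case the mouse gene has the same raw count in both samples, but because sample 2 contains twice the spike-in signal, the normalised counts differ two-fold. This is the kind of global shift that spike-in size factors are meant to expose and that normalisation on the mouse genes themselves would hide.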
As explained in the newest DESeq2 manual, I do see that apeglm is in general much better at preserving large LFC values than normal shrinkage (Figure A, although in this particular case the difference between the two approaches is not that big). However, I also noticed that apeglm shrinks LFC values towards 0 much more aggressively (Figure B). For spike-in normalised RNA-seq data, this reduces the detected global effect on gene expression and produces a slightly odd bimodal distribution of LFC values (Figures C and D). In addition, compared with apeglm-shrunk LFC values, normal-shrunk LFC values correlate slightly better with the unshrunk LFC values and have an overall more similar distribution to them (Figures B and C). Given these observations, it appears that for spike-in normalised RNA-seq data, apeglm may distort the distribution of LFC values because it shrinks the LFC of genes with low expression or high variance very stringently towards 0. This makes me think that in this case it may be better to stick with the older normal shrinkage approach.
I would really appreciate it if people could share their experience of using apeglm or normal shrinkage for the analysis of spike-in normalised data. I would also welcome any advice from the statistics gurus on whether choosing normal shrinkage is justified when analysing global gene expression changes with spike-in normalised RNA-seq.
Thank you very much in advance!