Question

Fold Change and Technical Details of DESeq2

www1124 ▴ 10

At some point DESeq2 was updated so that LFC shrinkage was no longer performed automatically, and that functionality was moved into a separate function. In the original paper, the section on the estimation of the LFC states that the final LFC estimates are given by the MAP for the negative binomial GLM with the shrinkage prior. I just wanted to verify a few things.

1. LFC estimation is still carried out by maximizing the likelihood, but now it is not performed with the shrinkage prior.
2. In the case where, say, condition A has exactly 0 counts for gene G and condition B has non-zero counts for gene G, the estimate of the LFC still makes sense, because under an NB distribution a count of 0 has a well-defined probability.
3. The final estimates of the LFC appear to depend on the fitted dispersion values. So, as a hypothetical, suppose we had an RNA-seq dataset, and I decide that there is a blacklist of genes that have clearly non-zero values in some samples and very high variance, and I remove them prior to running DESeq2. I think the dispersions would be underestimated in this case, and this would affect the resulting log2FC estimates. Is that correct?
4. In the case of scRNA-seq, I have seen people apply normalized count cutoffs after running DESeq2. All the information contained in the assay was already used in the estimation procedures, so applying a cutoff afterwards seems okay; the estimates shouldn't be biased. However, I am assuming that one should not remove genes prior to running DESeq2. Is that correct?

deseq2 • 1.1k views

updated 4.5 years ago by Michael Love 43k • written 4.5 years ago by www1124 ▴ 10

score 1 · Answer 1 · 2020-10-20

Yes, you are right about the update. We do have a section that discusses such updates.

1. Correct.
2. The MLE LFC will tend toward infinite values, but we just halt after it crosses a threshold.
3. Yes, in an NB GLM the MLE LFC will sometimes depend on the dispersion. One example is when you control for a batch effect: the LFC is then not simply the ratio of normalized counts in one group compared to the other. And yes, removing some high-variance genes may affect the dispersion estimates (although both the fitting of the trend and the estimation of the prior width in DESeq2 include strategies to avoid the influence of high-variance outliers).
4. In general, it's best to include all genes in DESeq2 so that it can best estimate the priors. It is fine to remove genes that have very low counts; this is totally compatible with the procedures we use.
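
To make the zero-count and batch-effect points above concrete, here is a minimal Python sketch. It is not DESeq2 code. It assumes a toy setup in which the batch effect is a known multiplier on each sample's mean (DESeq2 estimates such coefficients jointly instead), and the function names and counts are made up for illustration. Each condition's mean is then found by solving the NB score equation, sum over samples of `(k - mu) / (1 + alpha * mu) = 0`, where the weights depend on the dispersion `alpha`.

```python
import math

def nb_score(q, counts, offsets, alpha):
    """Derivative of the NB log-likelihood w.r.t. log(q), where mean_i = offsets[i] * q."""
    return sum((k - o * q) / (1.0 + alpha * o * q) for k, o in zip(counts, offsets))

def mle_scale(counts, offsets, alpha, iters=200):
    """Maximum-likelihood q by bisection (the score is strictly decreasing in q)."""
    if sum(counts) == 0:
        # All-zero counts: the score is negative for every q > 0,
        # so the likelihood keeps increasing as q shrinks toward 0.
        return 0.0
    lo, hi = 0.0, 1.0
    while nb_score(hi, counts, offsets, alpha) > 0:
        hi *= 2.0  # expand the bracket until the score changes sign
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if nb_score(mid, counts, offsets, alpha) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def mle_log2fc(counts_a, counts_b, offsets_a, offsets_b, alpha):
    """log2(q_B / q_A) with each q fitted separately; infinite if one group is all zeros."""
    q_a = mle_scale(counts_a, offsets_a, alpha)
    q_b = mle_scale(counts_b, offsets_b, alpha)
    if q_a == 0.0 and q_b == 0.0:
        return float("nan")
    if q_b == 0.0:
        return -math.inf
    if q_a == 0.0:
        return math.inf
    return math.log2(q_b / q_a)

# Hypothetical data: one sample per batch in each condition;
# batch 2 is assumed to double the expected count (offset 2.0).
offsets = [1.0, 2.0]
counts_a = [10, 100]
counts_b = [40, 50]

for alpha in (0.0, 0.01, 1.0):  # alpha = 0 is the Poisson limit
    print(alpha, mle_log2fc(counts_a, counts_b, offsets, offsets, alpha))

# One condition with only zeros: the unshrunken estimate is unbounded.
print(mle_log2fc(counts_a, [0, 0], offsets, offsets, 0.1))
```

With these made-up counts the estimated LFC changes with `alpha`, because samples within a condition have different expected means and so receive different weights. If every sample in a condition had the same offset, the weights would cancel and the estimate would reduce to the ratio of group means regardless of `alpha`. The all-zero case returns an infinite value here, whereas, as noted above, DESeq2 halts once the estimate crosses a threshold.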