www1124
At some point DESeq2 was updated so that LFC shrinkage is no longer performed automatically, and that functionality was moved into a separate function. In the original paper, the section on LFC estimation states that the final LFC estimates are the maximum a posteriori (MAP) estimates from the negative binomial GLM with the shrinkage prior. I just wanted to verify a few things.
- LFC estimation is still carried out by maximizing the likelihood, but it is now performed without the shrinkage prior.
- If, say, condition A has exactly 0 counts for gene G and condition B has non-zero counts for gene G, the LFC estimate still makes sense, because under an NB distribution a count of 0 has a well-defined probability.
- The final LFC estimates appear to depend on the fitted dispersion values. As a hypothetical, suppose I have an RNA-seq dataset and decide on a blacklist of genes that have clearly non-zero counts in some samples and very high variance, and I remove those genes before running DESeq2. I think the dispersions would be underestimated in this case, and this would affect the resulting log2FC estimates. Is that correct?
- In scRNA-seq studies, I have sometimes seen people apply normalized-count cutoffs after running DESeq2. All the information contained in the assay was already used in the estimation procedures, so applying a cutoff afterwards seems okay, and the estimates shouldn't be biased. However, I am assuming that one should not remove genes before running DESeq2. Is that correct?
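To make the second point concrete, here is a minimal Python sketch of what I mean by a zero count having a definable probability. It assumes the mean/dispersion parameterization from the DESeq2 paper (variance = mu + alpha * mu^2); the function names and the example values of mu and alpha are made up for illustration and are not part of DESeq2. It only shows that P(K = 0) is positive for any positive mean, and it says nothing by itself about how the LFC is actually fitted.

```python
import math


def nb_log_pmf(k, mu, alpha):
    """Log-probability of count k under a negative binomial with
    mean mu and dispersion alpha (variance = mu + alpha * mu**2).
    Hypothetical helper for illustration, not a DESeq2 function."""
    r = 1.0 / alpha  # NB "size" parameter
    return (
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(r / (r + mu))
        + k * math.log(mu / (r + mu))
    )


def nb_prob_zero(mu, alpha):
    """Closed form for P(K = 0): (1 + alpha * mu) ** (-1 / alpha)."""
    return (1.0 + alpha * mu) ** (-1.0 / alpha)


if __name__ == "__main__":
    alpha = 0.1  # assumed dispersion, chosen only for the example
    for mu in (0.5, 10.0, 100.0):
        p0 = nb_prob_zero(mu, alpha)
        print(f"mu={mu:6.1f}  P(K=0)={p0:.3e}  log-pmf={nb_log_pmf(0, mu, alpha):.3f}")
```

The probability of a zero shrinks quickly as the mean grows, but it never reaches exactly zero for a finite mean, which is the property I am relying on above.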
Thank you very much for the detailed reply!