Question

How does DESeq2 deal with rows containing zeros when calculating size factors?

jiakang

United States

Hi, I am confused about how the size factors are calculated with the default parameters when a gene has a zero count in at least one sample. I have checked the DESeq2 source code, but I still don't get it, because I found that if there is at least one gene without zeros across all of the samples, DESeq2 runs smoothly. When I changed to sfType = "poscounts", which I think is appropriate for data with a lot of zeros, the size factors were similar to those from the default parameter (sfType = "ratio", right?). In theory, DESeq2 ignores genes with at least one zero across the samples when calculating size factors.

So my ultimate question is: could you kindly explain why the size factors from sfType = "poscounts" and from the default parameter are similar when many genes have at least one zero across samples?

DESeq2

score 2 · Answer 1 · 2023-12-04

> Because I found that if there is at least one gene without zero among all of the samples, the deseq2 runs smoothly.

No, it doesn't.

library(DESeq2)

set.seed(1)
dds <- makeExampleDESeqDataSet()
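# set every count in the first sample to zero, so every gene has at least one zero
assay(dds, "counts")[, 1] <- 0L

# will error: with the default type = "ratio", no gene has a finite log geometric mean
estimateSizeFactors(dds)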

https://github.com/thelovelab/DESeq2/blob/devel/R/core.R#L544

This line is the key. First, the raw counts are transformed by the natural log. Hence, a gene with at least one zero has at least one infinite value, because the log of zero is negative infinity (-Inf in R). That makes the mean of its log counts, i.e. its log geometric mean, -Inf as well:

https://github.com/thelovelab/DESeq2/blob/devel/R/core.R#L546

...and if that mean is -Inf for every single gene, this check triggers the error:

https://github.com/thelovelab/DESeq2/blob/devel/R/core.R#L557
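To make the chain above concrete, here is a minimal base-R sketch of the default median-of-ratios ("ratio") calculation. It is a simplified re-implementation written for illustration, not DESeq2's own code: the helper name ratio_size_factors and the toy count matrix are hypothetical, and the sketch skips DESeq2's optional arguments.

```r
# Hypothetical helper: a simplified sketch of the default "ratio"
# (median-of-ratios) size factors; not DESeq2's own implementation.
ratio_size_factors <- function(counts) {
  # Natural log: every zero count becomes -Inf
  log_counts <- log(counts)
  # Row means of the logs are the log geometric means;
  # one -Inf in a row makes that row's mean -Inf as well
  log_geo_means <- rowMeans(log_counts)
  # Only when every gene contains a zero is nothing left to work with
  if (all(is.infinite(log_geo_means))) {
    stop("every gene contains at least one zero")
  }
  # Per sample: median log ratio to the geometric mean, using only genes
  # with a finite log geometric mean (no zeros) and a positive count here
  apply(counts, 2, function(cnts) {
    keep <- is.finite(log_geo_means) & cnts > 0
    exp(median(log(cnts[keep]) - log_geo_means[keep]))
  })
}

# Hypothetical toy data: 4 genes x 3 samples, gene 2 has a zero
counts <- matrix(c(10, 20,  30,
                    0,  5,  10,
                   40, 80, 120,
                    7, 14,  21),
                 ncol = 3, byrow = TRUE)

print(rowMeans(log(counts)))  # the second entry is -Inf
sf <- ratio_size_factors(counts)
print(sf)

# Adding an all-zero sample gives every gene a zero, so this errors
try(ratio_size_factors(cbind(0, counts)))
```

Gene 2 simply drops out of the median, so the toy matrix still gets size factors from its three zero-free genes; only the version with an all-zero sample reaches the stop().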

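Not much magic there. If it runs for you, that is because not every gene has at least one zero: with the default type, genes containing a zero are simply left out of the per-sample median of ratios, and the size factors are computed from the remaining genes.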
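The question also asks about sfType = "poscounts". Below is a similarly simplified base-R sketch of that variant, based on the description in ?estimateSizeFactors (a modified geometric mean: the n-th root of the product of the non-zero counts) and on my reading of the DESeq2 source; the final rescaling to a geometric mean of 1 is part of that reading and should be treated as an assumption. The helper name poscounts_size_factors and the toy matrix are hypothetical. In the toy matrix every gene contains a zero, so the default "ratio" calculation would stop with the error discussed above, while this variant still returns finite size factors.

```r
# Hypothetical helper: a simplified sketch of the "poscounts" size factors;
# not DESeq2's own implementation.
poscounts_size_factors <- function(counts) {
  log_counts <- log(counts)
  # Replace log(0) = -Inf by 0, so each row mean becomes the log of the
  # n-th root of the product of the non-zero counts (n = number of samples)
  log_counts[!is.finite(log_counts)] <- 0
  log_geo_means <- rowMeans(log_counts)
  # Genes that are zero in every sample are still left out
  log_geo_means[rowSums(counts) == 0] <- -Inf
  # Per sample: median log ratio over genes with a positive count here
  sf <- apply(counts, 2, function(cnts) {
    keep <- is.finite(log_geo_means) & cnts > 0
    exp(median(log(cnts[keep]) - log_geo_means[keep]))
  })
  # Rescale so the size factors have a geometric mean of 1
  sf / exp(mean(log(sf)))
}

# Hypothetical toy data: every gene has exactly one zero
counts <- matrix(c( 0, 20, 30,
                   10,  0, 30,
                   10, 20,  0,
                   40, 80,  0),
                 ncol = 3, byrow = TRUE)

print(all(is.infinite(rowMeans(log(counts)))))  # TRUE: "ratio" would stop here
sf <- poscounts_size_factors(counts)
print(sf)
```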