How does DESeq2 deal with rows containing zeros when calculating size factors?
jiakang (United States)

Hi, I am confused about how the size factors are calculated with the default parameters when a gene has at least one zero count across the samples. I have checked the DESeq2 source code, but I did not get the point. I found that if there is at least one gene with no zeros across all of the samples, DESeq2 runs smoothly. When I changed to sfType = "poscounts", which I think is appropriate for data with a lot of zeros, the size factors were similar to those from the default (sfType = "ratio", right?). In theory, DESeq2 ignores genes with at least one zero across samples when calculating size factors.

So my ultimate question is: could you kindly explain why the size factors from sfType = "poscounts" and from the default are similar when many genes have at least one zero across samples?

ATpoint (Germany)

Because I found that if there is at least one gene without zero among all of the samples, the deseq2 runs smoothly.

No, it doesn't.

library(DESeq2)

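set.seed(1)
dds <- makeExampleDESeqDataSet()
# set every count in the first sample to zero (0L keeps the counts integer),
# so that every gene has at least one zero
assay(dds, "counts")[,1] <- 0L

# will error: every gene now has a -Inf log geometric mean
estimateSizeFactors(dds)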

https://github.com/thelovelab/DESeq2/blob/devel/R/core.R#L544

This line is the key. First, the raw counts are transformed with the natural log. Genes with at least one zero therefore have at least one infinite value, because log(0) is -Inf. That makes the mean of the log counts for those genes -Inf as well:

https://github.com/thelovelab/DESeq2/blob/devel/R/core.R#L546

...and if that is the case for every gene, this line triggers the error:

https://github.com/thelovelab/DESeq2/blob/devel/R/core.R#L557

Not much magic in there. If it runs for you, that is because not every gene has at least one zero.
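The steps referenced above can be reproduced with a few lines of base R. This is a minimal sketch for illustration, not DESeq2's own code: `check_geo_means` is a hypothetical helper and the toy matrices are made up. It shows that log(0) gives -Inf, that a single zero makes a gene's mean log count -Inf, and that the error is only raised when this happens for every gene.

```r
# Hypothetical helper mimicking the check described above (not a DESeq2 function)
check_geo_means <- function(counts) {
  # log(0) is -Inf, so one zero makes the row mean -Inf for that gene
  loggeomeans <- rowMeans(log(counts))
  if (all(is.infinite(loggeomeans))) {
    stop("every gene contains at least one zero, cannot compute log geometric means")
  }
  loggeomeans
}

# Toy data: genes in rows, samples in columns
every_gene_has_zero <- matrix(c(0, 3, 9,
                                0, 7, 2), nrow = 2, byrow = TRUE)
one_gene_zero_free <- rbind(every_gene_has_zero, c(10, 20, 40))

# Every gene has a zero: the check stops with an error
msg <- tryCatch(check_geo_means(every_gene_has_zero),
                error = function(e) conditionMessage(e))
print(msg)

# A single zero-free gene is enough to pass the check
lgm <- check_geo_means(one_gene_zero_free)
print(exp(lgm))  # geometric means: 0 for the two genes with a zero, 20 for the last gene
```

Genes whose geometric mean is 0 (log geometric mean -Inf) are simply skipped later, so the size factors rest only on the zero-free genes.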

Thanks for your quick reply. I understand, and I had already noticed the code you mentioned above.

the size factors results of sfType = "poscounts" and the default param are similar when there are a lot of genes with at least one zero among samples

Later I found that this is true only for some projects. I think I know the reason: if only very few rows have no zeros across samples, the size factors are likely to be biased.
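To see why the two estimators can agree in one project and disagree in another, here is a minimal base-R sketch of both on a toy matrix with a single zero-free gene. The helpers `ratio_sf` and `poscounts_sf` are hypothetical and follow my reading of DESeq2's documented behaviour, not its actual code: "ratio" uses only genes without zeros, while "poscounts" uses a modified geometric mean (the n-th root of the product of the non-zero counts) and rescales the size factors to a geometric mean of 1.

```r
# Hypothetical helpers for illustration only (not DESeq2 functions)
ratio_sf <- function(counts) {
  lgm <- rowMeans(log(counts))                  # -Inf for any gene with a zero
  apply(counts, 2, function(x) {
    keep <- is.finite(lgm) & x > 0              # only zero-free genes survive
    exp(median((log(x) - lgm)[keep]))
  })
}

poscounts_sf <- function(counts) {
  lc <- log(counts)
  lc[!is.finite(lc)] <- 0                       # zeros add nothing to the log sum
  lgm <- rowMeans(lc)                           # n-th root of the product of positive counts
  lgm[rowSums(counts) == 0] <- -Inf             # all-zero genes stay excluded
  sf <- apply(counts, 2, function(x) {
    keep <- is.finite(lgm) & x > 0
    exp(median((log(x) - lgm)[keep]))
  })
  sf / exp(mean(log(sf)))                       # rescale to geometric mean 1
}

# Toy counts: genes in rows, 3 samples in columns
m <- matrix(c(4,  8, 16,   # the only gene without zeros
              0, 10, 20,
              6,  0, 12,
              5, 10,  0,
              0,  3,  6), ncol = 3, byrow = TRUE)

cat("zero-free genes:", sum(rowSums(m == 0) == 0), "\n")

sf_ratio <- ratio_sf(m)       # driven entirely by the single zero-free gene
sf_pos <- poscounts_sf(m)

# put both on the same scale (geometric mean 1) before comparing
sf_ratio_scaled <- sf_ratio / exp(mean(log(sf_ratio)))
print(rbind(ratio = sf_ratio_scaled, poscounts = sf_pos))
```

When most genes are zero-free, both estimators draw on largely the same genes and tend to give similar size factors; with only a handful of zero-free genes, the "ratio" size factors rest on those few genes alone, which is the bias suspected above. Counting zero-free rows with `sum(rowSums(m == 0) == 0)` on a real count matrix is a quick way to check which situation a project is in.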
