Question: DESeq2 1.16.1 : Zero count values re-occurrence of issue 56191?
gravatar for greg.deakin
10 weeks ago by
greg.deakin0 wrote:

Hello - I have an issue with genes (or OTUs in my case) which have zero expression across a condition.

Seems to be the same issue as

A: DESeq2; FoldChange values with only 0's

Works as expected (well as I'd expect) in DESeq2 1.10.1

Below are a few of the details

model = ~ Block + Treatment

contrast = c("Treatment","D","Control")

Treatment is a factor with 5 levels (Block has multiple levels, but I suspect that's irrelevant anyway)

Example of dodgy "gene"


       Group.1   XXX
1      Control   0.0000
2       A           101.9302
3       B           130.5273
4       C           240.4031
5       D            0.0000


DESeq 1.16.1 (R 3.4) 


log2 fold change (MAP): Treatment D vs Control
Wald test p-value: Treatment D vs Control
DataFrame with 1 row and 6 columns


     baseMean log2FoldChange     lfcSE      stat       pvalue         padj
       <numeric>      <numeric> <numeric> <numeric>    <numeric>    <numeric>
XXX 3.152404       -20.7867  4.592323 -4.526401 5.999657e-06 0.0003431804


I get the same results if I set betaPrior to True


DESeq2 1.10.1 (R 3.2)


log2 fold change (MAP): Treatment D vs Control
Wald test p-value: Treatment ND vs Control
DataFrame with 1 row and 6 columns

        baseMean log2FoldChange     lfcSE      stat    pvalue      padj
       <numeric>      <numeric> <numeric> <numeric> <numeric> <numeric>
XXX  3.152404              0    1.3954         0         1         1



ADD COMMENTlink modified 10 weeks ago by Michael Love16k • written 10 weeks ago by greg.deakin0
gravatar for Michael Love
10 weeks ago by
Michael Love16k
United States
Michael Love16k wrote:

hi Greg,

I have this change documented in the NEWS file:


    o Remove some code that would "zero out" LFCs
      when both groups involved in a contrast had zero counts.
      This lead to inconsistency when similarly contrasts
      were performed by refactoring.

It presented more of a problem than it was worth, so I took out this code. The problems that we do see, occur when the data isn't well approximated by a NB, often it's not RNA-seq, and the Wald is more sensitive to this non-NB distribution than the LRT. I would recommend to use likelihood ratio tests, where you can provide a full matrix which is:

full <- model.matrix(design(dds), colData(dds))

And reduced can be constructed by removing a column (e.g. the D vs control difference).

You can do:

dds <- estimateSizeFactors(dds)
dds <- estimateDispersions(dds)


dds <- nbinomLRT(dds, full=full, reduced=reduced)
res <- results(dds)

I've seen this to provide robustness to e.g. large count outliers that are inconsistent with the NB model, and give expected results.

ADD COMMENTlink written 10 weeks ago by Michael Love16k

Ah, o.k. thanks Mike.

I guess it's pretty much an edge case, especially in RNA-seq, and not too difficult for me to correct for.

I'll have a look at using LRT. As you guess my data (mostly soil microbiom) might violate a few of the expectations of the DESeq model - it's a bit of an open question how to deal with data consisting mostly of zeros. Having said that, we've had pretty good results using the DESeq Wald test previously, but I can't remember having more than a two factor condition before.

Thanks for your help.


ADD REPLYlink written 9 weeks ago by greg.deakin0
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 227 users visited in the last hour