Question: DESeq2 1.16.1 : Zero count values re-occurrence of issue 56191?
gravatar for greg.deakin
8 months ago by
greg.deakin0 wrote:

Hello - I have an issue with genes (or OTUs in my case) which have zero expression across a condition.

Seems to be the same issue as

A: DESeq2; FoldChange values with only 0's

Works as expected (well as I'd expect) in DESeq2 1.10.1

Below are a few of the details

model = ~ Block + Treatment

contrast = c("Treatment","D","Control")

Treatment is a factor with 5 levels (Block has multiple levels, but I suspect that's irrelevant anyway)

Example of dodgy "gene"


       Group.1   XXX
1      Control   0.0000
2       A           101.9302
3       B           130.5273
4       C           240.4031
5       D            0.0000


DESeq 1.16.1 (R 3.4) 


log2 fold change (MAP): Treatment D vs Control
Wald test p-value: Treatment D vs Control
DataFrame with 1 row and 6 columns


     baseMean log2FoldChange     lfcSE      stat       pvalue         padj
       <numeric>      <numeric> <numeric> <numeric>    <numeric>    <numeric>
XXX 3.152404       -20.7867  4.592323 -4.526401 5.999657e-06 0.0003431804


I get the same results if I set betaPrior to True


DESeq2 1.10.1 (R 3.2)


log2 fold change (MAP): Treatment D vs Control
Wald test p-value: Treatment ND vs Control
DataFrame with 1 row and 6 columns

        baseMean log2FoldChange     lfcSE      stat    pvalue      padj
       <numeric>      <numeric> <numeric> <numeric> <numeric> <numeric>
XXX  3.152404              0    1.3954         0         1         1



ADD COMMENTlink modified 8 months ago by Michael Love19k • written 8 months ago by greg.deakin0
gravatar for Michael Love
8 months ago by
Michael Love19k
United States
Michael Love19k wrote:

hi Greg,

I have this change documented in the NEWS file:


    o Remove some code that would "zero out" LFCs
      when both groups involved in a contrast had zero counts.
      This lead to inconsistency when similarly contrasts
      were performed by refactoring.

It presented more of a problem than it was worth, so I took out this code. The problems that we do see, occur when the data isn't well approximated by a NB, often it's not RNA-seq, and the Wald is more sensitive to this non-NB distribution than the LRT. I would recommend to use likelihood ratio tests, where you can provide a full matrix which is:

full <- model.matrix(design(dds), colData(dds))

And reduced can be constructed by removing a column (e.g. the D vs control difference).

You can do:

dds <- estimateSizeFactors(dds)
dds <- estimateDispersions(dds)


dds <- nbinomLRT(dds, full=full, reduced=reduced)
res <- results(dds)

I've seen this to provide robustness to e.g. large count outliers that are inconsistent with the NB model, and give expected results.

ADD COMMENTlink written 8 months ago by Michael Love19k

Ah, o.k. thanks Mike.

I guess it's pretty much an edge case, especially in RNA-seq, and not too difficult for me to correct for.

I'll have a look at using LRT. As you guess my data (mostly soil microbiom) might violate a few of the expectations of the DESeq model - it's a bit of an open question how to deal with data consisting mostly of zeros. Having said that, we've had pretty good results using the DESeq Wald test previously, but I can't remember having more than a two factor condition before.

Thanks for your help.


ADD REPLYlink written 8 months ago by greg.deakin0
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 297 users visited in the last hour