Question: How does DESeq2 handle zero counts in one condition?
0
4.1 years ago by
United States
gavrielmatt0 wrote:

Does anyone know how DESeq2 handles genes that have zero counts in one condition and >0 counts in another?

My results output shows that these genes have a positive log2 fold-change value, but I do not understand how DESeq2 arrives at this number if it is taking the log of a ratio, in which the numerator is divided by zero.

Count data (letters are conditions; numbers are replicates):

              A1  A2   B1    B2
    Gene 1   0   0  692  1350

> dds <- DESeqDataSetFromMatrix(countData = countData, colData = colData, design = ~ condition)
> dds$condition <- factor(dds$condition, levels = c("A", "B"))
> dds <- DESeq(dds)
> res <- results(dds, independentFiltering = FALSE)

Results table:

             baseMean  log2FoldChange     lfcSE      stat        pvalue          padj
    Gene1  585.1394        11.78521  1.458913  8.078076  6.579644e-16  3.884571e-15

I know that DESeq1 gave an 'Inf' value in these cases, but how does DESeq2 arrive at a real number value?

Thanks

Answer: How does DESeq2 handle zero counts in one condition?
5
4.1 years ago by
Michael Love
United States
Michael Love wrote:

While the maximum likelihood estimate (MLE) of DESeq goes to Inf, the use of a prior distribution on LFCs (log fold changes) in DESeq2 gives us a finite estimate. The way to interpret this is that zeros might indicate absolutely no fragments in the samples of A or, more likely, that the expected count of fragments is some positive value below 1. If we were to increase the sequencing depth by a factor of 10 or 100, etc., we might start to observe some fragments in A.

The prior distribution for LFCs is estimated by looking at the distribution of MLE fold changes observed, including other genes where the sequencing depth is higher, and using this range to give a finite estimate here. (See our paper for full details http://genomebiology.com/2014/15/12/550/abstract .) So the estimate here depends on: the dispersion for this gene, how large the counts are for B, and the distribution of log fold changes for other genes which had finite MLE LFCs.
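To make the argument concrete, here is a minimal base R sketch using the counts from the question. It is not DESeq2's code and will not reproduce the 11.79 above: it assumes a plain Poisson model with equal size factors instead of the negative binomial GLM, and `prior_sd` is a made-up prior width, whereas DESeq2 estimates the width from the MLE LFCs of all genes. The function names are hypothetical. It only shows that the likelihood alone has no finite maximum, while a zero-centered normal prior on the LFC gives one.

```r
# Counts from the question: two replicates per condition.
y_A <- c(0, 0)
y_B <- c(692, 1350)

# Profile log-likelihood of the log2 fold change (B vs A).
# ASSUMPTION: plain Poisson model, equal size factors. DESeq2 itself fits a
# negative binomial GLM with a per-gene dispersion.
profile_loglik <- function(lfc) {
  sapply(lfc, function(b) {
    ratio <- 2^b
    # Closed-form MLE of the mean in A for a fixed fold change.
    mu_A <- (sum(y_A) + sum(y_B)) / (length(y_A) + length(y_B) * ratio)
    sum(dpois(y_A, mu_A, log = TRUE)) +
      sum(dpois(y_B, mu_A * ratio, log = TRUE))
  })
}

# Without a prior the likelihood keeps rising with the LFC, so the best
# value on any finite grid is its right edge.
grid <- seq(0, 20, by = 0.1)
mle_on_grid <- grid[which.max(profile_loglik(grid))]

# With a zero-centered normal prior on the LFC, the penalized likelihood
# has a finite maximum.
# ASSUMPTION: prior_sd is a made-up value for illustration.
prior_sd <- 2
log_posterior <- function(lfc) {
  profile_loglik(lfc) + dnorm(lfc, mean = 0, sd = prior_sd, log = TRUE)
}
map_lfc <- optimize(log_posterior, interval = c(-30, 30), maximum = TRUE)$maximum

mle_on_grid  # sits at the upper end of the grid
map_lfc      # finite
```

Widening the grid moves `mle_on_grid` with it, which is the numerical signature of an infinite MLE. Changing `prior_sd` or the B counts moves `map_lfc`, in line with the point that the estimate depends on the counts for B and on the prior.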

"some positive value below 0" You probably meant "above 0" :)

Yes, below 1. Those positive values below 0 are hard to find.

Does the same hold when using betaPrior=FALSE? Thanks

When betaPrior=FALSE, the MLE LFC goes to infinity, but in practice we don't wait for it to reach infinity; we stop at a large value, a fold change of exp(30).

Genes with, for example, all zeros in the control condition (the denominator) don't all get the same MLE FC, so I guess this also depends on the values in the numerator?

It's a stopping rule; I don't then set the values to have a natural log fold change of 30. Also, things are limited from going to infinity because I have a bound on the expected value for mu, which acts similarly to a pseudocount (but it's not a pseudocount approach).
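As a rough illustration of why a lower bound on mu keeps the fold change finite and makes it depend on the numerator, here is a small base R sketch. It is not DESeq2's fitting code: `mu_floor` and `max_lfc` are hypothetical names, and the value 0.5 is an assumption chosen for illustration. If the fitted mean in A cannot drop below the bound, the fitted means cannot express a log2 fold change larger than log2(mean in B / bound).

```r
# ASSUMPTION: hypothetical lower bound on the fitted mean, for illustration.
mu_floor <- 0.5

# Largest log2 fold change (B vs A) the fitted means can express when the
# mean in A cannot drop below `lower`.
max_lfc <- function(mean_B, lower = mu_floor) log2(mean_B / lower)

max_lfc(mean(c(692, 1350)))  # counts from the question
max_lfc(50)                  # a smaller numerator gives a smaller bound
```

Two genes with all zeros in A therefore end up with different fold changes whenever their counts in B differ.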

Answer: How does DESeq2 handle zero counts in one condition?
0
4.1 years ago by
United States
gavrielmatt0 wrote:

Thank you for the explanation Michael.

The link you posted is dead, but I assume you mean this paper?

Love, M. I., Huber, W., & Anders, S. (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology, 15(12), 550. doi:10.1186/s13059-014-0550-8

Answer: How does DESeq2 handle zero counts in one condition?
0
4.1 years ago by
United States
gavrielmatt0 wrote:

Ah. It's because you included the period and parentheses in the hyperlink.