Question

How does DESeq2 handle zero counts in one condition?

3

Entering edit mode

gavrielmatt ▴ 30

@gavrielmatt-7233

Last seen 5.7 years ago

United States

I asked this question on SeqAnswers.

Does anyone know how DESeq2 handles genes that have zero counts in one condition and >0 counts in another?

My results output shows that these genes have a positive log2 fold-change value, but I do not understand how DESeq2 arrives at this number if it is taking the log of a ratio, in which the numerator is divided by zero.

Count data (letters are conditions; numbers are replicates):

	A1	A2	B1	B2
Gene 1	0	0	692	1350

> dds <- DESeqDataSetFromMatrix(countData = countData, colData = colData, design = ~ condition)
> dds$condition <- factor(dds$condition, levels=c("A","B"))
> dds <- DESeq(dds)
> res<-results(dds,independentFiltering = F)

Results table:

	baseMean	log2FoldChange	lfcSE	stat	pvalue	padj
Gene1	585.1394	11.78521	1.458913	8.078076	6.579644e-16	3.884571e-15

I know that DESeq1 gave an 'Inf' value in these cases, but how does DESeq2 arrive at a real number value?

Thanks

deseq2 deseq rna-seq • 15k views

ADD COMMENT • link 9.3 years ago gavrielmatt ▴ 30

score 10 · Answer 1 · 2015-01-09

10

Entering edit mode

Michael Love 41k

@mikelove

Last seen 36 minutes ago

United States

While the maximum likelihood estimate (MLE) of DESeq goes to Inf, the use of a prior distribution on LFCs (log fold changes) in DESeq2 gives us a finite estimate. The way to interpret this is that: zeros might indicate absolute no fragments in samples of A, or more likely that the expected counts of fragments is some positive value below 1. If we were to increase the sequencing depth by 10 or 100, etc., we might start to observe some fragments in A. The prior distribution for LFCs is estimated by looking at the distribution of MLE fold changes observed, including other genes where the sequencing depth is higher, and using this range to give a finite estimate here. (See our paper for full details http://genomebiology.com/2014/15/12/550/abstract .) So the estimate here depends on: the dispersion for this gene, how large the counts are for B, and the distribution of log fold changes for other genes which had finite MLE LFCs.

ADD COMMENT • link 9.3 years ago Michael Love 41k

0

Entering edit mode

"some positive value below 0" You probably meant "above 0" :)

ADD REPLY • link 9.3 years ago Peter Langfelder ★ 3.0k

1

Entering edit mode

yes. below 1. those positive values below 0 are hard to find.

ADD REPLY • link 9.3 years ago Michael Love 41k

0

Entering edit mode

does the same hold when using betaprior=FALSE? thanks

ADD REPLY • link 8.0 years ago deut13 ▴ 80

0

Entering edit mode

When betaPrior=FALSE, the MLE LFC goes to infinity, but in practice we don't wait for it to reach infinity, but we stop at a large value, a fold change of exp(30).

ADD REPLY • link 8.0 years ago Michael Love 41k

0

Entering edit mode

genes with for example all zeros in control condition (the denominator), don't get the same MLE FC, so I guess this depends also on the values of the numerator?

ADD REPLY • link 8.0 years ago deut13 ▴ 80

0

Entering edit mode

It's a stopping rule, I don't then set the values to have a natural log fold change of 30. Also, things are limited from going to Infinity because I have a bound on the expected value for mu, which acts similar to a pseudocount (but it's not a pseudocount approach).

ADD REPLY • link 8.0 years ago Michael Love 41k

score 0 · Answer 2 · 2015-01-09

0

Entering edit mode

gavrielmatt ▴ 30

@gavrielmatt-7233

Last seen 5.7 years ago

United States

Thank you for the explanation Michael.

The link you posted is dead, but I assume you mean this paper?

Love, M. I., Huber, W., & Anders, S. (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology, 15(12), 550. doi:10.1186/s13059-014-0550-8

ADD COMMENT • link 9.3 years ago gavrielmatt ▴ 30

score 0 · Answer 3 · 2015-01-09

0

Entering edit mode

gavrielmatt ▴ 30

@gavrielmatt-7233

Last seen 5.7 years ago

United States

Ah. It's because you included the period and parantheses in the hyperlink.

ADD COMMENT • link 9.3 years ago gavrielmatt ▴ 30