How does DESeq2 handle zero counts in one condition?
3
0
Entering edit mode
@gavrielmatt-7233
Last seen 3.1 years ago
United States

I asked this question on SeqAnswers.

Does anyone know how DESeq2 handles genes that have zero counts in one condition and >0 counts in another?

My results output shows that these genes have a positive log2 fold-change value, but I do not understand how DESeq2 arrives at this number if it is taking the log of a ratio, in which the numerator is divided by zero.

Count data (letters are conditions; numbers are replicates):
 

  A1 A2 B1 B2
Gene 1 0 0 692 1350

 

> dds <- DESeqDataSetFromMatrix(countData = countData, colData = colData, design = ~ condition)
> dds$condition <- factor(dds$condition, levels=c("A","B"))
> dds <- DESeq(dds)
> res<-results(dds,independentFiltering = F)

 

Results table:

 

  baseMean  log2FoldChange  lfcSE        stat  pvalue  padj
Gene1 585.1394 11.78521 1.458913 8.078076 6.579644e-16 3.884571e-15


I know that DESeq1 gave an 'Inf' value in these cases, but how does DESeq2 arrive at a real number value? 

Thanks

deseq2 deseq rna-seq • 9.2k views
ADD COMMENT
8
Entering edit mode
@mikelove
Last seen 5 hours ago
United States

While the maximum likelihood estimate (MLE) of DESeq goes to Inf, the use of a prior distribution on LFCs (log fold changes) in DESeq2 gives us a finite estimate. The way to interpret this is that: zeros might indicate absolute no fragments in samples of A, or more likely that the expected counts of fragments is some positive value below 1. If we were to increase the sequencing depth by 10 or 100, etc., we might start to observe some fragments in A. The prior distribution for LFCs is estimated by looking at the distribution of MLE fold changes observed, including other genes where the sequencing depth is higher, and using this range to give a finite estimate here. (See our paper for full details http://genomebiology.com/2014/15/12/550/abstract .) So the estimate here depends on: the dispersion for this gene, how large the counts are for B, and the distribution of log fold changes for other genes which had finite MLE LFCs.

ADD COMMENT
0
Entering edit mode

"some positive value below 0" You probably meant "above 0" :)

ADD REPLY
1
Entering edit mode

yes. below 1. those positive values below 0 are hard to find.

ADD REPLY
0
Entering edit mode

does the same hold when using betaprior=FALSE? thanks

ADD REPLY
0
Entering edit mode

When betaPrior=FALSE, the MLE LFC goes to infinity, but in practice we don't wait for it to reach infinity, but we stop at a large value, a fold change of exp(30). 

ADD REPLY
0
Entering edit mode

genes with for example all zeros in control condition (the denominator), don't get the same MLE FC, so I guess this depends also on the values of the numerator?

 

ADD REPLY
0
Entering edit mode

It's a stopping rule, I don't then set the values to have a natural log fold change of 30. Also, things are limited from going to Infinity because I have a bound on the expected value for mu, which acts similar to a pseudocount (but it's not a pseudocount approach).

ADD REPLY
0
Entering edit mode
@gavrielmatt-7233
Last seen 3.1 years ago
United States

Thank you for the explanation Michael.

The link you posted is dead, but I assume you mean this paper?

Love, M. I., Huber, W., & Anders, S. (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology, 15(12), 550. doi:10.1186/s13059-014-0550-8

ADD COMMENT
0
Entering edit mode
@gavrielmatt-7233
Last seen 3.1 years ago
United States

Ah. It's because you included the period and parantheses in the hyperlink.

ADD COMMENT

Login before adding your answer.

Traffic: 333 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6