I asked this question on SeqAnswers.
Does anyone know how DESeq2 handles genes that have zero counts in one condition and >0 counts in another?
My results output shows that these genes have a positive log2 fold-change value, but I do not understand how DESeq2 arrives at this number if it is taking the log of a ratio, in which the numerator is divided by zero.
Count data (letters are conditions; numbers are replicates):
A1 | A2 | B1 | B2 | |
Gene 1 | 0 | 0 | 692 | 1350 |
> dds <- DESeqDataSetFromMatrix(countData = countData, colData = colData, design = ~ condition) > dds$condition <- factor(dds$condition, levels=c("A","B")) > dds <- DESeq(dds) > res<-results(dds,independentFiltering = F)
Results table:
baseMean | log2FoldChange | lfcSE | stat | pvalue | padj | |
Gene1 | 585.1394 | 11.78521 | 1.458913 | 8.078076 | 6.579644e-16 | 3.884571e-15 |
I know that DESeq1 gave an 'Inf' value in these cases, but how does DESeq2 arrive at a real number value?
Thanks
"some positive value below 0" You probably meant "above 0" :)
yes. below 1. those positive values below 0 are hard to find.
does the same hold when using betaprior=FALSE? thanks
When betaPrior=FALSE, the MLE LFC goes to infinity, but in practice we don't wait for it to reach infinity, but we stop at a large value, a fold change of exp(30).
genes with for example all zeros in control condition (the denominator), don't get the same MLE FC, so I guess this depends also on the values of the numerator?
It's a stopping rule, I don't then set the values to have a natural log fold change of 30. Also, things are limited from going to Infinity because I have a bound on the expected value for mu, which acts similar to a pseudocount (but it's not a pseudocount approach).