Question

Deviance returned by DESeq2

0

Entering edit mode

Peter Langfelder ★ 3.0k

@peter-langfelder-4469

Last seen 23 days ago

United States

This may be more of a statistical question, sorry if it's off topic. I am trying to understand the deviance returned by DESeq2. Could not find an explicit description so I assumed it would be the deviance of the fitted model compared to the saturated model, which is also what standard GLM functions return.

What confuses me is that the deviance seems to be a strongly increasing function of the mean expression. To create a reproducible example, I'll work with the pasilla data set as used in DESeq2 user guide. This code from the user guide sets up the DESeq2 object.

library("pasilla")
pasCts <- system.file("extdata",
                      "pasilla_gene_counts.tsv",
                      package="pasilla", mustWork=TRUE)
pasAnno <- system.file("extdata",
                       "pasilla_sample_annotation.csv",
                       package="pasilla", mustWork=TRUE)
cts <- as.matrix(read.csv(pasCts,sep="\t",row.names="gene_id"))
coldata <- read.csv(pasAnno, row.names=1)
coldata <- coldata[,c("condition","type")]

rownames(coldata) <- sub("fb", "", rownames(coldata))
all(rownames(coldata) %in% colnames(cts))

cts <- cts[, rownames(coldata)]
all(rownames(coldata) == colnames(cts))
dds <- DESeq(dds)

Next I plot the deviance vs. log of mean expression (base mean). Both are contained in the data frame returned by mcols.

mc = as.data.frame(mcols(dds))
plot(log(mc$baseMean), mc$deviance)

The plot shows a fairly linear increase of deviance with log of the mean expression. From my somewhat limited understanding of GLMs, the deviance can be used as a measure of goodness of fit, but that does not seem to be the case here (I would not expect such a strong, if any, dependence of goodness of fit on mean expression). If anyone can explain what the deviance measures here and why it is nearly linearly related to mean expression, I'd be very grateful.

Peter

deseq2 • 755 views

ADD COMMENT • link updated 5.2 years ago by Michael Love 41k • written 5.2 years ago by Peter Langfelder ★ 3.0k

score 0 · Answer 1 · 2019-01-30

0

Entering edit mode

Michael Love 41k

@mikelove

Last seen 8 hours ago

United States

Can you show where you create DESeqDataSet?

ADD COMMENT • link 5.2 years ago Michael Love 41k

0

Entering edit mode

Apologies, I neglected to copy the relevant part from DESeq2 guide:

library("DESeq2")
dds <- DESeqDataSetFromMatrix(countData = cts,
                              colData = coldata,
                              design = ~ condition)

Peter

ADD REPLY • link 5.2 years ago Peter Langfelder ★ 3.0k

1

Entering edit mode

The deviance as calculated by DESeq2 is -2 times the log likelihood.

Note that, if the dispersion is constant, and the mu-hat are good fits to the data, there is the following trend:

 -2 * dnbinom(2^1:10, mu=2^1:10, size=1/.05, log=TRUE)

ADD REPLY • link 5.2 years ago Michael Love 41k

0

Entering edit mode

Thanks. (I think you meant -2 * dnbinom(2^(1:10), mu=2^(1:10), size=1/.05, log=TRUE).

ADD REPLY • link 5.2 years ago Peter Langfelder ★ 3.0k

0

Entering edit mode

Oops, yes needs the ()

ADD REPLY • link 5.2 years ago Michael Love 41k

0

Entering edit mode

OK, the issue is that -2 times the log likelihood needs the -2LL(saturated model) subtracted out before it can be used as a quality of fit statistic (if at all). Which not hard to do with the existing output. If I might make a request/suggestion, it would be great to add an additional column to the output of DESeq (the mcols-accessed slot) that has -2LL(fitted model) -2LL(saturated model). Perhaps the new quantity could be called devianceVsSaturatedModel.

ADD REPLY • link 5.2 years ago Peter Langfelder ★ 3.0k

0

Entering edit mode

Yes, the deviance we compute wasn't really intended for quality of fit, more for computing LRTs.

Seems easy to compute on the fly:

mcols(dds)$devSat <- mcols(dds)$deviance - 2*rowSums(dnbinom(counts(dds), mu=counts(dds), size=1/dispersions(dds), log=TRUE))

I'll make a note, I think I could add this at some point and it wouldn't break any code, but I've got a long list of additional TODOs so no promises.

ADD REPLY • link 5.2 years ago Michael Love 41k