Deviance returned by DESeq2
1
0
Entering edit mode
@peter-langfelder-4469
Last seen 23 days ago
United States

This may be more of a statistical question, sorry if it's off topic. I am trying to understand the deviance returned by DESeq2. Could not find an explicit description so I assumed it would be the deviance of the fitted model compared to the saturated model, which is also what standard GLM functions return.

What confuses me is that the deviance seems to be a strongly increasing function of the mean expression. To create a reproducible example, I'll work with the pasilla data set as used in DESeq2 user guide. This code from the user guide sets up the DESeq2 object.

library("pasilla")
pasCts <- system.file("extdata",
                      "pasilla_gene_counts.tsv",
                      package="pasilla", mustWork=TRUE)
pasAnno <- system.file("extdata",
                       "pasilla_sample_annotation.csv",
                       package="pasilla", mustWork=TRUE)
cts <- as.matrix(read.csv(pasCts,sep="\t",row.names="gene_id"))
coldata <- read.csv(pasAnno, row.names=1)
coldata <- coldata[,c("condition","type")]

rownames(coldata) <- sub("fb", "", rownames(coldata))
all(rownames(coldata) %in% colnames(cts))

cts <- cts[, rownames(coldata)]
all(rownames(coldata) == colnames(cts))
dds <- DESeq(dds)

Next I plot the deviance vs. log of mean expression (base mean). Both are contained in the data frame returned by mcols.

mc = as.data.frame(mcols(dds))
plot(log(mc$baseMean), mc$deviance)

The plot shows a fairly linear increase of deviance with log of the mean expression. From my somewhat limited understanding of GLMs, the deviance can be used as a measure of goodness of fit, but that does not seem to be the case here (I would not expect such a strong, if any, dependence of goodness of fit on mean expression). If anyone can explain what the deviance measures here and why it is nearly linearly related to mean expression, I'd be very grateful.

Peter

deseq2 • 755 views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 8 hours ago
United States

Can you show where you create DESeqDataSet?

ADD COMMENT
0
Entering edit mode

Apologies, I neglected to copy the relevant part from DESeq2 guide:

library("DESeq2")
dds <- DESeqDataSetFromMatrix(countData = cts,
                              colData = coldata,
                              design = ~ condition)

Peter

ADD REPLY
1
Entering edit mode

The deviance as calculated by DESeq2 is -2 times the log likelihood.

Note that, if the dispersion is constant, and the mu-hat are good fits to the data, there is the following trend:

 -2 * dnbinom(2^1:10, mu=2^1:10, size=1/.05, log=TRUE)
ADD REPLY
0
Entering edit mode

Thanks. (I think you meant -2 * dnbinom(2^(1:10), mu=2^(1:10), size=1/.05, log=TRUE).

ADD REPLY
0
Entering edit mode

Oops, yes needs the ()

ADD REPLY
0
Entering edit mode

OK, the issue is that -2 times the log likelihood needs the -2LL(saturated model) subtracted out before it can be used as a quality of fit statistic (if at all). Which not hard to do with the existing output. If I might make a request/suggestion, it would be great to add an additional column to the output of DESeq (the mcols-accessed slot) that has -2LL(fitted model) -2LL(saturated model). Perhaps the new quantity could be called devianceVsSaturatedModel.

ADD REPLY
0
Entering edit mode

Yes, the deviance we compute wasn't really intended for quality of fit, more for computing LRTs.

Seems easy to compute on the fly:

mcols(dds)$devSat <- mcols(dds)$deviance - 2*rowSums(dnbinom(counts(dds), mu=counts(dds), size=1/dispersions(dds), log=TRUE))

I'll make a note, I think I could add this at some point and it wouldn't break any code, but I've got a long list of additional TODOs so no promises.

ADD REPLY

Login before adding your answer.

Traffic: 610 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6