Question

DESeq2 - dispersion estimation when there is none

WardDeb • 0

@901be5b0

Germany

Hi,

I'm trying to better grasp what is happening under the hood during dispersion estimation, and have a bit of a naive question. When using the makeExampleDESeqDataSet function, I can set the dispersions to zero by just specifying a 'null function', i.e.:

dds_pois <- makeExampleDESeqDataSet(
  n=5000,
  m=10,
  dispMeanRel = function(x) 0
)

and indeed the dispersions here are all zero, so the counts should follow a Poisson distribution (mean == variance).

mcols(dds_pois) %>% head()

DataFrame with 6 rows and 3 columns
      trueIntercept  trueBeta  trueDisp
          <numeric> <numeric> <numeric>
gene1      5.016721         0         0
gene2      0.226108         0         0
.....
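As a quick sanity check of the mean == variance claim, here is a minimal base-R sketch (not DESeq2 code) using the usual negative binomial parameterisation, where the variance is mu + alpha * mu^2. The values of `n`, `mu` and `alpha` are made up for illustration; `size = 1 / alpha` is how `rnbinom` expresses the dispersion.

```r
set.seed(1)

n     <- 1e5   # number of simulated counts (illustrative)
mu    <- 100   # common mean (illustrative)
alpha <- 0.1   # a non-zero dispersion, for contrast (illustrative)

# Dispersion zero: plain Poisson counts
pois_counts <- rpois(n, lambda = mu)

# Dispersion alpha: negative binomial with size = 1 / alpha
nb_counts <- rnbinom(n, mu = mu, size = 1 / alpha)

# Variance-to-mean ratio; in theory 1 for Poisson and 1 + alpha * mu for NB
ratios <- c(
  poisson = var(pois_counts) / mean(pois_counts),
  negbin  = var(nb_counts) / mean(nb_counts)
)
print(ratios)
```

With a dispersion of zero the ratio should sit near 1, which is what the simulated dataset above is expected to look like.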

If I now run DESeq and plot the estimated dispersion:

dds_pois <- DESeq(dds_pois)
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.
final dispersion estimates
fitting model and testing
plotDispEsts(dds_pois, CV=TRUE)

I see a negative linear relationship between the counts and the estimated dispersions. I would expect the estimates to (a) all be very close to the minimum (i.e. 1e-8) and (b) show no particular trend to begin with. Am I missing something in my interpretation?

[Figure: est_disps (dispersion estimates plot from plotDispEsts)]

Thanks!

DESeq2 • 351 views

Written 3 months ago by WardDeb • updated 3 months ago by Michael Love

score 2 · Accepted Answer · 2024-04-22


Michael Love 42k

@mikelove

United States

This is discussed in the paper: you will still get some non-zero estimates even when the true value of the log dispersion is negative infinity (i.e. the true dispersion is zero).

For data consistent with a Poisson, you can use a Poisson GLM. In practice, we typically see non-zero dispersion for most genes when the data are not technical replicates.
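For illustration, a Poisson GLM for a single gene could look roughly like the base-R sketch below. The counts, the two-group `condition` factor and the unit `size_factors` are all simulated or assumed here; with real data the offset would carry the estimated size factors.

```r
set.seed(42)

# Hypothetical design: 10 samples in two groups of 5
condition <- factor(rep(c("A", "B"), each = 5))

# Simulated counts for one gene with no true group difference
counts <- rpois(10, lambda = 50)

# Assumed size factors (all 1 here); they enter as a log offset
size_factors <- rep(1, 10)

# Poisson GLM with log link, i.e. the zero-dispersion special case
fit <- glm(counts ~ condition + offset(log(size_factors)),
           family = poisson(link = "log"))

# Coefficient table: intercept and log fold change (natural log scale)
coef_table <- coef(summary(fit))
print(coef_table)
```

Note that the coefficients are on the natural log scale, whereas DESeq2 reports log2 fold changes.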


Thank you for the pointers, I'll go back to the Methods section of the paper. As for the negative relationship of the estimated dispersions, is it correct to assume that this results from the assumed mean-dependent prior for the dispersions (i.e. alpha_tr in the manuscript)?

Reply by WardDeb, 3 months ago

The red line does assume that shape, but the fact that the black points show that pattern is discussed in the 2014 paper.

Some caution is warranted to disentangle true underlying dependence from effects of estimation bias that can create a perceived dependence of the dispersion on the mean.
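One way to see how estimation bias alone can produce such a trend is a small simulation. The sketch below is an assumption-laden illustration, not DESeq2's estimator: it swaps in a simple method-of-moments estimate, (variance - mean) / mean^2, floored at 1e-8, applied to pure Poisson counts. The number of genes per mean level and the grid of means are made up.

```r
set.seed(1)

n_samples <- 10     # samples per gene, as in m = 10 in the question
n_genes   <- 2000   # genes simulated per mean level (illustrative)
min_disp  <- 1e-8   # floor on the estimate, as mentioned in the question

# Method-of-moments dispersion estimates for Poisson genes with mean mu.
# This is a simplified stand-in, not the estimator DESeq2 uses.
mom_disp <- function(mu) {
  counts <- matrix(rpois(n_genes * n_samples, lambda = mu), nrow = n_genes)
  m <- rowMeans(counts)
  v <- apply(counts, 1, var)
  pmax((v - m) / m^2, min_disp)   # negative estimates are clipped to the floor
}

means   <- c(10, 100, 1000, 10000)
avg_est <- sapply(means, function(mu) mean(mom_disp(mu)))

# Average estimated dispersion per mean level; the true dispersion is 0 throughout
print(data.frame(mean = means, avg_disp_estimate = avg_est))
```

Because sampling noise in the variance is divided by the squared mean and then clipped at the floor, the surviving positive estimates shrink as the mean grows, even though the true dispersion is zero at every mean.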

Reply by Michael Love, 3 months ago