Question

Dispersion estimation step is very slow with latest DESeq2 and glmGamPoi

0

Entering edit mode

I-Hsuan Lin ▴ 10

@i-hsuan-lin-9804

Last seen 5 weeks ago

United Kingdom

Hi, I recently upgraded to latest R 4.3 from R 4.1. While running the DEA workflow on scRNA-seq data, I noticed it was dramatically slower with my latest setup. I wonder if anyone are aware of changes that may have contributed to this, and how I can make the dispersion estimation step to run as fast as before?

Using a small dataset with 49 cells as demo (22 vs. 27 in two tissues), here's how I run DESeq():

dds <- DESeqDataSetFromMatrix(countData = counts(sce),
                              colData = droplevels(colData(sce)[,c("Barcode","tissue")]),
                              rowData = rowData(sce), design = ~ tissue)
sizeFactors(dds) <- sizeFactors(sce)
dds

system.time({
    dds <- DESeq(dds, test = "LRT", useT = TRUE, minmu = 1e-6, minReplicatesForReplace = Inf,
                 fitType = "glmGamPoi", sfType = "poscounts", reduced = ~1, quiet = FALSE)
})

# dds
class: DESeqDataSet 
dim: 17881 49 
metadata(1): version
assays(1): counts
rownames(17881): Xkr4 Gm37381 ... CAAA01147332.1 AC149090.1
rowData names(5): ID Symbol Type SEQNAME is_mito
colnames(49): LR03_AGGCCGTTCACCTCGT-1 LR03_AGGGATGAGTGCGATG-1 ... LR04_TTGACTTAGGTGTTAA-1 LR04_TTGACTTGTCTGCGGT-1
colData names(3): Barcode tissue sizeFactor

When using R version 4.1.3, DESeq2_1.34.0 and glmGamPoi_1.6.0, the run time is:

   user  system elapsed 
 10.996   6.414   9.632

When using R version 4.3.2, DESeq2_1.42.0 and glmGamPoi_1.14.3, the run time is:

    user   system  elapsed 
1661.721   25.540 1679.590

Edit:

More digging into the codes themselves, there seems to a change in how initial_lp and last_lp is calculated. This greatly increase the amount of time vapply takes to perform the calculation.

In the current version, the whole fitMu matrix is used as mu

      initial_lp <- vapply(which(fitidx), function(idx){
        sum(dnbinom(Counts[idx, ], mu = fitMu, size = 1 / alpha_hat[idx], log = TRUE))
      }, FUN.VALUE = 0.0)

      last_lp <- vapply(which(fitidx), function(idx){
        sum(dnbinom(Counts[idx, ], mu = fitMu, size = 1 / alpha_hat_new[idx], log = TRUE))
      }, FUN.VALUE = 0.0)

Whereas in the previous version, the per-gene vector fitMu[idx, ] is used as mu, which I think make more sense, isn't it? I think this is how fitMu was used in the fitting process in DESeq2.cpp

      initial_lp <- vapply(which(fitidx), function(idx){
        sum(dnbinom(Counts[idx, ], mu = fitMu[idx, ], size = 1 / alpha_hat[idx], log = TRUE))
      }, FUN.VALUE = 0.0)

      last_lp <- vapply(which(fitidx), function(idx){
        sum(dnbinom(Counts[idx, ], mu = fitMu[idx, ], size = 1 / alpha_hat_new[idx], log = TRUE))
      }, FUN.VALUE = 0.0)

Also, why is fitMu used as mean input when running glmGamPoi::overdispersion_mle(), and not the subset (fitMu[fitidx, ]) as in the previous version?

      # New version
      dispersion_fits <- glmGamPoi::overdispersion_mle(Counts[fitidx, ], mean = fitMu,
                                                       model_matrix = modelMatrix, verbose = ! quiet)

      # Old version
      dispersion_fits <- glmGamPoi::overdispersion_mle(Counts[fitidx, ], mean = fitMu[fitidx, ],
                                                       model_matrix = modelMatrix, verbose = ! quiet)

DESeq2 glmGamPoi • 311 views

ADD COMMENT • link 8 weeks ago • updated 7 weeks ago I-Hsuan Lin ▴ 10

0

Entering edit mode

Michael Love 41k

@mikelove

Last seen 8 minutes ago

United States

Thanks for the post here and on GitHub.

I'll take a look this week.

ADD COMMENT • link 7 weeks ago Michael Love 41k

0

Entering edit mode

Thanks Michael Love

ADD REPLY • link 7 weeks ago I-Hsuan Lin ▴ 10

score 1 · Accepted Answer · 2024-03-06

1

Entering edit mode

I-Hsuan Lin ▴ 10

@i-hsuan-lin-9804

Last seen 5 weeks ago

United Kingdom

Thanks to Mike, there's a fix on DESeq2's GitHub repo: https://github.com/thelovelab/DESeq2/tree/glmgampoi-iteration-fix

Relevant discussion about this for anyone interested: https://github.com/thelovelab/DESeq2/pull/64

ADD COMMENT • link 7 weeks ago I-Hsuan Lin ▴ 10