Issue with GSVA using dgCMatrix
resa
Finland

Hi Robert Castelo,

I get an error when running the gsva() function from the GSVA package on a large sparse matrix (class dgCMatrix) with dimensions of around 30k x 120k. I've noticed that dgCMatrix input is still experimental, but I would like to know whether a solution is under way.

The code starts running normally but stops after a while:

Estimating GSVA scores for 50 gene sets.
Estimating ECDFs with Gaussian kernels

Error in as.vector(.Call(Csparse_to_vector, x), mode) : Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 102

Here is some of the traceback:

17: as.vector(.Call(Csparse_to_vector, x), mode)
16: as.vector(x, mode)
15: as.vector(x, mode)
14: as.vector(x)
13: as.double(t(expr[, sample.idxs, drop = FALSE]))
12: as.double(t(expr[, sample.idxs, drop = FALSE]))
11: compute.gene.density(expr, sample.idxs, rnaseq, kernel)
10: compute.geneset.es(expr, gset.idx.list, 1:n.samples, rnaseq = rnaseq,
        abs.ranking = abs.ranking, parallel.sz = parallel.sz, mx.diff = mx.diff,
        tau = tau, kernel = kernel, verbose = verbose, BPPARAM = BPPARAM)
9: .gsva(expr, mapped.gset.idx.list, method, kcdf, rnaseq, abs.ranking,
       parallel.sz, mx.diff, tau, kernel, ssgsea.norm, verbose,
       BPPARAM)
8: .local(expr, gset.idx.list, ...)
7: gsva(logtpm_matrix, gene_sets, kcdf = "Gaussian", min.sz = 5,
       max.sz = 500, parallel.sz = 1, verbose = TRUE)
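Frames 13 to 17 of the traceback show gsva() calling as.double(t(expr[, sample.idxs, drop = FALSE])), i.e. coercing the whole sparse matrix to a dense vector. A back-of-the-envelope check in R suggests why CHOLMOD refuses. It assumes the dimensions are exactly 30,000 x 120,000 (the question says "around") and that the Matrix package uses CHOLMOD's 32-bit integer-indexed routines, which is what the 'problem too large' message points to. Under those assumptions, the dense copy has more cells than a 32-bit signed integer can index, and would need about 27 GiB of memory in any case.

```r
# Approximate dimensions from the question (assumed exact for this check)
n_genes   <- 30000
n_samples <- 120000

# Number of cells a dense copy of the sparse matrix would need (3.6 billion)
n_cells <- n_genes * n_samples

# Assumption: CHOLMOD's dense allocator, as called from the Matrix package,
# indexes with 32-bit signed integers, so it cannot exceed .Machine$integer.max
exceeds_int_limit <- n_cells > .Machine$integer.max

# Memory for a dense double-precision copy, in GiB (8 bytes per value)
dense_gib <- n_cells * 8 / 1024^3

cat("cells:", format(n_cells, big.mark = ",", scientific = FALSE), "\n")
cat("exceeds 32-bit index limit:", exceeds_int_limit, "\n")
cat("dense size (GiB):", round(dense_gib, 1), "\n")
```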
Output of sessionInfo():

R version 4.1.0 (2021-05-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS

Matrix products: default
BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.8.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] RColorBrewer_1.1-2   GSEABase_1.54.0      graph_1.70.0
 [4] annotate_1.70.0      XML_3.99-0.6         AnnotationDbi_1.54.1
 [7] IRanges_2.26.0       S4Vectors_0.30.0     Biobase_2.52.0
[10] BiocGenerics_0.38.0  GSVA_1.40.1          patchwork_1.1.1
[13] SeuratObject_4.0.2   Seurat_4.0.3         dplyr_1.0.6
Robert Castelo
Barcelona/Universitat Pompeu Fabra

Hi,

Indeed, we're still working on this for the default GSVA method. The ssGSEA method, however, should work: add the argument method="ssgsea" to your call to gsva(). For a sparse matrix of these dimensions, ssGSEA should run in about one hour using 15 cores on a modern workstation, which you can request with the argument BPPARAM = MulticoreParam(workers = 15L, progressbar = TRUE) (MulticoreParam() comes from the BiocParallel package).
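A minimal sketch of the suggested call, assuming the GSVA 1.40.x interface listed in the sessionInfo() output of the question; later GSVA releases moved to a parameter-object interface (e.g. ssgseaParam()), so the call looks different there. The toy matrix and gene sets below are hypothetical stand-ins for logtpm_matrix and gene_sets. Only method = "ssgsea" and the MulticoreParam() settings come from the answer, with workers reduced to 2L so the sketch runs on a small machine.

```r
library(Matrix)       # dgCMatrix class and Matrix()
library(GSVA)         # gsva(); assumes the GSVA 1.40.x interface
library(BiocParallel) # MulticoreParam()

set.seed(1)

# Hypothetical toy data standing in for logtpm_matrix: 100 genes x 20 samples.
# Poisson counts make genes with constant expression very unlikely, and
# log1p() keeps zeros at zero, so the matrix stays sparse.
counts <- matrix(rpois(100 * 20, lambda = 2), nrow = 100, ncol = 20,
                 dimnames = list(paste0("gene", 1:100), paste0("sample", 1:20)))
expr <- Matrix(log1p(counts), sparse = TRUE)
stopifnot(is(expr, "dgCMatrix"))

# Hypothetical gene sets: a named list of identifiers matching rownames(expr)
gene_sets <- list(setA = paste0("gene", 1:10),
                  setB = paste0("gene", 41:60))

# ssGSEA instead of the default GSVA method; on the real data the answer
# suggests workers = 15L on a workstation with enough cores
es <- gsva(expr, gene_sets, method = "ssgsea", min.sz = 5, max.sz = 500,
           verbose = TRUE,
           BPPARAM = MulticoreParam(workers = 2L, progressbar = TRUE))

dim(es)  # one row per retained gene set, one column per sample
```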


Hi, thank you for the information! I'll try ssGSEA in the meantime.
