"newsplit: out of vertex space" failure when running DESeq on low number count matrix
1
0
Entering edit mode
bli • 0
@bli-13334
Last seen 6.6 years ago
France

Note: I repost as a new question something I posted in the comments of an old one (C: [Help for DESeq]: Error in lfproc - newsplit: out of vertex space), I hope this is OK. I also refer to this issue in stackexchange: https://bioinformatics.stackexchange.com/q/909/292

I analyse count matrices using DESeq2 as part of a larger workflow, where various count matrices are submitted to differential expression analysis. For most count matrices, the same settings do not result in a failure, but for some of them, the DESeq step fails as follows:

dds <- DESeq(dds, betaPrior=T)
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.
Error in lfproc(x, y, weights = weights, cens = cens, base = base, geth = geth,  :
  newsplit: out of vertex space
In addition: There were 12 warnings (use warnings() to see them)

 

The example below is on a small (225 rows, 12 columns) count matrix, with  mostly low numbers of counts (almost 75% of zeros, and 15% of ones). Maybe this is the origin of the problem.

 

Content of test_DESeq2.R:

library(DESeq2)

counts_table = "simrep_siu_counts.txt"
counts_data <- read.table(counts_table, header=T, row.names="gene")

libs <- c(rep(c(rep("WT", times=3), rep("prg1", times=3)), times=2))
treats <- rep(c("RT", "HS30", "HS30RT120"), times=4)
reps <- c(rep("1", times=6), rep("2", times=6))
col_data <- data.frame(lib=libs, treat=treats, rep=reps, lib_treat=paste(libs, treats, sep="_"), row.names=colnames(counts_data))
dds <- DESeqDataSetFromMatrix(countData=counts_data, colData=col_data, design=~ lib_treat)
dds <- DESeq(dds, betaPrior=TRUE)

 

Running it:

$ Rscript test_DESeq2.R
Loading required package: S4Vectors
Loading required package: methods
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, cbind, colMeans, colnames,
    colSums, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
    grepl, intersect, is.unsorted, lapply, lengths, Map, mapply, match,
    mget, order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
    rbind, Reduce, rowMeans, rownames, rowSums, sapply, setdiff, sort,
    table, tapply, union, unique, unsplit, which, which.max, which.min


Attaching package: ‘S4Vectors’

The following object is masked from ‘package:base’:

    expand.grid

Loading required package: IRanges
Loading required package: GenomicRanges
Loading required package: GenomeInfoDb
Loading required package: SummarizedExperiment
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: DelayedArray
Loading required package: matrixStats

Attaching package: ‘matrixStats’

The following objects are masked from ‘package:Biobase’:

    anyMissing, rowMedians


Attaching package: ‘DelayedArray’

The following objects are masked from ‘package:matrixStats’:

    colMaxs, colMins, colRanges, rowMaxs, rowMins, rowRanges

The following object is masked from ‘package:base’:

    apply

estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.
Error in lfproc(x, y, weights = weights, cens = cens, base = base, geth = geth,  :
  newsplit: out of vertex space
Calls: DESeq ... estimateDispersionsFit -> localDispersionFit -> locfit -> lfproc -> .C
In addition: There were 12 warnings (use warnings() to see them)
Execution halted

 

I uploaded the contents of the simrep_siu_counts.txt file here: http://paste.ubuntu.com/24955101/

 

Any advice on how to handle such data will be appreciated.

deseq2 • 5.3k views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 1 day ago
United States

Why only 225 rows? Careful, the size factor estimation won't work properly if these subsets are not enriched with null genes.

Parametric usually works well for RNA-seq experiments. 

A fail-safe fallback is fitType="mean". This will always work.

ADD COMMENT
0
Entering edit mode

These are not standard RNA-seq data. These are counts for small RNAs mapping on one particular type of annotations, and I only have those annotations for which at least one read mapped in at leat one of the libraries. Maybe I should rather analyse all my small RNA categories together in order to have a higher proportion of stable annotations.

fitType="mean" indeed worked. Thanks.

ADD REPLY

Login before adding your answer.

Traffic: 943 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6