Note: I repost as a new question something I posted in the comments of an old one (C: [Help for DESeq]: Error in lfproc - newsplit: out of vertex space), I hope this is OK. I also refer to this issue in stackexchange: https://bioinformatics.stackexchange.com/q/909/292
I analyse count matrices using DESeq2 as part of a larger workflow, where various count matrices are submitted to differential expression analysis. For most count matrices, the same settings do not result in a failure, but for some of them, the DESeq
step fails as follows:
dds <- DESeq(dds, betaPrior=T) estimating size factors estimating dispersions gene-wise dispersion estimates mean-dispersion relationship -- note: fitType='parametric', but the dispersion trend was not well captured by the function: y = a/x + b, and a local regression fit was automatically substituted. specify fitType='local' or 'mean' to avoid this message next time. Error in lfproc(x, y, weights = weights, cens = cens, base = base, geth = geth, : newsplit: out of vertex space In addition: There were 12 warnings (use warnings() to see them)
The example below is on a small (225 rows, 12 columns) count matrix, with mostly low numbers of counts (almost 75% of zeros, and 15% of ones). Maybe this is the origin of the problem.
Content of test_DESeq2.R
:
library(DESeq2) counts_table = "simrep_siu_counts.txt" counts_data <- read.table(counts_table, header=T, row.names="gene") libs <- c(rep(c(rep("WT", times=3), rep("prg1", times=3)), times=2)) treats <- rep(c("RT", "HS30", "HS30RT120"), times=4) reps <- c(rep("1", times=6), rep("2", times=6)) col_data <- data.frame(lib=libs, treat=treats, rep=reps, lib_treat=paste(libs, treats, sep="_"), row.names=colnames(counts_data)) dds <- DESeqDataSetFromMatrix(countData=counts_data, colData=col_data, design=~ lib_treat) dds <- DESeq(dds, betaPrior=TRUE)
Running it:
$ Rscript test_DESeq2.R
Loading required package: S4Vectors
Loading required package: methods
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: ‘BiocGenerics’
The following objects are masked from ‘package:parallel’:
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from ‘package:stats’:
IQR, mad, sd, var, xtabs
The following objects are masked from ‘package:base’:
anyDuplicated, append, as.data.frame, cbind, colMeans, colnames,
colSums, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
grepl, intersect, is.unsorted, lapply, lengths, Map, mapply, match,
mget, order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
rbind, Reduce, rowMeans, rownames, rowSums, sapply, setdiff, sort,
table, tapply, union, unique, unsplit, which, which.max, which.min
Attaching package: ‘S4Vectors’
The following object is masked from ‘package:base’:
expand.grid
Loading required package: IRanges
Loading required package: GenomicRanges
Loading required package: GenomeInfoDb
Loading required package: SummarizedExperiment
Loading required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.
Loading required package: DelayedArray
Loading required package: matrixStats
Attaching package: ‘matrixStats’
The following objects are masked from ‘package:Biobase’:
anyMissing, rowMedians
Attaching package: ‘DelayedArray’
The following objects are masked from ‘package:matrixStats’:
colMaxs, colMins, colRanges, rowMaxs, rowMins, rowRanges
The following object is masked from ‘package:base’:
apply
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
-- note: fitType='parametric', but the dispersion trend was not well captured by the
function: y = a/x + b, and a local regression fit was automatically substituted.
specify fitType='local' or 'mean' to avoid this message next time.
Error in lfproc(x, y, weights = weights, cens = cens, base = base, geth = geth, :
newsplit: out of vertex space
Calls: DESeq ... estimateDispersionsFit -> localDispersionFit -> locfit -> lfproc -> .C
In addition: There were 12 warnings (use warnings() to see them)
Execution halted
I uploaded the contents of the simrep_siu_counts.txt
file here: http://paste.ubuntu.com/24955101/
Any advice on how to handle such data will be appreciated.
These are not standard RNA-seq data. These are counts for small RNAs mapping on one particular type of annotations, and I only have those annotations for which at least one read mapped in at leat one of the libraries. Maybe I should rather analyse all my small RNA categories together in order to have a higher proportion of stable annotations.
fitType="mean" indeed worked. Thanks.