Hi all. I'm having a problem running BSmooth in the bsseq package. I have 25 human WGBS samples, which I processed with Bismark, and I read the coverage files in with read.bismark. However, when I try to run BSmooth, I get the following error:
> Meth.cov.fit <- BSmooth(Meth.cov, mc.cores = 32, verbose = TRUE)
[BSmooth] preprocessing ... done in 37.1 sec
[BSmooth] smoothing by 'sample' (mc.cores = 32, mc.preschedule = FALSE)
[BSmooth] smoothing done in 17656.1 sec
Error in names(object) <- nm :
  'names' attribute [25] must be the same length as the vector [2]
Here is the session info:
> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS release 6.8 (Final)

Matrix products: default
BLAS/LAPACK: /usr/local/OpenBLAS/0.2.19/gcc-4.9.1/lib/libopenblas_nehalemp-r0.2.19.so

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=C
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base

other attached packages:
[1] bsseq_1.12.2 SummarizedExperiment_1.6.5 DelayedArray_0.2.7 matrixStats_0.52.2
[5] Biobase_2.36.2 GenomicRanges_1.28.6 GenomeInfoDb_1.12.3 IRanges_2.10.5
[9] S4Vectors_0.14.7 BiocGenerics_0.22.1

loaded via a namespace (and not attached):
[1] Rcpp_0.12.13 XVector_0.16.0 zlibbioc_1.22.0 munsell_0.4.3 colorspace_1.3-2
[6] lattice_0.20-35 plyr_1.8.4 tools_3.4.0 grid_3.4.0 data.table_1.10.4-2
[11] R.oo_1.21.0 gtools_3.5.0 permute_0.9-4 Matrix_1.2-11 GenomeInfoDbData_0.99.0
[16] R.utils_2.5.0 bitops_1.0-6 RCurl_1.95-4.8 limma_3.33.13 compiler_3.4.0
[21] R.methodsS3_1.7.1 scales_0.5.0 locfit_1.5-9.1
I am running R on a Linux cluster, using RStudio through Xming, if that helps. Any help would be really appreciated; I'm quite puzzled by this.
Hi Pete, thanks for your help. mc.cores = 8 does work, or at least it gets rid of the error, but now the session dies because R runs out of memory, even when I allocate 200 GB of RAM and smooth only one chromosome. It seems you're right about this being a memory problem.
Best, Ravi
That does sound unusual; smoothing is memory intensive but shouldn't be that bad. This is CpG methylation, correct? And you're running it with the following call?
Meth.cov.fit <- BSmooth(Meth.cov, mc.cores = 32, verbose = TRUE)
Yes, CpG methylation, and the code is as shown above.
I don't know if this is relevant, but another odd thing I noticed is that when I run
sum(rowSums(getCoverage(Meth.cov)) == 0)
I get 0, which I understand means that no position in the object has zero coverage across all samples, i.e. every position is covered in at least one sample. I am not sure if that's plausible, so I wonder if there is something going on with my files.
Also, if it helps, my Meth.cov BSseq object is of length 58165645.
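To make the meaning of that coverage check concrete, here is a minimal base R sketch on a made-up 3 x 3 coverage matrix (rows are loci, columns are samples). The matrix values and the variable names are illustrative assumptions, not data from the thread; the point is only to contrast "zero coverage in all samples" with "zero coverage in at least one sample".

```r
# Toy coverage matrix: rows are loci, columns are samples (made-up values).
cov <- matrix(
  c(0, 0, 0,   # locus 1: no coverage in any sample
    5, 0, 2,   # locus 2: zero coverage in exactly one sample
    3, 4, 1),  # locus 3: covered in every sample
  ncol = 3, byrow = TRUE,
  dimnames = list(paste0("locus", 1:3), paste0("sample", 1:3))
)

# Loci with zero coverage in ALL samples (same form as the check quoted above).
n_uncovered_everywhere <- sum(rowSums(cov) == 0)

# Loci with zero coverage in AT LEAST ONE sample.
n_uncovered_somewhere <- sum(rowSums(cov == 0) > 0)

print(n_uncovered_everywhere)
print(n_uncovered_somewhere)
```

On this toy matrix the first count is 1 and the second is 2, so a result of 0 from the first form says only that every locus is covered by at least one sample, not that every sample covers every locus.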
Yes, it does sound a little odd. Especially since you have unstranded CpGs (my guess based on having 58,165,645 rows, which is roughly 2 * 29 million, i.e. about twice the number of CpGs in hg19).
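The row-count reasoning can be checked with quick arithmetic. The sketch below uses only the two numbers reported in the thread (58,165,645 rows and 25 samples). The 8 bytes per value and the idea of a single dense loci-by-samples matrix are assumptions made for a back-of-the-envelope estimate; they are not a statement about how bsseq stores or copies data internally.

```r
n_rows    <- 58165645  # length of the BSseq object reported in the thread
n_samples <- 25        # number of WGBS samples reported in the thread

# Number of CpGs implied if each CpG contributes two rows.
n_cpgs_implied <- n_rows / 2

# ASSUMPTION: one dense matrix of doubles (8 bytes per value), loci x samples.
bytes_per_value <- 8
gib_per_matrix <- n_rows * n_samples * bytes_per_value / 1024^3

cat(sprintf("Implied CpGs: %.1f million\n", n_cpgs_implied / 1e6))
cat(sprintf("One dense double matrix: %.1f GiB\n", gib_per_matrix))
```

Under these assumptions the implied CpG count is about 29.1 million and one such matrix is roughly 10.8 GiB, so a handful of matrices of this size, possibly duplicated across parallel workers, could plausibly account for very high memory use.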