I have a problem while smoothing using the bsseq package. I imported bismark coverage files that I previously separated based on cytosine context (e.g., CG, CHG, CHH). I was able to processed data for the CG and CHG context following the manual but when I tried to process the CHH data, I obtained the error below:
> library("bsseq") > list.files(pattern="*.Gm01.CHH.gz") -> cov.files > read.bismark(files=cov.files,sampleNames = gsub(".Gm01.CHH.gz","",cov.files), rmZeroCov = TRUE) -> bsseq.data Assuming file type iscov Read 13672009 rows and 6 (of 6) columns from 0.373 GB file in 00:00:04 done in 8.2 secs Read 12227559 rows and 6 (of 6) columns from 0.336 GB file in 00:00:04 done in 7.7 secs Read 13374971 rows and 6 (of 6) columns from 0.368 GB file in 00:00:04 done in 8.8 secs Read 13209474 rows and 6 (of 6) columns from 0.365 GB file in 00:00:04 done in 7.9 secs [read.bismark] Joining samples ... done in 29.9 secs > bsseq.data An object of type 'BSseq' with 14696052 methylation loci 4 samples has not been smoothed > bsseq.data.smoothed <- BSmooth(bsseq.data, mc.cores=8, parallelBy="chromosome", verbose=TRUE) [BSmooth] preprocessing ... done in 7.6 sec [BSmooth] smoothing by 'chromosome' (mc.cores = 8, mc.preschedule = FALSE) Error in BSmooth(bsseq.data, mc.cores = 8, parallelBy = "chromosome", : BSmooth encountered smoothing errors In addition: Warning message: In mclapply(1:length(clusterIdx), function(ii) { : 1 function calls resulted in an error
Since there are more CHH sites in the genome than the other two contexts, I thought the number of methylation loci was too much so I was going to smooth one chromosome at a time and then combine them afterwards. However, I obtained the same error message. I've tried this on several datasets and it always fail on the CHH separated files. Do you have any ideas?
Thank you.
> sessionInfo() R version 3.3.1 (2016-06-21) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 14.04.4 LTS locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 [6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] parallel stats4 stats graphics grDevices utils datasets methods base other attached packages: [1] BiocInstaller_1.24.0 edgeR_3.16.5 limma_3.30.13 bsseq_1.10.0 systemPipeR_1.8.1 [6] RSQLite_1.1-2 DBI_0.6 ShortRead_1.32.1 GenomicAlignments_1.10.1 SummarizedExperiment_1.4.0 [11] Biobase_2.34.0 BiocParallel_1.8.1 Rsamtools_1.26.1 Biostrings_2.42.1 XVector_0.14.1 [16] GenomicRanges_1.26.4 GenomeInfoDb_1.10.3 IRanges_2.8.2 S4Vectors_0.12.2 BiocGenerics_0.20.0 loaded via a namespace (and not attached): [1] Rcpp_0.12.10 locfit_1.5-9.1 lattice_0.20-34 GO.db_3.4.0 gtools_3.5.0 digest_0.6.12 [7] plyr_1.8.4 futile.options_1.0.0 BatchJobs_1.6 backports_1.0.5 ggplot2_2.2.1 zlibbioc_1.20.0 [13] GenomicFeatures_1.26.3 data.table_1.10.4 annotate_1.52.1 R.oo_1.21.0 R.utils_2.5.0 Matrix_1.2-8 [19] checkmate_1.8.2 GOstats_2.40.0 splines_3.3.1 stringr_1.2.0 pheatmap_1.0.8 RCurl_1.95-4.8 [25] biomaRt_2.30.0 munsell_0.4.3 sendmailR_1.2-1 rtracklayer_1.34.2 base64enc_0.1-3 BBmisc_1.11 [31] fail_1.3 matrixStats_0.51.0 XML_3.98-1.5 AnnotationForge_1.16.1 R.methodsS3_1.7.1 bitops_1.0-6 [37] grid_3.3.1 RBGL_1.50.0 xtable_1.8-2 GSEABase_1.36.0 gtable_0.2.0 magrittr_1.5 [43] scales_0.4.1 graph_1.52.0 stringi_1.1.3 hwriter_1.3.2 genefilter_1.56.0 latticeExtra_0.6-28 [49] futile.logger_1.4.3 brew_1.0-6 rjson_0.2.15 lambda.r_1.1.9 RColorBrewer_1.1-2 tools_3.3.1 [55] Category_2.40.0 survival_2.40-1 AnnotationDbi_1.36.2 colorspace_1.3-2
Hi Brandon,
Can you produce and share a minimal CHH file that reproduces the error? I've not had issues specific to CHH files.
Cheers,
Pete
Hi Peter,
I attached a link containing the smallest CHH files that generate the error. Interestingly, when I reduced the same data file to contain 1M or 10M loci, the smoothing step worked without any error. When I increased the number of loci to 12M, the error appeared. This is odd since the number of loci for CHG was > 35M and it worked okay.
https://www.dropbox.com/sh/ohxfb4dujqkbuxe/AACjXl6T1lezDjuDVo1y-8rfa?dl=0
Thank you.
Best,
Brandon
Thanks, Brandon. I was able to reproduce the error using the BS1.12M.CHH.gz file. Ultimately, an error is occurring when
locfdr::locfit()
is called from withinbsseq::BSmooth()
. Unfortunately, because it's a little buried and calling some fairly complicated code in locfdr, this means it will take a little work to track down and fix. I'll spend some time on it this week and post back when I have more.Thanks for your patience,
Pete