Hi, I think I may have found a bug in GOTHiC. It does not seem to work with hg19, while it works fine with hg18, independent of the input, restriction site, etc.
You can reproduce this error by just using the vignette sample, but switching the genome from hg18 to hg19. The input files doesn't really matter I believe
dirPath = system.file("extdata", package="HiCDataLymphoblast") fileName1 = list.files(dirPath, full.names=TRUE)[1] fileName2 = list.files(dirPath, full.names=TRUE)[2] binom=GOTHiC(fileName1,fileName2, sampleName="lymphoid_chr20", res=1000000, BSgenomeName="BSgenome.Hsapiens.UCSC.hg19", genome=BSgenome.Hsapiens.UCSC.hg19, restrictionSite="A^AGCTT", enzyme="HindIII" ,cistrans="all", filterdist=10000, DUPLICATETHRESHOLD=1, fileType="Bowtie", parallel=FALSE, cores=NULL)
Output:
Loading paired reads file ... Loading mapped reads file ... Computing binomial ... Error in .Call2("solve_user_SEW0", start, end, width, PACKAGE = "IRanges") : solving row 837856: negative widths are not allowed
I traced the problem to:
GOTHiC:::.getRestrictionSitesFromBSgenome
When it tries to create this GRanges object, there are regions with ends > starts and so it can't create it
resGR <- GRanges(seqnames = chrs, ranges = IRanges(start = starts, end = ends))
If you look at:
which(ends-starts < 0) chr1_gl000191_random chr1_gl000192_random chr4_ctg9_hap1 chr4_gl000193_random chr4_gl000194_random chr6_apd_hap1 7127610 7128943 7135413 7148952 7160965 7172430 7183213 chr6_dbb_hap3 chr8_gl000196_random chr8_gl000197_random chr9_gl000198_random chr9_gl000201_random chr11_gl000202_random chr17_gl000203_random 7207137 7213533 7214204 7214510 7215687 7216069 7217083 chr17_gl000204_random chr17_gl000205_random chr18_gl000207_random chr19_gl000208_random 7217456 7219252 7219253 7219266 7219558 7219938 7219998 chr21_gl000210_random 7220097 7220178 7220280 7220400 7220507 7220594 7220706 7220853 7220953 7221068 7221187 7221289 7221419 7221525 chrUn_gl000226 chrUn_gl000229 7221604 7221696 7221792 7221793 7221894 7221999 7222104
Something wrong, I think, with the way it processes the genome, specific to hg19.
sessionInfo() R version 3.1.2 (2014-10-31) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=de_AT.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=de_AT.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=de_AT.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats4 parallel stats graphics grDevices utils datasets [8] methods base other attached packages: [1] BSgenome.Hsapiens.UCSC.hg19_1.4.0 GOTHiC_1.2.2 [3] data.table_1.9.4 BSgenome_1.34.1 [5] rtracklayer_1.26.3 Biostrings_2.34.1 [7] XVector_0.6.0 GenomicRanges_1.18.4 [9] GenomeInfoDb_1.2.5 IRanges_2.0.1 [11] S4Vectors_0.4.0 BiocGenerics_0.12.1 [13] project.init_0.0.1 BiocInstaller_1.16.5 loaded via a namespace (and not attached): [1] base64enc_0.1-2 BatchJobs_1.6 BBmisc_1.9 [4] Biobase_2.26.0 BiocParallel_1.0.3 bitops_1.0-6 [7] brew_1.0-6 checkmate_1.6.1 chron_2.3-47 [10] codetools_0.2-14 colorspace_1.2-6 crayon_1.3.1 [13] DBI_0.3.1 digest_0.6.8 fail_1.2 [16] foreach_1.4.2 GenomicAlignments_1.2.2 ggplot2_1.0.1 [19] grid_3.1.2 gtable_0.1.2 hwriter_1.3.2 [22] iterators_1.0.7 lattice_0.20-33 latticeExtra_0.6-26 [25] magrittr_1.5 MASS_7.3-43 memoise_0.2.1 [28] munsell_0.4.2 plyr_1.8.3 proto_0.3-10 [31] RColorBrewer_1.1-2 Rcpp_0.12.0 RCurl_1.95-4.7 [34] reshape2_1.4.1 Rsamtools_1.18.3 RSQLite_1.0.0 [37] scales_0.2.5 sendmailR_1.2-1 ShortRead_1.24.0 [40] stringi_0.5-5 stringr_1.0.0 testthat_0.10.0 [43] tools_3.1.2 XML_3.98-1.3 zlibbioc_1.12.0 >
you can fix the bug, maybe, with something like this: