GOTHiC hg19 error
1
0
Entering edit mode
@nathan-sheffield-7613
Last seen 6 months ago
University of Virginia

Hi, I think I may have found a bug in GOTHiC. It does not seem to work with hg19, while it works fine with hg18, independent of the input, restriction site, etc.

You can reproduce this error by just using the vignette sample, but switching the genome from hg18 to hg19. The input files doesn't really matter I believe

dirPath = system.file("extdata", package="HiCDataLymphoblast")
fileName1 = list.files(dirPath, full.names=TRUE)[1]
fileName2 = list.files(dirPath, full.names=TRUE)[2]
binom=GOTHiC(fileName1,fileName2, sampleName="lymphoid_chr20",
res=1000000, BSgenomeName="BSgenome.Hsapiens.UCSC.hg19",
genome=BSgenome.Hsapiens.UCSC.hg19,
restrictionSite="A^AGCTT", enzyme="HindIII" ,cistrans="all", filterdist=10000,
DUPLICATETHRESHOLD=1, fileType="Bowtie", parallel=FALSE, cores=NULL)

 

Output:

Loading paired reads file ...
Loading mapped reads file ...
Computing binomial ...
Error in .Call2("solve_user_SEW0", start, end, width, PACKAGE = "IRanges") :
  solving row 837856: negative widths are not allowed

 

I traced the problem to:

GOTHiC:::.getRestrictionSitesFromBSgenome

When it tries to create this GRanges object, there are regions with ends > starts and so it can't create it

resGR <- GRanges(seqnames = chrs, ranges = IRanges(start = starts,
        end = ends))

If you look at:

which(ends-starts < 0)
                       chr1_gl000191_random  chr1_gl000192_random        chr4_ctg9_hap1  chr4_gl000193_random  chr4_gl000194_random         chr6_apd_hap1
              7127610               7128943               7135413               7148952               7160965               7172430               7183213
        chr6_dbb_hap3  chr8_gl000196_random  chr8_gl000197_random  chr9_gl000198_random  chr9_gl000201_random chr11_gl000202_random chr17_gl000203_random
              7207137               7213533               7214204               7214510               7215687               7216069               7217083
chr17_gl000204_random chr17_gl000205_random                                             chr18_gl000207_random chr19_gl000208_random                       
              7217456               7219252               7219253               7219266               7219558               7219938               7219998
chr21_gl000210_random                                                                                                                                     
              7220097               7220178               7220280               7220400               7220507               7220594               7220706
                                                                                                                                                          
              7220853               7220953               7221068               7221187               7221289               7221419               7221525
                                                   chrUn_gl000226                                                                          chrUn_gl000229
              7221604               7221696               7221792               7221793               7221894               7221999               7222104

 

Something wrong, I think, with the way it processes the genome, specific to hg19.

sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=de_AT.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=de_AT.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=de_AT.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base     

other attached packages:
 [1] BSgenome.Hsapiens.UCSC.hg19_1.4.0 GOTHiC_1.2.2                     
 [3] data.table_1.9.4                  BSgenome_1.34.1                  
 [5] rtracklayer_1.26.3                Biostrings_2.34.1                
 [7] XVector_0.6.0                     GenomicRanges_1.18.4             
 [9] GenomeInfoDb_1.2.5                IRanges_2.0.1                    
[11] S4Vectors_0.4.0                   BiocGenerics_0.12.1              
[13] project.init_0.0.1                BiocInstaller_1.16.5             

loaded via a namespace (and not attached):
 [1] base64enc_0.1-2         BatchJobs_1.6           BBmisc_1.9             
 [4] Biobase_2.26.0          BiocParallel_1.0.3      bitops_1.0-6           
 [7] brew_1.0-6              checkmate_1.6.1         chron_2.3-47           
[10] codetools_0.2-14        colorspace_1.2-6        crayon_1.3.1           
[13] DBI_0.3.1               digest_0.6.8            fail_1.2               
[16] foreach_1.4.2           GenomicAlignments_1.2.2 ggplot2_1.0.1          
[19] grid_3.1.2              gtable_0.1.2            hwriter_1.3.2          
[22] iterators_1.0.7         lattice_0.20-33         latticeExtra_0.6-26    
[25] magrittr_1.5            MASS_7.3-43             memoise_0.2.1          
[28] munsell_0.4.2           plyr_1.8.3              proto_0.3-10           
[31] RColorBrewer_1.1-2      Rcpp_0.12.0             RCurl_1.95-4.7         
[34] reshape2_1.4.1          Rsamtools_1.18.3        RSQLite_1.0.0          
[37] scales_0.2.5            sendmailR_1.2-1         ShortRead_1.24.0       
[40] stringi_0.5-5           stringr_1.0.0           testthat_0.10.0        
[43] tools_3.1.2             XML_3.98-1.3            zlibbioc_1.12.0        
>

GOTHiC hic • 1.6k views
ADD COMMENT
0
Entering edit mode

you can fix the bug, maybe, with something like this:

w = which(ends-starts > 0)
resGR <- GRanges(seqnames = chrs[w], ranges = IRanges(start = starts[w],
        end = ends[w]))
ADD REPLY
0
Entering edit mode
@nathan-sheffield-7613
Last seen 6 months ago
University of Virginia

Upgrading to the latest version solves the problem.

ADD COMMENT
0
Entering edit mode

Can I ask you to clarify? Do you specifically mean updating the BSgenome hg19 genome assembly information, GOTHIC? or R? I've had this issue using R 3.4 with Bioconductor 3.5, so I am presently hoping this relates to the genome assembly release and that updating the genome information will help. 

ADD REPLY

Login before adding your answer.

Traffic: 819 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6