error in loadFun(x,seqname,ranges). BSgenome.Hsapiens.UCSC.hg19
1
1
Entering edit mode
@sophialovechan-15987
Last seen 5.8 years ago

Hi, I am using BSgenome.Hsapiens.UCSC.hg19 trying to add GC bias into my ATAC sequencing data. But I kept having a problem when loading hg19 genome. Could anyone help me with this?

> counts_GC <- addGCBias(counts, genome = BSgenome.Hsapiens.UCSC.hg19)
Error in loadFUN(x, seqname, ranges) : 
  trying to load regions beyond the boundaries of non-circular sequence "chr17"

> sessionInfo()

R version 3.5.0 (2018-04-23)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.4

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] BiocInstaller_1.30.0              BSgenome.Hsapiens.UCSC.hg19_1.4.0
 [3] BSgenome_1.48.0                   rtracklayer_1.40.2               
 [5] Biostrings_2.48.0                 XVector_0.20.0                   
 [7] SummarizedExperiment_1.10.1       DelayedArray_0.6.0               
 [9] BiocParallel_1.14.1               matrixStats_0.53.1               
[11] Biobase_2.40.0                    GenomicRanges_1.32.3             
[13] GenomeInfoDb_1.16.0               IRanges_2.14.10                  
[15] S4Vectors_0.18.2                  BiocGenerics_0.26.0              
[17] Matrix_1.2-14                     motifmatchr_1.2.0                
[19] chromVAR_1.2.0                   

loaded via a namespace (and not attached):
 [1] bitops_1.0-6                DirichletMultinomial_1.22.0 TFBSTools_1.18.0           
 [4] bit64_0.9-7                 httr_1.3.1                  tools_3.5.0                
 [7] R6_2.2.2                    DT_0.4                      seqLogo_1.46.0             
[10] DBI_1.0.0                   lazyeval_0.2.1              colorspace_1.3-2           
[13] tidyselect_0.2.4            bit_1.1-14                  compiler_3.5.0             
[16] plotly_4.7.1                caTools_1.17.1              scales_0.5.0               
[19] readr_1.1.1                 stringr_1.3.1               digest_0.6.15              
[22] Rsamtools_1.32.0            R.utils_2.6.0               pkgconfig_2.0.1            
[25] htmltools_0.3.6             htmlwidgets_1.2             rlang_0.2.1                
[28] RSQLite_2.1.1               VGAM_1.0-5                  shiny_1.1.0                
[31] bindr_0.1.1                 jsonlite_1.5                gtools_3.5.0               
[34] dplyr_0.7.5                 R.oo_1.22.0                 RCurl_1.95-4.10            
[37] magrittr_1.5                GO.db_3.6.0                 GenomeInfoDbData_1.1.0     
[40] Rcpp_0.12.17                munsell_0.4.3               R.methodsS3_1.7.1          
[43] stringi_1.2.2               zlibbioc_1.26.0             plyr_1.8.4                 
[46] grid_3.5.0                  blob_1.1.1                  promises_1.0.1             
[49] miniUI_0.1.1.1              CNEr_1.16.0                 lattice_0.20-35            
[52] splines_3.5.0               annotate_1.58.0             hms_0.4.2                  
[55] KEGGREST_1.20.0             pillar_1.2.3                reshape2_1.4.3             
[58] TFMPvalue_0.0.8             XML_3.98-1.11               glue_1.2.0                 
[61] data.table_1.11.4           png_0.1-7                   httpuv_1.4.3               
[64] tidyr_0.8.1                 gtable_0.2.0                poweRlaw_0.70.1            
[67] purrr_0.2.5                 assertthat_0.2.0            ggplot2_2.2.1              
[70] mime_0.5                    xtable_1.8-2                later_0.7.2                
[73] viridisLite_0.3.0           tibble_1.4.2                GenomicAlignments_1.16.0   
[76] AnnotationDbi_1.42.1        memoise_1.1.0               bindrcpp_0.2.2  
bsgenome • 4.0k views
ADD COMMENT
0
Entering edit mode
@james-w-macdonald-5106
Last seen 4 hours ago
United States

The error seems to imply that your counts object contains sequences that are off the end of chr17. This could either be due to a mismatch between genomes (most of the chromosomes in hg19 are longer than in hg38, with the notable exceptions of chrs17, 18, and 20, which are all much longer:

(seqlengths(seqinfo(BSgenome.Hsapiens.UCSC.hg38)[paste0("chr", 1:22),]) - seqlengths(seqinfo(BSgenome.Hsapiens.UCSC.hg19)[paste0("chr", 1:22),]))/1e6
     chr1      chr2      chr3      chr4      chr5      chr6      chr7      chr8
-0.294199 -1.005844  0.273129 -0.939721  0.622999 -0.309088  0.207310 -1.225386
     chr9     chr10     chr11     chr12     chr13     chr14     chr15     chr16
-2.818714 -1.737325  0.080106 -0.576586 -0.805550 -0.305822 -0.540203 -0.016408
    chr17     chr18     chr19     chr20     chr21     chr22
 2.062231  2.296037 -0.511367  1.418647 -1.419912 -0.486098

So if  your data are based on hg38, but you are using hg19 to generate GC bias, chr17 is a likely one to give you a problem.

ADD COMMENT
0
Entering edit mode

I tried hg38 as well but it run into the same error with chr1.

ADD REPLY
0
Entering edit mode

Well, you should check the rowRanges slot of your counts object, particularly for chr17, to see if there are sequences that are > 81195210. If so, you will need to figure out why.

ADD REPLY
0
Entering edit mode

Thank you very much. I am a biologist but not a bioinformatician so I kind of have no clues about your suggestion. But thanks anyway.

ADD REPLY
0
Entering edit mode

I looked at rowRanges slot of my counts objects and I didn't know where to find the information you mentioned. Here is an example of data I got from rowRanges.

seqnames start end width strand score qval
chr1 10358 10857 500 * 82 8.28541
chr1 11151 11650 500 * 53 5.3528
chr1 29063 29562 500 * 150 15.03696
chr1 32325 32824 500 * 244 24.4669
chr1 114730 115229 500 * 53 5.3528
chr1 236358 236857 500 * 32 3.22398
chr1 237521 238020 500 * 50 5.09896
chr1 348127 348626 500 * 95 9.54581

 

ADD REPLY
0
Entering edit mode

Using the example data from chromVAR:

> rowRanges(example_counts)[seqnames(rowRanges(example_counts)) == "chr17",]
GRanges object with 1414 ranges and 3 metadata columns:
         seqnames            ranges strand |     score      qval           name
            <Rle>         <IRanges>  <Rle> | <integer> <numeric>    <character>
     [1]    chr17       65281-65780      * |        98   9.85266  H1_peak_21646
     [2]    chr17     259705-260204      * |       529  52.96062  GM_peak_32813
     [3]    chr17     266470-266969      * |       140  14.03065 GM_peak_32814b
     [4]    chr17     617827-618326      * |       115  11.54114  H1_peak_21654
     [5]    chr17     635078-635577      * |       150  15.05708  H1_peak_21657
     ...      ...               ...    ... .       ...       ...            ...
  [1410]    chr17 80710122-80710621      * |        82   8.21494  H1_peak_24325
  [1411]    chr17 80817847-80818346      * |       278  27.80532  GM_peak_36405
  [1412]    chr17 80821752-80822251      * |       509  50.93502  GM_peak_36407
  [1413]    chr17 80828979-80829478      * |       156  15.65456  GM_peak_36409
  [1414]    chr17 81009554-81010053      * |       869  86.91984 GM_peak_36410b
  -------
  seqinfo: 23 sequences from an unspecified genome

 

 

ADD REPLY
0
Entering edit mode

Yes. I saw the last line with sequences > 81195210

 

chr17 81194811 81195310 500 * 148 14.89645
ADD REPLY
0
Entering edit mode

OK, so that's a problem. You need to figure out why you have counts from a region of chr17 that don't exist in the hg19 genome that you said you used. This isn't a Bioconductor question, really, as the software is doing what it is supposed to do. You just have data that don't fulfill expectations, so you need to run that down.

ADD REPLY
0
Entering edit mode

OK. Thanks for your help.

ADD REPLY
0
Entering edit mode

Or more directly

(seqlengths(BSgenome.Hsapiens.UCSC.hg38)[1:22] - seqlengths(BSgenome.Hsapiens.UCSC.hg19)[1:22])/1e6
     chr1      chr2      chr3      chr4      chr5      chr6      chr7      chr8
-0.294199 -1.005844  0.273129 -0.939721  0.622999 -0.309088  0.207310 -1.225386
     chr9     chr10     chr11     chr12     chr13     chr14     chr15     chr16
-2.818714 -1.737325  0.080106 -0.576586 -0.805550 -0.305822 -0.540203 -0.016408
    chr17     chr18     chr19     chr20     chr21     chr22
 2.062231  2.296037 -0.511367  1.418647 -1.419912 -0.486098
ADD REPLY
0
Entering edit mode

counts_GC <- addGCBias(counts, genome = BSgenome.Hsapiens.UCSC.hg38)
Error in loadFUN(x, seqname, ranges) : 
  trying to load regions beyond the boundaries of non-circular sequence "chr1"

ADD REPLY
0
Entering edit mode

I checked my alignment and I used hg19 when I run bowtie.

ADD REPLY

Login before adding your answer.

Traffic: 604 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6