Question

error in loadFun(x,seqname,ranges). BSgenome.Hsapiens.UCSC.hg19

1

Entering edit mode

sophialovechan ▴ 10

@sophialovechan-15987

Last seen 7.6 years ago

Hi, I am using BSgenome.Hsapiens.UCSC.hg19 trying to add GC bias into my ATAC sequencing data. But I kept having a problem when loading hg19 genome. Could anyone help me with this?

> counts_GC <- addGCBias(counts, genome = BSgenome.Hsapiens.UCSC.hg19)
Error in loadFUN(x, seqname, ranges) : 
  trying to load regions beyond the boundaries of non-circular sequence "chr17"

> sessionInfo()

R version 3.5.0 (2018-04-23)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.4

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] BiocInstaller_1.30.0              BSgenome.Hsapiens.UCSC.hg19_1.4.0
 [3] BSgenome_1.48.0                   rtracklayer_1.40.2               
 [5] Biostrings_2.48.0                 XVector_0.20.0                   
 [7] SummarizedExperiment_1.10.1       DelayedArray_0.6.0               
 [9] BiocParallel_1.14.1               matrixStats_0.53.1               
[11] Biobase_2.40.0                    GenomicRanges_1.32.3             
[13] GenomeInfoDb_1.16.0               IRanges_2.14.10                  
[15] S4Vectors_0.18.2                  BiocGenerics_0.26.0              
[17] Matrix_1.2-14                     motifmatchr_1.2.0                
[19] chromVAR_1.2.0                   

loaded via a namespace (and not attached):
 [1] bitops_1.0-6                DirichletMultinomial_1.22.0 TFBSTools_1.18.0           
 [4] bit64_0.9-7                 httr_1.3.1                  tools_3.5.0                
 [7] R6_2.2.2                    DT_0.4                      seqLogo_1.46.0             
[10] DBI_1.0.0                   lazyeval_0.2.1              colorspace_1.3-2           
[13] tidyselect_0.2.4            bit_1.1-14                  compiler_3.5.0             
[16] plotly_4.7.1                caTools_1.17.1              scales_0.5.0               
[19] readr_1.1.1                 stringr_1.3.1               digest_0.6.15              
[22] Rsamtools_1.32.0            R.utils_2.6.0               pkgconfig_2.0.1            
[25] htmltools_0.3.6             htmlwidgets_1.2             rlang_0.2.1                
[28] RSQLite_2.1.1               VGAM_1.0-5                  shiny_1.1.0                
[31] bindr_0.1.1                 jsonlite_1.5                gtools_3.5.0               
[34] dplyr_0.7.5                 R.oo_1.22.0                 RCurl_1.95-4.10            
[37] magrittr_1.5                GO.db_3.6.0                 GenomeInfoDbData_1.1.0     
[40] Rcpp_0.12.17                munsell_0.4.3               R.methodsS3_1.7.1          
[43] stringi_1.2.2               zlibbioc_1.26.0             plyr_1.8.4                 
[46] grid_3.5.0                  blob_1.1.1                  promises_1.0.1             
[49] miniUI_0.1.1.1              CNEr_1.16.0                 lattice_0.20-35            
[52] splines_3.5.0               annotate_1.58.0             hms_0.4.2                  
[55] KEGGREST_1.20.0             pillar_1.2.3                reshape2_1.4.3             
[58] TFMPvalue_0.0.8             XML_3.98-1.11               glue_1.2.0                 
[61] data.table_1.11.4           png_0.1-7                   httpuv_1.4.3               
[64] tidyr_0.8.1                 gtable_0.2.0                poweRlaw_0.70.1            
[67] purrr_0.2.5                 assertthat_0.2.0            ggplot2_2.2.1              
[70] mime_0.5                    xtable_1.8-2                later_0.7.2                
[73] viridisLite_0.3.0           tibble_1.4.2                GenomicAlignments_1.16.0   
[76] AnnotationDbi_1.42.1        memoise_1.1.0               bindrcpp_0.2.2

bsgenome • 5.5k views

ADD COMMENT • link updated 7.6 years ago by James W. MacDonald 68k • written 7.6 years ago by sophialovechan ▴ 10

score 0 · Answer 1 · 2018-05-31

0

Entering edit mode

James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 1 day ago

United States

The error seems to imply that your counts object contains sequences that are off the end of chr17. This could either be due to a mismatch between genomes (most of the chromosomes in hg19 are longer than in hg38, with the notable exceptions of chrs17, 18, and 20, which are all much longer:

(seqlengths(seqinfo(BSgenome.Hsapiens.UCSC.hg38)[paste0("chr", 1:22),]) - seqlengths(seqinfo(BSgenome.Hsapiens.UCSC.hg19)[paste0("chr", 1:22),]))/1e6
     chr1      chr2      chr3      chr4      chr5      chr6      chr7      chr8
-0.294199 -1.005844  0.273129 -0.939721  0.622999 -0.309088  0.207310 -1.225386
     chr9     chr10     chr11     chr12     chr13     chr14     chr15     chr16
-2.818714 -1.737325  0.080106 -0.576586 -0.805550 -0.305822 -0.540203 -0.016408
    chr17     chr18     chr19     chr20     chr21     chr22
 2.062231  2.296037 -0.511367  1.418647 -1.419912 -0.486098

So if your data are based on hg38, but you are using hg19 to generate GC bias, chr17 is a likely one to give you a problem.

ADD COMMENT • link 7.6 years ago James W. MacDonald 68k

0

Entering edit mode

I tried hg38 as well but it run into the same error with chr1.

ADD REPLY • link 7.6 years ago sophialovechan ▴ 10

0

Entering edit mode

Well, you should check the rowRanges slot of your counts object, particularly for chr17, to see if there are sequences that are > 81195210. If so, you will need to figure out why.

ADD REPLY • link 7.6 years ago James W. MacDonald 68k

0

Entering edit mode

Thank you very much. I am a biologist but not a bioinformatician so I kind of have no clues about your suggestion. But thanks anyway.

ADD REPLY • link 7.6 years ago sophialovechan ▴ 10

0

Entering edit mode

I looked at rowRanges slot of my counts objects and I didn't know where to find the information you mentioned. Here is an example of data I got from rowRanges.

seqnames	start	end	width	strand	score	qval
chr1	10358	10857	500	*	82	8.28541
chr1	11151	11650	500	*	53	5.3528
chr1	29063	29562	500	*	150	15.03696
chr1	32325	32824	500	*	244	24.4669
chr1	114730	115229	500	*	53	5.3528
chr1	236358	236857	500	*	32	3.22398
chr1	237521	238020	500	*	50	5.09896
chr1	348127	348626	500	*	95	9.54581

ADD REPLY • link 7.6 years ago sophialovechan ▴ 10

0

Entering edit mode

Using the example data from chromVAR:

> rowRanges(example_counts)[seqnames(rowRanges(example_counts)) == "chr17",]
GRanges object with 1414 ranges and 3 metadata columns:
         seqnames            ranges strand |     score      qval           name
            <Rle>         <IRanges>  <Rle> | <integer> <numeric>    <character>
     [1]    chr17       65281-65780      * |        98   9.85266  H1_peak_21646
     [2]    chr17     259705-260204      * |       529  52.96062  GM_peak_32813
     [3]    chr17     266470-266969      * |       140  14.03065 GM_peak_32814b
     [4]    chr17     617827-618326      * |       115  11.54114  H1_peak_21654
     [5]    chr17     635078-635577      * |       150  15.05708  H1_peak_21657
     ...      ...               ...    ... .       ...       ...            ...
  [1410]    chr17 80710122-80710621      * |        82   8.21494  H1_peak_24325
  [1411]    chr17 80817847-80818346      * |       278  27.80532  GM_peak_36405
  [1412]    chr17 80821752-80822251      * |       509  50.93502  GM_peak_36407
  [1413]    chr17 80828979-80829478      * |       156  15.65456  GM_peak_36409
  [1414]    chr17 81009554-81010053      * |       869  86.91984 GM_peak_36410b
  -------
  seqinfo: 23 sequences from an unspecified genome

ADD REPLY • link 7.6 years ago James W. MacDonald 68k

0

Entering edit mode

Yes. I saw the last line with sequences > 81195210

chr17

81194811

81195310

500

*

148

14.89645

ADD REPLY • link 7.6 years ago sophialovechan ▴ 10

0

Entering edit mode

OK, so that's a problem. You need to figure out why you have counts from a region of chr17 that don't exist in the hg19 genome that you said you used. This isn't a Bioconductor question, really, as the software is doing what it is supposed to do. You just have data that don't fulfill expectations, so you need to run that down.

ADD REPLY • link 7.6 years ago James W. MacDonald 68k

0

Entering edit mode

OK. Thanks for your help.

ADD REPLY • link 7.6 years ago sophialovechan ▴ 10

0

Entering edit mode

Or more directly

(seqlengths(BSgenome.Hsapiens.UCSC.hg38)[1:22] - seqlengths(BSgenome.Hsapiens.UCSC.hg19)[1:22])/1e6
     chr1      chr2      chr3      chr4      chr5      chr6      chr7      chr8
-0.294199 -1.005844  0.273129 -0.939721  0.622999 -0.309088  0.207310 -1.225386
     chr9     chr10     chr11     chr12     chr13     chr14     chr15     chr16
-2.818714 -1.737325  0.080106 -0.576586 -0.805550 -0.305822 -0.540203 -0.016408
    chr17     chr18     chr19     chr20     chr21     chr22
 2.062231  2.296037 -0.511367  1.418647 -1.419912 -0.486098

ADD REPLY • link 7.6 years ago James W. MacDonald 68k

0

Entering edit mode

counts_GC <- addGCBias(counts, genome = BSgenome.Hsapiens.UCSC.hg38)
Error in loadFUN(x, seqname, ranges) :
trying to load regions beyond the boundaries of non-circular sequence "chr1"