Entering edit mode
sophialovechan
▴
10
@sophialovechan-15987
Last seen 7.5 years ago
Hi, I am using BSgenome.Hsapiens.UCSC.hg19 trying to add GC bias into my ATAC sequencing data. But I kept having a problem when loading hg19 genome. Could anyone help me with this?
> counts_GC <- addGCBias(counts, genome = BSgenome.Hsapiens.UCSC.hg19) Error in loadFUN(x, seqname, ranges) : trying to load regions beyond the boundaries of non-circular sequence "chr17" > sessionInfo() R version 3.5.0 (2018-04-23) Platform: x86_64-apple-darwin15.6.0 (64-bit) Running under: macOS High Sierra 10.13.4 Matrix products: default BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] parallel stats4 stats graphics grDevices utils datasets methods base other attached packages: [1] BiocInstaller_1.30.0 BSgenome.Hsapiens.UCSC.hg19_1.4.0 [3] BSgenome_1.48.0 rtracklayer_1.40.2 [5] Biostrings_2.48.0 XVector_0.20.0 [7] SummarizedExperiment_1.10.1 DelayedArray_0.6.0 [9] BiocParallel_1.14.1 matrixStats_0.53.1 [11] Biobase_2.40.0 GenomicRanges_1.32.3 [13] GenomeInfoDb_1.16.0 IRanges_2.14.10 [15] S4Vectors_0.18.2 BiocGenerics_0.26.0 [17] Matrix_1.2-14 motifmatchr_1.2.0 [19] chromVAR_1.2.0 loaded via a namespace (and not attached): [1] bitops_1.0-6 DirichletMultinomial_1.22.0 TFBSTools_1.18.0 [4] bit64_0.9-7 httr_1.3.1 tools_3.5.0 [7] R6_2.2.2 DT_0.4 seqLogo_1.46.0 [10] DBI_1.0.0 lazyeval_0.2.1 colorspace_1.3-2 [13] tidyselect_0.2.4 bit_1.1-14 compiler_3.5.0 [16] plotly_4.7.1 caTools_1.17.1 scales_0.5.0 [19] readr_1.1.1 stringr_1.3.1 digest_0.6.15 [22] Rsamtools_1.32.0 R.utils_2.6.0 pkgconfig_2.0.1 [25] htmltools_0.3.6 htmlwidgets_1.2 rlang_0.2.1 [28] RSQLite_2.1.1 VGAM_1.0-5 shiny_1.1.0 [31] bindr_0.1.1 jsonlite_1.5 gtools_3.5.0 [34] dplyr_0.7.5 R.oo_1.22.0 RCurl_1.95-4.10 [37] magrittr_1.5 GO.db_3.6.0 GenomeInfoDbData_1.1.0 [40] Rcpp_0.12.17 munsell_0.4.3 R.methodsS3_1.7.1 [43] stringi_1.2.2 zlibbioc_1.26.0 plyr_1.8.4 [46] grid_3.5.0 blob_1.1.1 promises_1.0.1 [49] miniUI_0.1.1.1 CNEr_1.16.0 lattice_0.20-35 [52] splines_3.5.0 annotate_1.58.0 hms_0.4.2 [55] KEGGREST_1.20.0 pillar_1.2.3 reshape2_1.4.3 [58] TFMPvalue_0.0.8 XML_3.98-1.11 glue_1.2.0 [61] data.table_1.11.4 png_0.1-7 httpuv_1.4.3 [64] tidyr_0.8.1 gtable_0.2.0 poweRlaw_0.70.1 [67] purrr_0.2.5 assertthat_0.2.0 ggplot2_2.2.1 [70] mime_0.5 xtable_1.8-2 later_0.7.2 [73] viridisLite_0.3.0 tibble_1.4.2 GenomicAlignments_1.16.0 [76] AnnotationDbi_1.42.1 memoise_1.1.0 bindrcpp_0.2.2

I tried hg38 as well but it run into the same error with chr1.
Well, you should check the rowRanges slot of your counts object, particularly for chr17, to see if there are sequences that are > 81195210. If so, you will need to figure out why.
Thank you very much. I am a biologist but not a bioinformatician so I kind of have no clues about your suggestion. But thanks anyway.
I looked at rowRanges slot of my counts objects and I didn't know where to find the information you mentioned. Here is an example of data I got from rowRanges.
Using the example data from chromVAR:
> rowRanges(example_counts)[seqnames(rowRanges(example_counts)) == "chr17",] GRanges object with 1414 ranges and 3 metadata columns: seqnames ranges strand | score qval name <Rle> <IRanges> <Rle> | <integer> <numeric> <character> [1] chr17 65281-65780 * | 98 9.85266 H1_peak_21646 [2] chr17 259705-260204 * | 529 52.96062 GM_peak_32813 [3] chr17 266470-266969 * | 140 14.03065 GM_peak_32814b [4] chr17 617827-618326 * | 115 11.54114 H1_peak_21654 [5] chr17 635078-635577 * | 150 15.05708 H1_peak_21657 ... ... ... ... . ... ... ... [1410] chr17 80710122-80710621 * | 82 8.21494 H1_peak_24325 [1411] chr17 80817847-80818346 * | 278 27.80532 GM_peak_36405 [1412] chr17 80821752-80822251 * | 509 50.93502 GM_peak_36407 [1413] chr17 80828979-80829478 * | 156 15.65456 GM_peak_36409 [1414] chr17 81009554-81010053 * | 869 86.91984 GM_peak_36410b ------- seqinfo: 23 sequences from an unspecified genomeYes. I saw the last line with sequences > 81195210
OK, so that's a problem. You need to figure out why you have counts from a region of chr17 that don't exist in the hg19 genome that you said you used. This isn't a Bioconductor question, really, as the software is doing what it is supposed to do. You just have data that don't fulfill expectations, so you need to run that down.
OK. Thanks for your help.
Or more directly
(seqlengths(BSgenome.Hsapiens.UCSC.hg38)[1:22] - seqlengths(BSgenome.Hsapiens.UCSC.hg19)[1:22])/1e6 chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 -0.294199 -1.005844 0.273129 -0.939721 0.622999 -0.309088 0.207310 -1.225386 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 -2.818714 -1.737325 0.080106 -0.576586 -0.805550 -0.305822 -0.540203 -0.016408 chr17 chr18 chr19 chr20 chr21 chr22 2.062231 2.296037 -0.511367 1.418647 -1.419912 -0.486098counts_GC <- addGCBias(counts, genome = BSgenome.Hsapiens.UCSC.hg38)
Error in loadFUN(x, seqname, ranges) :
trying to load regions beyond the boundaries of non-circular sequence "chr1"
I checked my alignment and I used hg19 when I run bowtie.