Dear all,
I am trying to import a BED file using rtracklayer's import function (the same problem appears with import.bed) in order to use getSeq function at a later stage and retrieve FASTA entries of this bed file (loaded as a GRanges object).
When I run getSeq i get this error:
getSeq(hg38, testbed) Error in NSBS(i, x, exact = exact, upperBoundIsStrict = !allow.append) : subscript contains NAs or out-of-bounds indices
So, I just went back and checked this imported as GRanges object testbed for NAs and there are a lot of such values present!
Any suggestions?
I am new to R and rtracklayer, but have used the getSeq function successfully before.
My R session is:
R version 3.2.2 (2015-08-14) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Debian GNU/Linux stretch/sid locale: [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C LC_TIME=en_GB.UTF-8 [4] LC_COLLATE=en_GB.UTF-8 LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8 [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C LC_ADDRESS=C [10] LC_TELEPHONE=C LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats4 grid parallel stats graphics grDevices utils datasets methods base other attached packages: [1] rtracklayer_1.30.1 Gviz_1.14.0 GenomicRanges_1.22.1 GenomeInfoDb_1.6.1 IRanges_2.4.2 [6] S4Vectors_0.8.3 genomation_1.2.1 CNEr_1.6.1 AnnotationHub_2.2.2 BiocGenerics_0.16.1 [11] BiocInstaller_1.20.1 loaded via a namespace (and not attached): [1] Biobase_2.30.0 httr_1.0.0 splines_3.2.2 [4] Formula_1.2-1 shiny_0.12.2 interactiveDisplayBase_1.8.0 [7] latticeExtra_0.6-26 BSgenome_1.38.0 Rsamtools_1.22.0 [10] impute_1.44.0 RSQLite_1.0.0 lattice_0.20-33 [13] biovizBase_1.18.0 chron_2.3-47 digest_0.6.8 [16] RColorBrewer_1.1-2 XVector_0.10.0 colorspace_1.2-6 [19] htmltools_0.2.6 httpuv_1.3.3 plyr_1.8.3 [22] XML_3.98-1.3 biomaRt_2.26.0 zlibbioc_1.16.0 [25] xtable_1.8-0 scales_0.3.0 BiocParallel_1.4.0 [28] ggplot2_1.0.1 seqPattern_1.2.0 SummarizedExperiment_1.0.1 [31] GenomicFeatures_1.22.4 nnet_7.3-11 proto_0.3-10 [34] survival_2.38-3 magrittr_1.5 mime_0.4 [37] MASS_7.3-45 foreign_0.8-66 tools_3.2.2 [40] data.table_1.9.6 matrixStats_0.15.0 gridBase_0.4-7 [43] stringr_1.0.0 munsell_0.4.2 cluster_2.0.3 [46] plotrix_3.6 AnnotationDbi_1.32.0 lambda.r_1.1.7 [49] Biostrings_2.38.1 futile.logger_1.4.1 RCurl_1.95-4.7 [52] dichromat_2.0-0 VariantAnnotation_1.16.3 bitops_1.0-6 [55] gtable_0.1.2 DBI_0.3.1 reshape2_1.4.1 [58] R6_2.1.1 GenomicAlignments_1.6.1 gridExtra_2.0.0 [61] Hmisc_3.17-0 futile.options_1.0.0 KernSmooth_2.23-15 [64] readr_0.2.2 stringi_1.0-1 Rcpp_0.12.2 [67] rpart_4.1-10 acepack_1.3-3.3
Maybe you could divide testbed in half,
testbed[1:(length(testbed)/2)]
, and usegetSeq()
with that. Continue dividing until you have one or two ranges that generate the error, and then compare the granges() with theseqnames()
andseqlengths()
of hg38.