getSeq{BSgenome} error: Error in NSBS(i, x, exact = exact, upperBoundIsStrict = !allow.append)
2
0
Entering edit mode
komal.rathi ▴ 120
@komalrathi-9163
Last seen 7 months ago
United States

Hi everyone,

I am trying to get the reference base for a set of chromosome-position pairs like this:

library(Rsamtools)
library(BSgenome)
library(BSgenome.Hsapiens.UCSC.hg19)
dat <- read.table(text = 'chr start
chr1  114242
chr1  114242
chr2  7485484', header=T,stringsAsFactors=F)
gr1 <- GRanges(dat$chr,IRanges(start=as.numeric(dat$start), end=as.numeric(dat$start)))
genome <-  getBSgenome("hg19")
refbase <- getSeq(Hsapiens,gr1)

I am getting the following error:

Error in NSBS(i, x, exact = exact, upperBoundIsStrict = !allow.append) : 
  subscript contains NAs or out-of-bounds indices

No idea what could be causing this, any suggestions?

This is my sessionInfo:

R version 3.2.2 (2015-08-14)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.5 (Yosemite)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] BSgenome.Hsapiens.UCSC.hg19_1.4.0 BSgenome_1.38.0                  
 [3] rtracklayer_1.30.1                Rsamtools_1.22.0                 
 [5] Biostrings_2.38.1                 XVector_0.10.0                   
 [7] GenomicRanges_1.22.1              GenomeInfoDb_1.6.1               
 [9] IRanges_2.4.2                     S4Vectors_0.8.3                  
[11] BiocGenerics_0.16.1              

loaded via a namespace (and not attached):
 [1] XML_3.98-1.3               GenomicAlignments_1.6.1   
 [3] bitops_1.0-6               futile.options_1.0.0      
 [5] zlibbioc_1.16.0            futile.logger_1.4.1       
 [7] BiocParallel_1.4.0         lambda.r_1.1.7            
 [9] tools_3.2.2                Biobase_2.30.0            
[11] RCurl_1.95-4.7             SummarizedExperiment_1.0.1
bsgenome rsamtools • 2.9k views
ADD COMMENT
0
Entering edit mode

Hi,

I can't reproduce this:

library(BSgenome.Hsapiens.UCSC.hg19)
gr1 <- GRanges(c("chr1", "chr2", "chr2"),
               IRanges(c(114242, 114242, 7485484), width=1))
gr1
# GRanges object with 3 ranges and 0 metadata columns:
#       seqnames             ranges strand
#          <Rle>          <IRanges>  <Rle>
#   [1]     chr1 [ 114242,  114242]      *
#   [2]     chr2 [ 114242,  114242]      *
#   [3]     chr2 [7485484, 7485484]      *
#   -------
#   seqinfo: 2 sequences from an unspecified genome; no seqlengths

Then:

getSeq(Hsapiens,gr1)
#   A DNAStringSet instance of length 3
#     width seq
# [1]     1 A
# [2]     1 A
# [3]     1 C

H.

> sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.3 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] BSgenome.Hsapiens.UCSC.hg19_1.4.0 BSgenome_1.38.0                  
 [3] rtracklayer_1.30.1                Biostrings_2.38.3                
 [5] XVector_0.10.0                    GenomicRanges_1.22.3             
 [7] GenomeInfoDb_1.6.1                IRanges_2.4.6                    
 [9] S4Vectors_0.8.6                   BiocGenerics_0.16.1              

loaded via a namespace (and not attached):
 [1] XML_3.98-1.3               Rsamtools_1.22.0          
 [3] GenomicAlignments_1.6.2    bitops_1.0-6              
 [5] futile.options_1.0.0       zlibbioc_1.16.0           
 [7] futile.logger_1.4.1        lambda.r_1.1.7            
 [9] BiocParallel_1.4.3         tools_3.2.3               
[11] Biobase_2.30.0             RCurl_1.95-4.7            
[13] SummarizedExperiment_1.0.2
ADD REPLY
0
Entering edit mode
tangming2005 ▴ 190
@tangming2005-6754
Last seen 5 months ago
United States

I had a similar problem, and it looks like there is a bug in IRanges C: IRanges/ShortRead causing crash?

After updating the packages, getSeq function now works, but the following give me a similar error as this post.

library(GenomicFeatures)
hg19.refseq.db <- makeTxDbFromUCSC(genome="hg19", table="refGene")
refseq.genes<- genes(hg19.refseq.db)

>breakpoint

GRanges object with 10 ranges and 1 metadata column:
       seqnames                 ranges strand |     value
          <Rle>              <IRanges>  <Rle> | <logical>
   [1]     chr1 [  9688116,   9688117]      - |      <NA>
   [2]     chr1 [  9881194,   9881195]      + |      <NA>
   [3]     chr1 [ 48608805,  48608825]      - |      <NA>
   [4]     chr1 [191533733, 191533742]      + |      <NA>
   [5]    chr10 [  6325494,   6325511]      - |      <NA>
   [6]    chr10 [  8130950,   8130958]      - |      <NA>
   [7]    chr10 [  8289372,   8289378]      + |      <NA>
   [8]    chr10 [ 34802689,  34802698]      + |      <NA>
   [9]    chr10 [ 55170266,  55170285]      + |      <NA>
  [10]    chr10 [ 61358365,  61358385]      + |      <NA>


hits <- findOverlaps(breakpoint, refseq.genes, select="arbitrary", ignore.strand=TRUE)

> hits
 [1]    NA  9343 14103  2797    NA    NA    NA 18139    NA    NA

> breakpoint[hits]
Error in NSBS(i, x, exact = exact, upperBoundIsStrict = !allow.append) : 
  subscript contains NAs or out-of-bounds indices

> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.1 (Yosemite)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] bumphunter_1.10.0                 locfit_1.5-9.1                    iterators_1.0.8                  
 [4] foreach_1.4.3                     GenomicFeatures_1.22.7            AnnotationDbi_1.32.3             
 [7] Biobase_2.30.0                    BSgenome.Hsapiens.UCSC.hg19_1.4.0 BSgenome_1.38.0                  
[10] rtracklayer_1.30.1                Biostrings_2.38.3                 XVector_0.10.0                   
[13] ggplot2_2.0.0                     tidyr_0.3.1                       dplyr_0.4.3                      
[16] GenomicInteractions_1.4.1         GenomicRanges_1.22.2              GenomeInfoDb_1.6.1               
[19] IRanges_2.4.6                     S4Vectors_0.8.5                   BiocGenerics_0.16.1              

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.2                biovizBase_1.18.0          lattice_0.20-33           
 [4] Rsamtools_1.22.0           assertthat_0.1             digest_0.6.8              
 [7] R6_2.1.1                   plyr_1.8.3                 chron_2.3-47              
[10] futile.options_1.0.0       acepack_1.3-3.3            RSQLite_1.0.0             
[13] zlibbioc_1.16.0            data.table_1.9.6           rpart_4.1-10              
[16] labeling_0.3               splines_3.2.2              BiocParallel_1.4.3        
[19] stringr_1.0.0              foreign_0.8-66             igraph_1.0.1              
[22] RCurl_1.95-4.7             biomaRt_2.26.1             munsell_0.4.2             
[25] Gviz_1.14.0                pkgmaker_0.22              nnet_7.3-11               
[28] SummarizedExperiment_1.0.2 gridExtra_2.0.0            Hmisc_3.17-1              
[31] codetools_0.2-14           matrixStats_0.50.1         XML_3.98-1.3              
[34] GenomicAlignments_1.6.1    bitops_1.0-6               grid_3.2.2                
[37] xtable_1.8-0               registry_0.3               gtable_0.1.2              
[40] DBI_0.3.1                  magrittr_1.5               scales_0.3.0              
[43] stringi_1.0-1              doRNG_1.6                  latticeExtra_0.6-26       
[46] futile.logger_1.4.1        Formula_1.2-1              lambda.r_1.1.7            
[49] RColorBrewer_1.1-2         tools_3.2.2                dichromat_2.0-0           
[52] rngtools_1.2.4             survival_2.38-3            colorspace_1.2-6          
[55] cluster_2.0.3              VariantAnnotation_1.16.4  

 

 

ADD COMMENT
2
Entering edit mode

This is just as the error says, you cannot subscript a GRanges or really any Vector with NAs. It is a different problem from the original post.

ADD REPLY
0
Entering edit mode
komal.rathi ▴ 120
@komalrathi-9163
Last seen 7 months ago
United States

I updated GenomicRanges from GenomicRanges_1.22.1 to GenomicRanges_1.22.2 and it worked. Thanks!

ADD COMMENT

Login before adding your answer.

Traffic: 570 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6