Search
Question: getSeq{BSgenome} error: Error in NSBS(i, x, exact = exact, upperBoundIsStrict = !allow.append)
0
gravatar for komal.rathi
2.3 years ago by
komal.rathi40
United States
komal.rathi40 wrote:

Hi everyone,

I am trying to get the reference base for a set of chromosome-position pairs like this:

library(Rsamtools)
library(BSgenome)
library(BSgenome.Hsapiens.UCSC.hg19)
dat <- read.table(text = 'chr start
chr1  114242
chr1  114242
chr2  7485484', header=T,stringsAsFactors=F)
gr1 <- GRanges(dat$chr,IRanges(start=as.numeric(dat$start), end=as.numeric(dat$start)))
genome <-  getBSgenome("hg19")
refbase <- getSeq(Hsapiens,gr1)

I am getting the following error:

Error in NSBS(i, x, exact = exact, upperBoundIsStrict = !allow.append) : 
  subscript contains NAs or out-of-bounds indices

No idea what could be causing this, any suggestions?

This is my sessionInfo:

R version 3.2.2 (2015-08-14)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.5 (Yosemite)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] BSgenome.Hsapiens.UCSC.hg19_1.4.0 BSgenome_1.38.0                  
 [3] rtracklayer_1.30.1                Rsamtools_1.22.0                 
 [5] Biostrings_2.38.1                 XVector_0.10.0                   
 [7] GenomicRanges_1.22.1              GenomeInfoDb_1.6.1               
 [9] IRanges_2.4.2                     S4Vectors_0.8.3                  
[11] BiocGenerics_0.16.1              

loaded via a namespace (and not attached):
 [1] XML_3.98-1.3               GenomicAlignments_1.6.1   
 [3] bitops_1.0-6               futile.options_1.0.0      
 [5] zlibbioc_1.16.0            futile.logger_1.4.1       
 [7] BiocParallel_1.4.0         lambda.r_1.1.7            
 [9] tools_3.2.2                Biobase_2.30.0            
[11] RCurl_1.95-4.7             SummarizedExperiment_1.0.1
ADD COMMENTlink modified 2.3 years ago • written 2.3 years ago by komal.rathi40

Hi,

I can't reproduce this:

library(BSgenome.Hsapiens.UCSC.hg19)
gr1 <- GRanges(c("chr1", "chr2", "chr2"),
               IRanges(c(114242, 114242, 7485484), width=1))
gr1
# GRanges object with 3 ranges and 0 metadata columns:
#       seqnames             ranges strand
#          <Rle>          <IRanges>  <Rle>
#   [1]     chr1 [ 114242,  114242]      *
#   [2]     chr2 [ 114242,  114242]      *
#   [3]     chr2 [7485484, 7485484]      *
#   -------
#   seqinfo: 2 sequences from an unspecified genome; no seqlengths

Then:

getSeq(Hsapiens,gr1)
#   A DNAStringSet instance of length 3
#     width seq
# [1]     1 A
# [2]     1 A
# [3]     1 C

H.

> sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.3 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] BSgenome.Hsapiens.UCSC.hg19_1.4.0 BSgenome_1.38.0                  
 [3] rtracklayer_1.30.1                Biostrings_2.38.3                
 [5] XVector_0.10.0                    GenomicRanges_1.22.3             
 [7] GenomeInfoDb_1.6.1                IRanges_2.4.6                    
 [9] S4Vectors_0.8.6                   BiocGenerics_0.16.1              

loaded via a namespace (and not attached):
 [1] XML_3.98-1.3               Rsamtools_1.22.0          
 [3] GenomicAlignments_1.6.2    bitops_1.0-6              
 [5] futile.options_1.0.0       zlibbioc_1.16.0           
 [7] futile.logger_1.4.1        lambda.r_1.1.7            
 [9] BiocParallel_1.4.3         tools_3.2.3               
[11] Biobase_2.30.0             RCurl_1.95-4.7            
[13] SummarizedExperiment_1.0.2
ADD REPLYlink written 2.3 years ago by Hervé Pagès ♦♦ 13k
0
gravatar for tangming2005
2.3 years ago by
tangming200590
United States
tangming200590 wrote:

I had a similar problem, and it looks like there is a bug in IRanges C: IRanges/ShortRead causing crash?

After updating the packages, getSeq function now works, but the following give me a similar error as this post.

library(GenomicFeatures)
hg19.refseq.db <- makeTxDbFromUCSC(genome="hg19", table="refGene")
refseq.genes<- genes(hg19.refseq.db)

>breakpoint

GRanges object with 10 ranges and 1 metadata column:
       seqnames                 ranges strand |     value
          <Rle>              <IRanges>  <Rle> | <logical>
   [1]     chr1 [  9688116,   9688117]      - |      <NA>
   [2]     chr1 [  9881194,   9881195]      + |      <NA>
   [3]     chr1 [ 48608805,  48608825]      - |      <NA>
   [4]     chr1 [191533733, 191533742]      + |      <NA>
   [5]    chr10 [  6325494,   6325511]      - |      <NA>
   [6]    chr10 [  8130950,   8130958]      - |      <NA>
   [7]    chr10 [  8289372,   8289378]      + |      <NA>
   [8]    chr10 [ 34802689,  34802698]      + |      <NA>
   [9]    chr10 [ 55170266,  55170285]      + |      <NA>
  [10]    chr10 [ 61358365,  61358385]      + |      <NA>


hits <- findOverlaps(breakpoint, refseq.genes, select="arbitrary", ignore.strand=TRUE)

> hits
 [1]    NA  9343 14103  2797    NA    NA    NA 18139    NA    NA

> breakpoint[hits]
Error in NSBS(i, x, exact = exact, upperBoundIsStrict = !allow.append) : 
  subscript contains NAs or out-of-bounds indices

> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.1 (Yosemite)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] bumphunter_1.10.0                 locfit_1.5-9.1                    iterators_1.0.8                  
 [4] foreach_1.4.3                     GenomicFeatures_1.22.7            AnnotationDbi_1.32.3             
 [7] Biobase_2.30.0                    BSgenome.Hsapiens.UCSC.hg19_1.4.0 BSgenome_1.38.0                  
[10] rtracklayer_1.30.1                Biostrings_2.38.3                 XVector_0.10.0                   
[13] ggplot2_2.0.0                     tidyr_0.3.1                       dplyr_0.4.3                      
[16] GenomicInteractions_1.4.1         GenomicRanges_1.22.2              GenomeInfoDb_1.6.1               
[19] IRanges_2.4.6                     S4Vectors_0.8.5                   BiocGenerics_0.16.1              

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.2                biovizBase_1.18.0          lattice_0.20-33           
 [4] Rsamtools_1.22.0           assertthat_0.1             digest_0.6.8              
 [7] R6_2.1.1                   plyr_1.8.3                 chron_2.3-47              
[10] futile.options_1.0.0       acepack_1.3-3.3            RSQLite_1.0.0             
[13] zlibbioc_1.16.0            data.table_1.9.6           rpart_4.1-10              
[16] labeling_0.3               splines_3.2.2              BiocParallel_1.4.3        
[19] stringr_1.0.0              foreign_0.8-66             igraph_1.0.1              
[22] RCurl_1.95-4.7             biomaRt_2.26.1             munsell_0.4.2             
[25] Gviz_1.14.0                pkgmaker_0.22              nnet_7.3-11               
[28] SummarizedExperiment_1.0.2 gridExtra_2.0.0            Hmisc_3.17-1              
[31] codetools_0.2-14           matrixStats_0.50.1         XML_3.98-1.3              
[34] GenomicAlignments_1.6.1    bitops_1.0-6               grid_3.2.2                
[37] xtable_1.8-0               registry_0.3               gtable_0.1.2              
[40] DBI_0.3.1                  magrittr_1.5               scales_0.3.0              
[43] stringi_1.0-1              doRNG_1.6                  latticeExtra_0.6-26       
[46] futile.logger_1.4.1        Formula_1.2-1              lambda.r_1.1.7            
[49] RColorBrewer_1.1-2         tools_3.2.2                dichromat_2.0-0           
[52] rngtools_1.2.4             survival_2.38-3            colorspace_1.2-6          
[55] cluster_2.0.3              VariantAnnotation_1.16.4  

 

 

ADD COMMENTlink modified 2.3 years ago • written 2.3 years ago by tangming200590
2

This is just as the error says, you cannot subscript a GRanges or really any Vector with NAs. It is a different problem from the original post.

ADD REPLYlink modified 2.3 years ago • written 2.3 years ago by Michael Lawrence10.0k
0
gravatar for komal.rathi
2.3 years ago by
komal.rathi40
United States
komal.rathi40 wrote:

I updated GenomicRanges from GenomicRanges_1.22.1 to GenomicRanges_1.22.2 and it worked. Thanks!

ADD COMMENTlink written 2.3 years ago by komal.rathi40
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 191 users visited in the last hour