NAs values when importing BED file using import function
2
0
Entering edit mode
@dimitris-polychronopoulos-9192
Last seen 7.3 years ago
United Kingdom

Dear all,

I am trying to import a BED file using rtracklayer's import function (the same problem appears with import.bed) in order to use getSeq function at a later stage and retrieve FASTA entries of this bed file (loaded as a GRanges object).

When I run getSeq i get this error:

getSeq(hg38, testbed)
Error in NSBS(i, x, exact = exact, upperBoundIsStrict = !allow.append) : 
  subscript contains NAs or out-of-bounds indices

So, I just went back and checked this imported as GRanges object testbed for NAs and there are a lot of such values present!

Any suggestions?

I am new to R and rtracklayer, but have used the getSeq function successfully before.

My R session is:

R version 3.2.2 (2015-08-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux stretch/sid

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8       
 [4] LC_COLLATE=en_GB.UTF-8     LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
 [1] stats4    grid      parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] rtracklayer_1.30.1   Gviz_1.14.0          GenomicRanges_1.22.1 GenomeInfoDb_1.6.1   IRanges_2.4.2       
 [6] S4Vectors_0.8.3      genomation_1.2.1     CNEr_1.6.1           AnnotationHub_2.2.2  BiocGenerics_0.16.1 
[11] BiocInstaller_1.20.1

loaded via a namespace (and not attached):
 [1] Biobase_2.30.0               httr_1.0.0                   splines_3.2.2               
 [4] Formula_1.2-1                shiny_0.12.2                 interactiveDisplayBase_1.8.0
 [7] latticeExtra_0.6-26          BSgenome_1.38.0              Rsamtools_1.22.0            
[10] impute_1.44.0                RSQLite_1.0.0                lattice_0.20-33             
[13] biovizBase_1.18.0            chron_2.3-47                 digest_0.6.8                
[16] RColorBrewer_1.1-2           XVector_0.10.0               colorspace_1.2-6            
[19] htmltools_0.2.6              httpuv_1.3.3                 plyr_1.8.3                  
[22] XML_3.98-1.3                 biomaRt_2.26.0               zlibbioc_1.16.0             
[25] xtable_1.8-0                 scales_0.3.0                 BiocParallel_1.4.0          
[28] ggplot2_1.0.1                seqPattern_1.2.0             SummarizedExperiment_1.0.1  
[31] GenomicFeatures_1.22.4       nnet_7.3-11                  proto_0.3-10                
[34] survival_2.38-3              magrittr_1.5                 mime_0.4                    
[37] MASS_7.3-45                  foreign_0.8-66               tools_3.2.2                 
[40] data.table_1.9.6             matrixStats_0.15.0           gridBase_0.4-7              
[43] stringr_1.0.0                munsell_0.4.2                cluster_2.0.3               
[46] plotrix_3.6                  AnnotationDbi_1.32.0         lambda.r_1.1.7              
[49] Biostrings_2.38.1            futile.logger_1.4.1          RCurl_1.95-4.7              
[52] dichromat_2.0-0              VariantAnnotation_1.16.3     bitops_1.0-6                
[55] gtable_0.1.2                 DBI_0.3.1                    reshape2_1.4.1              
[58] R6_2.1.1                     GenomicAlignments_1.6.1      gridExtra_2.0.0             
[61] Hmisc_3.17-0                 futile.options_1.0.0         KernSmooth_2.23-15          
[64] readr_0.2.2                  stringi_1.0-1                Rcpp_0.12.2                 
[67] rpart_4.1-10                 acepack_1.3-3.3

 

 

 

 

 

rtracklayer NA BED getseq • 1.6k views
0
Entering edit mode
@martin-morgan-1513
Last seen 3 months ago
United States

Update your packages using BiocInstaller::biocLite(); your IRanges version has a bug that has since been fixed. I'm not sure that updating the package will solve your problems entirely. The NA's message would refer to the granges part of the GRanges, not the mcols() part; the other part of the error message (out-of-bounds) might occur if the granges() were negative or longer than the sequences themselves; after running example(import.bed) I did

gr = import(test_bed)
granges(gr)
grl = splitAsList(gr, seqnames(gr))
min(start(grl))
max(end(grl))
ADD COMMENT
0
Entering edit mode
@dimitris-polychronopoulos-9192
Last seen 7.3 years ago
United Kingdom

Thanks a lot for your kind and prompt reply, Martin.

I have updated bioconductor and tried your suggestions but still it doesn't work for me. In the meantime I also tried at my local machine and it worked well as expected. Do not know what was the problem still, but for the record this is the current session at my machine:

R version 3.2.2 (2015-08-14)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.1 (El Capitan)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] BSgenome.Hsapiens.UCSC.hg38_1.4.1 BSgenome_1.38.0                  
 [3] rtracklayer_1.30.1                GenomicRanges_1.22.1             
 [5] GenomeInfoDb_1.6.1                BiocInstaller_1.20.1             
 [7] Biostrings_2.38.1                 XVector_0.10.0                   
 [9] IRanges_2.4.3                     S4Vectors_0.8.3                  
[11] BiocGenerics_0.16.1              

loaded via a namespace (and not attached):
 [1] XML_3.98-1.3               Rsamtools_1.22.0           GenomicAlignments_1.6.1   
 [4] bitops_1.0-6               futile.options_1.0.0       zlibbioc_1.16.0           
 [7] futile.logger_1.4.1        lambda.r_1.1.7             BiocParallel_1.4.0        
[10] tools_3.2.2                Biobase_2.30.0             RCurl_1.95-4.7            
[13] SummarizedExperiment_1.0.1

1
Entering edit mode

Maybe you could divide testbed in half, testbed[1:(length(testbed)/2)], and use getSeq() with that. Continue dividing until you have one or two ranges that generate the error, and then compare the granges() with the seqnames() and seqlengths() of  hg38.

ADD REPLY

Login before adding your answer.

Traffic: 541 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6