Question

NAs values when importing BED file using import function

0

Entering edit mode

Dimitris Polychronopoulos ▴ 80

@dimitris-polychronopoulos-9192

Last seen 8.5 years ago

United Kingdom

Dear all,

I am trying to import a BED file using rtracklayer's import function (the same problem appears with import.bed) in order to use getSeq function at a later stage and retrieve FASTA entries of this bed file (loaded as a GRanges object).

When I run getSeq i get this error:

getSeq(hg38, testbed)
Error in NSBS(i, x, exact = exact, upperBoundIsStrict = !allow.append) : 
  subscript contains NAs or out-of-bounds indices

So, I just went back and checked this imported as GRanges object testbed for NAs and there are a lot of such values present!

Any suggestions?

I am new to R and rtracklayer, but have used the getSeq function successfully before.

My R session is:

R version 3.2.2 (2015-08-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux stretch/sid

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8       
 [4] LC_COLLATE=en_GB.UTF-8     LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
 [1] stats4    grid      parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] rtracklayer_1.30.1   Gviz_1.14.0          GenomicRanges_1.22.1 GenomeInfoDb_1.6.1   IRanges_2.4.2       
 [6] S4Vectors_0.8.3      genomation_1.2.1     CNEr_1.6.1           AnnotationHub_2.2.2  BiocGenerics_0.16.1 
[11] BiocInstaller_1.20.1

loaded via a namespace (and not attached):
 [1] Biobase_2.30.0               httr_1.0.0                   splines_3.2.2               
 [4] Formula_1.2-1                shiny_0.12.2                 interactiveDisplayBase_1.8.0
 [7] latticeExtra_0.6-26          BSgenome_1.38.0              Rsamtools_1.22.0            
[10] impute_1.44.0                RSQLite_1.0.0                lattice_0.20-33             
[13] biovizBase_1.18.0            chron_2.3-47                 digest_0.6.8                
[16] RColorBrewer_1.1-2           XVector_0.10.0               colorspace_1.2-6            
[19] htmltools_0.2.6              httpuv_1.3.3                 plyr_1.8.3                  
[22] XML_3.98-1.3                 biomaRt_2.26.0               zlibbioc_1.16.0             
[25] xtable_1.8-0                 scales_0.3.0                 BiocParallel_1.4.0          
[28] ggplot2_1.0.1                seqPattern_1.2.0             SummarizedExperiment_1.0.1  
[31] GenomicFeatures_1.22.4       nnet_7.3-11                  proto_0.3-10                
[34] survival_2.38-3              magrittr_1.5                 mime_0.4                    
[37] MASS_7.3-45                  foreign_0.8-66               tools_3.2.2                 
[40] data.table_1.9.6             matrixStats_0.15.0           gridBase_0.4-7              
[43] stringr_1.0.0                munsell_0.4.2                cluster_2.0.3               
[46] plotrix_3.6                  AnnotationDbi_1.32.0         lambda.r_1.1.7              
[49] Biostrings_2.38.1            futile.logger_1.4.1          RCurl_1.95-4.7              
[52] dichromat_2.0-0              VariantAnnotation_1.16.3     bitops_1.0-6                
[55] gtable_0.1.2                 DBI_0.3.1                    reshape2_1.4.1              
[58] R6_2.1.1                     GenomicAlignments_1.6.1      gridExtra_2.0.0             
[61] Hmisc_3.17-0                 futile.options_1.0.0         KernSmooth_2.23-15          
[64] readr_0.2.2                  stringi_1.0-1                Rcpp_0.12.2                 
[67] rpart_4.1-10                 acepack_1.3-3.3

rtracklayer NA BED getseq • 1.9k views

ADD COMMENT • link 10.2 years ago Dimitris Polychronopoulos ▴ 80

score 0 · Answer 1 · 2015-11-21

Update your packages using BiocInstaller::biocLite(); your IRanges version has a bug that has since been fixed. I'm not sure that updating the package will solve your problems entirely. The NA's message would refer to the granges part of the GRanges, not the mcols() part; the other part of the error message (out-of-bounds) might occur if the granges() were negative or longer than the sequences themselves; after running example(import.bed) I did

gr = import(test_bed)
granges(gr)
grl = splitAsList(gr, seqnames(gr))
min(start(grl))
max(end(grl))

score 0 · Answer 2 · 2015-11-21

Thanks a lot for your kind and prompt reply, Martin.

I have updated bioconductor and tried your suggestions but still it doesn't work for me. In the meantime I also tried at my local machine and it worked well as expected. Do not know what was the problem still, but for the record this is the current session at my machine:

R version 3.2.2 (2015-08-14)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.1 (El Capitan)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base

other attached packages:
[1] BSgenome.Hsapiens.UCSC.hg38_1.4.1 BSgenome_1.38.0
[3] rtracklayer_1.30.1                GenomicRanges_1.22.1
[5] GenomeInfoDb_1.6.1                BiocInstaller_1.20.1
[7] Biostrings_2.38.1                 XVector_0.10.0
[9] IRanges_2.4.3                     S4Vectors_0.8.3
[11] BiocGenerics_0.16.1

loaded via a namespace (and not attached):
[1] XML_3.98-1.3               Rsamtools_1.22.0           GenomicAlignments_1.6.1
[4] bitops_1.0-6               futile.options_1.0.0       zlibbioc_1.16.0
[7] futile.logger_1.4.1        lambda.r_1.1.7             BiocParallel_1.4.0
[10] tools_3.2.2                Biobase_2.30.0             RCurl_1.95-4.7
[13] SummarizedExperiment_1.0.1