rtracklayer v 1.53.1 refuses to load xz-compressed files.
Charles • 0
@charles-23991
Japan

Hello,

as I can open an xz-compressed GFF file with read.delim but not with rtracklayer::import, I suppose that there is a bug in rtracklayer.

The error message is:

Error in .normarg_input_filepath(filepath) : 
  file "test.gff3.xz" has unsupported type: xzfile

The session below looks longish, but I just:

  • import a GFF file
  • import a gz-compressed GFF file
  • fail to import an xz-compressed GFF file
  • read the xz-compressed GFF file as a data.frame with read.delim

R version 4.1.1 (2021-08-10) -- "Kick Things"
Copyright (C) 2021 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library('rtracklayer')
Loading required package: GenomicRanges
Loading required package: stats4
Loading required package: BiocGenerics

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, basename, cbind, colnames,
    dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
    grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
    order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
    rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
    union, unique, unsplit, which.max, which.min

Loading required package: S4Vectors

Attaching package: ‘S4Vectors’

The following objects are masked from ‘package:base’:

    expand.grid, I, unname

Loading required package: IRanges
Loading required package: GenomeInfoDb
> import('test.gff3')
GRanges object with 1 range and 6 metadata columns:
      seqnames    ranges strand |   source     type     score     phase
         <Rle> <IRanges>  <Rle> | <factor> <factor> <numeric> <integer>
  [1]   ctg123 1000-9000      + |       NA     gene        NA      <NA>
               ID        Name
      <character> <character>
  [1]   gene00001        EDEN
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
> import('test.gff3.gz')
GRanges object with 1 range and 6 metadata columns:
      seqnames    ranges strand |   source     type     score     phase
         <Rle> <IRanges>  <Rle> | <factor> <factor> <numeric> <integer>
  [1]   ctg123 1000-9000      + |       NA     gene        NA      <NA>
               ID        Name
      <character> <character>
  [1]   gene00001        EDEN
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
> import('test.gff3.xz')
Error in .normarg_input_filepath(filepath) : 
  file "test.gff3.xz" has unsupported type: xzfile
> read.delim('test.gff3.xz', comment.char='#')
[1] ctg123                 .                      gene                  
[4] X1000                  X9000                  ..1                   
[7] X.                     ..2                    ID.gene00001.Name.EDEN
<0 rows> (or 0-length row.names)
> sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 11 (bullseye)

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] rtracklayer_1.53.1   GenomicRanges_1.45.0 GenomeInfoDb_1.29.5 
[4] IRanges_2.27.2       S4Vectors_0.31.3     BiocGenerics_0.39.2 

loaded via a namespace (and not attached):
 [1] rstudioapi_0.13             XVector_0.33.0             
 [3] zlibbioc_1.39.0             GenomicAlignments_1.29.0   
 [5] BiocParallel_1.27.4         lattice_0.20-44            
 [7] rjson_0.2.20                tools_4.1.1                
 [9] grid_4.1.1                  SummarizedExperiment_1.23.4
[11] parallel_4.1.1              Biobase_2.53.0             
[13] matrixStats_0.60.1          yaml_2.2.1                 
[15] crayon_1.4.1                BiocIO_1.3.0               
[17] Matrix_1.3-3                GenomeInfoDbData_1.2.6     
[19] restfulr_0.0.13             bitops_1.0-7               
[21] RCurl_1.98-1.4              DelayedArray_0.19.2        
[23] compiler_4.1.1              MatrixGenerics_1.5.4       
[25] Biostrings_2.61.2           Rsamtools_2.9.1            
[27] XML_3.99-0.7
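
In the meantime, a possible workaround might be to decompress the file to a temporary plain-text copy with base R and import that copy, since importing the uncompressed file works in the session above. This is only a sketch: decompress_xz is a hypothetical helper (not part of rtracklayer), and the example file is rebuilt from the single record shown above, with the "##gff-version 3" header line being an assumption. The read.delim call uses header = FALSE so that the only record is kept as a data row instead of being consumed as column names.

```r
# Hypothetical helper (not part of rtracklayer): decompress an xz file
# into a temporary plain-text file and return the path of that file.
decompress_xz <- function(path, ext = ".gff3") {
  con <- xzfile(path, open = "rt")
  on.exit(close(con))
  tmp <- tempfile(fileext = ext)
  writeLines(readLines(con), tmp)
  tmp
}

# Rebuild a small test file from the record shown in the session above.
# The "##gff-version 3" header line is an assumption.
gff_lines <- c(
  "##gff-version 3",
  "ctg123\t.\tgene\t1000\t9000\t.\t+\t.\tID=gene00001;Name=EDEN"
)
xz_path <- file.path(tempdir(), "test.gff3.xz")
con <- xzfile(xz_path, open = "wt")
writeLines(gff_lines, con)
close(con)

# Base R reads the xz file directly; header = FALSE keeps the single
# record as a row rather than using it for the column names.
df <- read.delim(xz_path, comment.char = "#", header = FALSE)

# Decompress to a plain .gff3 file that import() can read.
plain_path <- decompress_xz(xz_path)

# Only attempt the import if rtracklayer is installed.
if (requireNamespace("rtracklayer", quietly = TRUE)) {
  print(rtracklayer::import(plain_path))
}
```

The temporary copy keeps the .gff3 extension so that import can still guess the format from the file name. For large annotation files the extra copy costs disk space and time, so this is a stopgap rather than a replacement for xz support in import itself.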