Is it possible to read bigwig files with rtracklayer over https? Comes down to GenomeInfoDb::seqinfo()
1
1
Entering edit mode
@lcolladotor
Last seen 4 weeks ago
United States

Hi,

I know that you can use rtracklayer to read bigwig files on the web. At least I have an example on data hosted via http where it works (shown below). But I haven't been able to load the following file https://s3-eu-west-1.amazonaws.com/pfigshare-u-files/4976639/HSB92.bw which is available via https. I wonder if it's possible to do this or not. I couldn't find information about this when looking at the help pages, googling a bit and looking at some of the posts with the rtracklayer tag. Maybe it could be possible via httr::write_stream, although I do see at ?import.bw that connections are not supported, which is probably why base::url doesn't work.

Here is the code:

> library(rtracklayer)
> x <- import.bw('https://s3-eu-west-1.amazonaws.com/pfigshare-u-files/4976639/HSB92.bw', as = 'RleList')
Error in seqinfo(ranges) : UCSC library operation failed
In addition: Warning message:
In seqinfo(ranges) :
  No openssl available in netConnectHttps for s3-eu-west-1.amazonaws.com : 443
> traceback()
13: .Call(BWGFile_seqlengths, path.expand(path(x)))
12: seqinfo(ranges)
11: seqinfo(ranges)
10: BigWigSelection(which, ...)
9: .class1(object)
8: as(selection, "BigWigSelection")
7: .local(con, format, text, ...)
6: import(FileForFormat(con, format), ...)
5: import(FileForFormat(con, format), ...)
4: import(con, "BigWig", ...)
3: import(con, "BigWig", ...)
2: import.bw("https://s3-eu-west-1.amazonaws.com/pfigshare-u-files/4976639/HSB92.bw", 
       as = "RleList")
1: import.bw("https://s3-eu-west-1.amazonaws.com/pfigshare-u-files/4976639/HSB92.bw", 
       as = "RleList")


I know it works, for example:

> library(derfinderData)
> brainspanPheno$file[1]
[1] "http://download.alleninstitute.org/brainspan/MRF_BigWig_Gencode_v10/bigwig/HSB97.AMY.bw"
> y <- import.bw(brainspanPheno$file[1], as = 'RleList')
> class(y)
[1] "SimpleRleList"
attr(,"package")
[1] "IRanges"

 

The error message Error in seqinfo(ranges) : UCSC library operation failed lead me to https://github.com/Bioconductor-mirror/rtracklayer/blob/6c01d3d1d6b4dd8afaea97ba89644b3173d2dfa2/R/bigWig.R#L224 It doesn't seem to matter whether I use the `selection` or `which` arguments.

# Sizes file at https://github.com/nellore/runs/blob/master/gtex/hg38.sizes
> chrinfo <- read.table('hg38.sizes', header = FALSE, sep = '\t', col.names = c('chr', 'length'))
> chrs <- paste0('chr', c(1:22, 'X', 'Y'))
> library(IRanges)
> ir <- IRanges(start = rep(1, length(chrs)), end = chrinfo$length[match(chrs, chrinfo$chr)], names = chrs)
> bw_selection <- BigWigSelection(ir)
> bw <- import.bw('https://s3-eu-west-1.amazonaws.com/pfigshare-u-files/4976639/HSB92.bw', format = 'bw', selection = bw_selection, as = 'RleList')
Error in seqinfo(con) : UCSC library operation failed
In addition: Warning message:
In seqinfo(con) :
  No openssl available in netConnectHttps for s3-eu-west-1.amazonaws.com : 443
> bw <- import.bw('https://s3-eu-west-1.amazonaws.com/pfigshare-u-files/4976639/HSB92.bw', format = 'bw', selection = bw_selection, as = 'RleList', which = GRanges(ir, seqnames = chrs))
Error in seqinfo(con) : UCSC library operation failed
In addition: Warning message:
In seqinfo(con) :
  No openssl available in netConnectHttps for s3-eu-west-1.amazonaws.com : 443

The error comes down to `seqinfo()` as shown below:

> seqinfo(BigWigFile('https://s3-eu-west-1.amazonaws.com/pfigshare-u-files/4976639/HSB92.bw'))
Error in seqinfo(BigWigFile("https://s3-eu-west-1.amazonaws.com/pfigshare-u-files/4976639/HSB92.bw")) : 
  UCSC library operation failed
In addition: Warning message:
In seqinfo(BigWigFile("https://s3-eu-west-1.amazonaws.com/pfigshare-u-files/4976639/HSB92.bw")) :
  No openssl available in netConnectHttps for s3-eu-west-1.amazonaws.com : 443
> traceback()
3: .Call(BWGFile_seqlengths, path.expand(path(x)))
2: seqinfo(BigWigFile("https://s3-eu-west-1.amazonaws.com/pfigshare-u-files/4976639/HSB92.bw"))
1: seqinfo(BigWigFile("https://s3-eu-west-1.amazonaws.com/pfigshare-u-files/4976639/HSB92.bw"))
> seqinfo(BigWigFile('http://download.alleninstitute.org/brainspan/MRF_BigWig_Gencode_v10/bigwig/HSB97.AMY.bw'))
Seqinfo object with 25 sequences from an unspecified genome:
  seqnames seqlengths isCircular genome
  chr1      249250621       <NA>   <NA>
  chr10     135534747       <NA>   <NA>
  chr11     135006516       <NA>   <NA>
  chr12     133851895       <NA>   <NA>
  chr13     115169878       <NA>   <NA>
  ...             ...        ...    ...
  chr8      146364022       <NA>   <NA>
  chr9      141213431       <NA>   <NA>
  chrM          16571       <NA>   <NA>
  chrX      155270560       <NA>   <NA>
  chrY       59373566       <NA>   <NA>

 

> options(width = 120)
> devtools::session_info()
Session info -----------------------------------------------------------------------------------------------------------
 setting  value                                    
 version  R version 3.3.0 alpha (2016-03-23 r70368)
 system   x86_64, darwin13.4.0                     
 ui       AQUA                                     
 language (EN)                                     
 collate  en_US.UTF-8                              
 tz       America/New_York                         
 date     2016-04-25                               

Packages ---------------------------------------------------------------------------------------------------------------
 package              * version  date       source        
 Biobase                2.31.3   2016-01-14 Bioconductor  
 BiocGenerics         * 0.17.5   2016-04-11 Bioconductor  
 BiocParallel           1.5.21   2016-03-23 Bioconductor  
 Biostrings             2.39.14  2016-04-14 Bioconductor  
 bitops                 1.0-6    2013-08-17 CRAN (R 3.3.0)
 devtools               1.11.0   2016-04-12 CRAN (R 3.3.0)
 digest                 0.6.9    2016-01-08 CRAN (R 3.3.0)
 GenomeInfoDb         * 1.7.6    2016-01-29 Bioconductor  
 GenomicAlignments      1.7.21   2016-04-12 Bioconductor  
 GenomicRanges        * 1.23.27  2016-04-12 Bioconductor  
 IRanges              * 2.5.46   2016-04-17 Bioconductor  
 memoise                1.0.0    2016-01-29 CRAN (R 3.3.0)
 RCurl                  1.95-4.8 2016-03-01 CRAN (R 3.3.0)
 Rsamtools              1.23.8   2016-04-10 Bioconductor  
 rtracklayer          * 1.31.10  2016-04-07 Bioconductor  
 S4Vectors            * 0.9.51   2016-04-17 Bioconductor  
 SummarizedExperiment   1.1.24   2016-04-11 Bioconductor  
 withr                  1.0.1    2016-02-04 CRAN (R 3.3.0)
 XML                    3.98-1.4 2016-03-01 CRAN (R 3.3.0)
 XVector                0.11.8   2016-04-06 Bioconductor  
 zlibbioc               1.17.1   2016-03-19 Bioconductor

 

Thanks,

Leo

rtracklayer genomeinfodb • 4.9k views
ADD COMMENT
0
Entering edit mode

Hm... I guess that I should have just tried the obvious thing of switching the url from https to http first. That did work in this case:

> import(BigWigFile('http://s3-eu-west-1.amazonaws.com/pfigshare-u-files/4976639/HSB92.bw'), as = 'RleList')
RleList of length 1
$chr21
numeric-Rle of length 48129895 with 213880 runs
  Lengths:            9451759                 75              73452 ...                  1                  4              17157
  Values :                  0 0.0299999993294477                  0 ... 0.0700000002980232 0.0299999993294477                  0

 

But I don't know if the option of changing the url will always be available. By that I mean beyond s3 servers.

 

ADD REPLY
0
Entering edit mode

Well, I guess if anyone does end up needing it, I could look into enabling SSL support in the Kent library. It would complicate the build process, because it would need to detect openssl.

ADD REPLY
1
Entering edit mode

I made this change in devel. Seems to work on the URL you gave above.
 

ADD REPLY
0
Entering edit mode

Ok! I gave it a go but couldn't get it to work. I'm guessing that I have to compile my own R or maybe I'm missing modifying some environmental variable.

When installing rtracklayer from source I see a message about not finding openssl, even after installing the latest version via homebrew and making sure that it's the one on the path. The homebrew message does mention something about LDFLAGS and CPPFLAGS which is what leads me to thinking that I might have to compile my own R.

 

## Installed the latest openssl following http://stackoverflow.com/questions/15185661/update-openssl-on-os-x-with-homebrew
$ brew install openssl
...
==> Caveats
A CA file has been bootstrapped using certificates from the system
keychain. To add additional certificates, place .pem files in
  /usr/local/etc/openssl/certs

and run
  /usr/local/opt/openssl/bin/c_rehash

This formula is keg-only, which means it was not symlinked into /usr/local.

Apple has deprecated use of OpenSSL in favor of its own TLS and crypto libraries

Generally there are no consequences of this for you. If you build your
own software and it requires this formula, you'll need to add to your
build variables:

    LDFLAGS:  -L/usr/local/opt/openssl/lib
    CPPFLAGS: -I/usr/local/opt/openssl/include
...

$ brew link --force openssl
Linking /usr/local/Cellar/openssl/1.0.2g... 1588 symlinks created
$ openssl version
OpenSSL 1.0.2g  1 Mar 2016

## Get latest rtracklayer
$ svn co https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/rtracklayer

## Install from source
$ R CMD INSTALL rtracklayer
Loading required package: colorout
* installing to library ‘/Library/Frameworks/R.framework/Versions/3.3/Resources/library’
* installing *source* package ‘rtracklayer’ ...
checking for pkg-config... no
checking for OPENSSL... no
configure: creating ./config.status
config.status: creating src/Makevars.common
** libs
make: Nothing to be done for `all'.
installing to /Library/Frameworks/R.framework/Versions/3.3/Resources/library/rtracklayer/libs
** R
** data
** demo
** inst
** preparing package for lazy loading
Creating a generic function for ‘offset’ from package ‘stats’ in package ‘rtracklayer’
Creating a generic function from function ‘uri’ in package ‘rtracklayer’
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
Loading required package: colorout
* DONE (rtracklayer)


## Try the https url above 
> library(rtracklayer)
> x <- import.bw('https://s3-eu-west-1.amazonaws.com/pfigshare-u-files/4976639/HSB92.bw', as = 'RleList')
Error in seqinfo(ranges) : UCSC library operation failed
In addition: Warning message:
In seqinfo(ranges) :
  No openssl available in netConnectHttps for s3-eu-west-1.amazonaws.com : 443
> options(width = 120)
> devtools::session_info()
Session info ---
 setting  value                                    
 version  R version 3.3.0 alpha (2016-03-23 r70368)
 system   x86_64, darwin13.4.0                     
 ui       AQUA                                     
 language (EN)                                     
 collate  en_US.UTF-8                              
 tz       America/New_York                         
 date     2016-04-30                               

Packages ---
 package              * version  date       source        
 Biobase                2.31.3   2016-01-14 Bioconductor  
 BiocGenerics         * 0.17.5   2016-04-11 Bioconductor  
 BiocParallel           1.5.22   2016-04-27 Bioconductor  
 Biostrings             2.39.14  2016-04-14 Bioconductor  
 bitops                 1.0-6    2013-08-17 CRAN (R 3.3.0)
 devtools               1.11.1   2016-04-21 CRAN (R 3.3.0)
 digest                 0.6.9    2016-01-08 CRAN (R 3.3.0)
 GenomeInfoDb         * 1.7.6    2016-01-29 Bioconductor  
 GenomicAlignments      1.7.21   2016-04-12 Bioconductor  
 GenomicRanges        * 1.23.27  2016-04-12 Bioconductor  
 IRanges              * 2.5.46   2016-04-17 Bioconductor  
 memoise                1.0.0    2016-01-29 CRAN (R 3.3.0)
 RCurl                  1.95-4.8 2016-03-01 CRAN (R 3.3.0)
 Rsamtools              1.23.11  2016-04-28 Bioconductor  
 rtracklayer          * 1.31.11  2016-04-30 Bioconductor  
 S4Vectors            * 0.9.52   2016-04-27 Bioconductor  
 SummarizedExperiment   1.1.27   2016-04-27 Bioconductor  
 withr                  1.0.1    2016-02-04 CRAN (R 3.3.0)
 XML                    3.98-1.4 2016-03-01 CRAN (R 3.3.0)
 XVector                0.11.8   2016-04-06 Bioconductor  
 zlibbioc               1.17.1   2016-03-19 Bioconductor 
ADD REPLY
0
Entering edit mode

I had to remove some pieces of the output to keep it under 5000 characters...

ADD REPLY
0
Entering edit mode

You also need to install pkg-config so that it can find the installation. I guess I could special case it for homebrew, but hopefully the homebrew pkg-config will figure everything out for now.
 

ADD REPLY
0
Entering edit mode

Thanks Michael! I didn't know about pkg-config. That got me closer, but still not there yet.

$ brew install pkg-config
==> Downloading https://homebrew.bintray.com/bottles/pkg-config-0.29.1.el_capita
######################################################################## 100.0%
==> Pouring pkg-config-0.29.1.el_capitan.bottle.tar.gz
🍺  /usr/local/Cellar/pkg-config/0.29.1: 10 files, 627.2K

$ svn co https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/rtracklayer

$ R CMD INSTALL rtracklayer
Loading required package: colorout
* installing to library ‘/Library/Frameworks/R.framework/Versions/3.3/Resources/library’
* installing *source* package ‘rtracklayer’ ...
checking for pkg-config... /usr/local/bin/pkg-config
checking pkg-config is at least version 0.9.0... yes
checking for OPENSSL... yes

## suppressed more output

> library(rtracklayer)
> x <- import.bw('https://s3-eu-west-1.amazonaws.com/pfigshare-u-files/4976639/HSB92.bw', as = 'RleList')
Error in seqinfo(ranges) : ignoring SIGPIPE signal
> traceback()
13: .Call(BWGFile_seqlengths, path.expand(path(x)))
12: seqinfo(ranges)
11: seqinfo(ranges)
10: BigWigSelection(which, ...)
9: .class1(object)
8: as(selection, "BigWigSelection")
7: .local(con, format, text, ...)
6: import(FileForFormat(con, format), ...)
5: import(FileForFormat(con, format), ...)
4: import(con, "BigWig", ...)
3: import(con, "BigWig", ...)
2: import.bw("https://s3-eu-west-1.amazonaws.com/pfigshare-u-files/4976639/HSB92.bw", 
       as = "RleList")
1: import.bw("https://s3-eu-west-1.amazonaws.com/pfigshare-u-files/4976639/HSB92.bw", 
       as = "RleList")
> options(width = 120)
> devtools::session_info()
Session info -----------------------------------------------------------------------------------------------------------
 setting  value                                    
 version  R version 3.3.0 alpha (2016-03-23 r70368)
 system   x86_64, darwin13.4.0                     
 ui       AQUA                                     
 language (EN)                                     
 collate  en_US.UTF-8                              
 tz       America/New_York                         
 date     2016-04-30                               

Packages ---------------------------------------------------------------------------------------------------------------
 package              * version  date       source        
 Biobase                2.31.3   2016-01-14 Bioconductor  
 BiocGenerics         * 0.17.5   2016-04-11 Bioconductor  
 BiocParallel           1.5.22   2016-04-27 Bioconductor  
 Biostrings             2.39.14  2016-04-14 Bioconductor  
 bitops                 1.0-6    2013-08-17 CRAN (R 3.3.0)
 devtools               1.11.1   2016-04-21 CRAN (R 3.3.0)
 digest                 0.6.9    2016-01-08 CRAN (R 3.3.0)
 GenomeInfoDb         * 1.7.6    2016-01-29 Bioconductor  
 GenomicAlignments      1.7.21   2016-04-12 Bioconductor  
 GenomicRanges        * 1.23.27  2016-04-12 Bioconductor  
 IRanges              * 2.5.46   2016-04-17 Bioconductor  
 memoise                1.0.0    2016-01-29 CRAN (R 3.3.0)
 RCurl                  1.95-4.8 2016-03-01 CRAN (R 3.3.0)
 Rsamtools              1.23.11  2016-04-28 Bioconductor  
 rtracklayer          * 1.31.11  2016-04-30 Bioconductor  
 S4Vectors            * 0.9.52   2016-04-27 Bioconductor  
 SummarizedExperiment   1.1.27   2016-04-27 Bioconductor  
 withr                  1.0.1    2016-02-04 CRAN (R 3.3.0)
 XML                    3.98-1.4 2016-03-01 CRAN (R 3.3.0)
 XVector                0.11.8   2016-04-06 Bioconductor  
 zlibbioc               1.17.1   2016-03-19 Bioconductor  
> 

 

GenomeInfoDb version 1.7.6  is the latest one from what I see at https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/GenomeInfoDb/DESCRIPTION.

ADD REPLY
1
Entering edit mode

Looks like OS X throws a SIGPIPE in a place that Linux does not. The UCSC code does try to protect against it, but not enough, I guess. R registers its own handler at initialization, so I've tapped into that to ignore SIGPIPE during any UCSC operation. Please let me know if you get further.

ADD REPLY
1
Entering edit mode
@lcolladotor
Last seen 4 weeks ago
United States

Thanks to Michael Lawrence (see https://support.bioconductor.org/p/81267/#82138) I can verify that this now works on OS X and linux using rtracklayer version 1.31.12.

Best,

Leo

Log files for OS X and linux

 
ADD COMMENT

Login before adding your answer.

Traffic: 737 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6