Hi,
I know that you can use rtracklayer to read bigwig files on the web. At least I have an example on data hosted via http where it works (shown below). But I haven't been able to load the following file https://s3-eu-west-1.amazonaws.com/pfigshare-u-files/4976639/HSB92.bw which is available via https. I wonder if it's possible to do this or not. I couldn't find information about this when looking at the help pages, googling a bit and looking at some of the posts with the rtracklayer tag. Maybe it could be possible via httr::write_stream, although I do see at ?import.bw that connections are not supported, which is probably why base::url doesn't work.
Here is the code:
> library(rtracklayer) > x <- import.bw('https://s3-eu-west-1.amazonaws.com/pfigshare-u-files/4976639/HSB92.bw', as = 'RleList') Error in seqinfo(ranges) : UCSC library operation failed In addition: Warning message: In seqinfo(ranges) : No openssl available in netConnectHttps for s3-eu-west-1.amazonaws.com : 443 > traceback() 13: .Call(BWGFile_seqlengths, path.expand(path(x))) 12: seqinfo(ranges) 11: seqinfo(ranges) 10: BigWigSelection(which, ...) 9: .class1(object) 8: as(selection, "BigWigSelection") 7: .local(con, format, text, ...) 6: import(FileForFormat(con, format), ...) 5: import(FileForFormat(con, format), ...) 4: import(con, "BigWig", ...) 3: import(con, "BigWig", ...) 2: import.bw("https://s3-eu-west-1.amazonaws.com/pfigshare-u-files/4976639/HSB92.bw", as = "RleList") 1: import.bw("https://s3-eu-west-1.amazonaws.com/pfigshare-u-files/4976639/HSB92.bw", as = "RleList")
I know it works, for example:
> library(derfinderData) > brainspanPheno$file[1] [1] "http://download.alleninstitute.org/brainspan/MRF_BigWig_Gencode_v10/bigwig/HSB97.AMY.bw" > y <- import.bw(brainspanPheno$file[1], as = 'RleList') > class(y) [1] "SimpleRleList" attr(,"package") [1] "IRanges"
The error message Error in seqinfo(ranges) : UCSC library operation failed lead me to https://github.com/Bioconductor-mirror/rtracklayer/blob/6c01d3d1d6b4dd8afaea97ba89644b3173d2dfa2/R/bigWig.R#L224 It doesn't seem to matter whether I use the `selection` or `which` arguments.
# Sizes file at https://github.com/nellore/runs/blob/master/gtex/hg38.sizes
> chrinfo <- read.table('hg38.sizes', header = FALSE, sep = '\t', col.names = c('chr', 'length'))
> chrs <- paste0('chr', c(1:22, 'X', 'Y'))
> library(IRanges)
> ir <- IRanges(start = rep(1, length(chrs)), end = chrinfo$length[match(chrs, chrinfo$chr)], names = chrs)
> bw_selection <- BigWigSelection(ir)
> bw <- import.bw('https://s3-eu-west-1.amazonaws.com/pfigshare-u-files/4976639/HSB92.bw', format = 'bw', selection = bw_selection, as = 'RleList')
Error in seqinfo(con) : UCSC library operation failed
In addition: Warning message:
In seqinfo(con) :
No openssl available in netConnectHttps for s3-eu-west-1.amazonaws.com : 443
> bw <- import.bw('https://s3-eu-west-1.amazonaws.com/pfigshare-u-files/4976639/HSB92.bw', format = 'bw', selection = bw_selection, as = 'RleList', which = GRanges(ir, seqnames = chrs))
Error in seqinfo(con) : UCSC library operation failed
In addition: Warning message:
In seqinfo(con) :
No openssl available in netConnectHttps for s3-eu-west-1.amazonaws.com : 443
The error comes down to `seqinfo()` as shown below:
> seqinfo(BigWigFile('https://s3-eu-west-1.amazonaws.com/pfigshare-u-files/4976639/HSB92.bw')) Error in seqinfo(BigWigFile("https://s3-eu-west-1.amazonaws.com/pfigshare-u-files/4976639/HSB92.bw")) : UCSC library operation failed In addition: Warning message: In seqinfo(BigWigFile("https://s3-eu-west-1.amazonaws.com/pfigshare-u-files/4976639/HSB92.bw")) : No openssl available in netConnectHttps for s3-eu-west-1.amazonaws.com : 443 > traceback() 3: .Call(BWGFile_seqlengths, path.expand(path(x))) 2: seqinfo(BigWigFile("https://s3-eu-west-1.amazonaws.com/pfigshare-u-files/4976639/HSB92.bw")) 1: seqinfo(BigWigFile("https://s3-eu-west-1.amazonaws.com/pfigshare-u-files/4976639/HSB92.bw")) > seqinfo(BigWigFile('http://download.alleninstitute.org/brainspan/MRF_BigWig_Gencode_v10/bigwig/HSB97.AMY.bw')) Seqinfo object with 25 sequences from an unspecified genome: seqnames seqlengths isCircular genome chr1 249250621 <NA> <NA> chr10 135534747 <NA> <NA> chr11 135006516 <NA> <NA> chr12 133851895 <NA> <NA> chr13 115169878 <NA> <NA> ... ... ... ... chr8 146364022 <NA> <NA> chr9 141213431 <NA> <NA> chrM 16571 <NA> <NA> chrX 155270560 <NA> <NA> chrY 59373566 <NA> <NA>
> options(width = 120)
> devtools::session_info()
Session info -----------------------------------------------------------------------------------------------------------
setting value
version R version 3.3.0 alpha (2016-03-23 r70368)
system x86_64, darwin13.4.0
ui AQUA
language (EN)
collate en_US.UTF-8
tz America/New_York
date 2016-04-25
Packages ---------------------------------------------------------------------------------------------------------------
package * version date source
Biobase 2.31.3 2016-01-14 Bioconductor
BiocGenerics * 0.17.5 2016-04-11 Bioconductor
BiocParallel 1.5.21 2016-03-23 Bioconductor
Biostrings 2.39.14 2016-04-14 Bioconductor
bitops 1.0-6 2013-08-17 CRAN (R 3.3.0)
devtools 1.11.0 2016-04-12 CRAN (R 3.3.0)
digest 0.6.9 2016-01-08 CRAN (R 3.3.0)
GenomeInfoDb * 1.7.6 2016-01-29 Bioconductor
GenomicAlignments 1.7.21 2016-04-12 Bioconductor
GenomicRanges * 1.23.27 2016-04-12 Bioconductor
IRanges * 2.5.46 2016-04-17 Bioconductor
memoise 1.0.0 2016-01-29 CRAN (R 3.3.0)
RCurl 1.95-4.8 2016-03-01 CRAN (R 3.3.0)
Rsamtools 1.23.8 2016-04-10 Bioconductor
rtracklayer * 1.31.10 2016-04-07 Bioconductor
S4Vectors * 0.9.51 2016-04-17 Bioconductor
SummarizedExperiment 1.1.24 2016-04-11 Bioconductor
withr 1.0.1 2016-02-04 CRAN (R 3.3.0)
XML 3.98-1.4 2016-03-01 CRAN (R 3.3.0)
XVector 0.11.8 2016-04-06 Bioconductor
zlibbioc 1.17.1 2016-03-19 Bioconductor
Thanks,
Leo
Hm... I guess that I should have just tried the obvious thing of switching the url from https to http first. That did work in this case:
But I don't know if the option of changing the url will always be available. By that I mean beyond s3 servers.
Well, I guess if anyone does end up needing it, I could look into enabling SSL support in the Kent library. It would complicate the build process, because it would need to detect openssl.
I made this change in devel. Seems to work on the URL you gave above.
Ok! I gave it a go but couldn't get it to work. I'm guessing that I have to compile my own R or maybe I'm missing modifying some environmental variable.
When installing rtracklayer from source I see a message about not finding openssl, even after installing the latest version via homebrew and making sure that it's the one on the path. The homebrew message does mention something about LDFLAGS and CPPFLAGS which is what leads me to thinking that I might have to compile my own R.
I had to remove some pieces of the output to keep it under 5000 characters...
You also need to install pkg-config so that it can find the installation. I guess I could special case it for homebrew, but hopefully the homebrew pkg-config will figure everything out for now.
Thanks Michael! I didn't know about pkg-config. That got me closer, but still not there yet.
GenomeInfoDb version 1.7.6 is the latest one from what I see at https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/GenomeInfoDb/DESCRIPTION.
Looks like OS X throws a SIGPIPE in a place that Linux does not. The UCSC code does try to protect against it, but not enough, I guess. R registers its own handler at initialization, so I've tapped into that to ignore SIGPIPE during any UCSC operation. Please let me know if you get further.