export.bw produces enormous files (10^5 times bigger than they should be)
2
1
Entering edit mode
koneill ▴ 30
@koneill-8031
Last seen 4.0 years ago
Canada

Hi

I was trying to write out some tracks of averaged methylation data (represented as GenomicRanges objects) as bigWig files using export.bw, but for my simple 1Mbp window data (2,700 ranges), it produced a 2GB file!

Testing with the not-run example in the documentation, this holds as well. The example object, with only 9 ranges, produces a 268KB bigWig file:

  test_path <- system.file("tests", package = "rtracklayer")
  test_bw <- file.path(test_path, "test.bw")

  ## GRanges
  ## Returns ranges with non-zero scores.
  gr <- import(test_bw)
  gr

  which <- GRanges(c("chr2", "chr2"), IRanges(c(1, 300), c(400, 1000)))
  import(test_bw, which = which)

  ## RleList
  ## Scores returned as an RleList is equivalent to the coverage.
  ## Best option when 'which' or 'selection' contain many small ranges.
  mini <- narrow(unlist(tile(which, 50)), 2)
  rle <- import(test_bw, which = mini, as = "RleList")
  rle

  ## NumericList
  ## The 'which' is stored as metadata:
  track <- import(test_bw, which = which, as = "NumericList")
  metadata(track)

## Not run:
  test_bw_out <- file.path(tempdir(), "test_out.bw")
  export(gr, test_bw_out) #Note that I had to modify this to use gr since test doesn't exist

 

I understand that there should be some overhead for indexing, but this seems excessive. Indeed, when I export the same object as .bedGraph, it comes out to 168B. When I convert that file to bigWig using bedGraphToBigWig from the Kent tools, it comes to around 19KB.

 

Similarly, for the 2,700-range object I have, the bedGraph file is only 108KB, while the bigWig from bedGraphToBigWig is 79KB, not 2GB.

 

I cannot imagine this is working as intended?

rtracklayer bigwig • 2.3k views
ADD COMMENT
0
Entering edit mode

Ok, I tweaked things, see my answer.

ADD REPLY
2
Entering edit mode
@michael-lawrence-3846
Last seen 2.4 years ago
United States

It's probably just generating a huge amount of summary information. Since we typically use bigwigs for big data, the fixed summaries are relatively small. I will see whether dynamically computed summaries are more efficient in this case.

Yes, by tweaking the indexing parameters and computing the summary levels dynamically, we can achieve the same file size as the UCSC tools. That is now the default behavior. There is an argument for computing the old fixed summaries. Will be in version 1.31.6.

ADD COMMENT
0
Entering edit mode

Thanks, Michael!

ADD REPLY
0
Entering edit mode
@laurentlacroix-9663
Last seen 6.3 years ago
France/Paris/INSERM

I am coming across a similar problem. I was using rtracklayer to export GRanges to bigwig and used to get a file of 17M for a GRanges like this one

            seqnames         ranges strand   |     score
               <Rle>      <IRanges>  <Rle>   | <integer>
        [1]     chr1   [   1, 1000]      *   |         0
        [2]     chr1   [1001, 2000]      *   |         0
        [3]     chr1   [2001, 3000]      *   |         0
        [4]     chr1   [3001, 4000]      *   |         0
        [5]     chr1   [4001, 5000]      *   |         0
        ...      ...            ...    ... ...       ...
  [3095702]     chrM [12001, 13000]      *   |         1
  [3095703]     chrM [13001, 14000]      *   |         2
  [3095704]     chrM [14001, 15000]      *   |         2
  [3095705]     chrM [15001, 16000]      *   |         1
  [3095706]     chrM [16001, 16571]      *   |         1
  -------
  seqinfo: 25 sequences (1 circular) from hg19 genome

This was back in april 2015. SInce June 2015, I did probably update my R installation to 3.2 and rtracklayer to 1.30 (I don't remember), but now the same bigwig has a 1.4G size...

I tried to create the same file on a cluster running R 3.1 or on an older computer for which I did not update R (still 3.1) and the file were also with a size of 17M. I did update the R version on the old computer to 3.2 and the size come again to 1.4G...

Should I wait for the new rtracklayer release or could I downgrad rtracklayer to 1.26 ?

Thanks in advance

Laurent

 

 

 

 

ADD COMMENT
0
Entering edit mode

I went ahead and ported a simpler version of the fix to release, will come with version 1.30.2.

ADD REPLY
0
Entering edit mode

Thanks. I updated rtracklayer to 1.30.2 but the issue remains the same. The output file is still 1.47Gb vs 17Mb for the same export on R 3.1.3 with rtracklayer 1.26. The procedure take also more time than with the 1.26 version on R 3.1.3. I tried on a smaller GRanges object (17 ranges). The difference is smaller, but the exported file is 283kb in R 3.2 vs 74kb in R 3.1.

ADD REPLY
0
Entering edit mode

Please post a reproducible example. The code from the original poster seems to result in a much smaller footprint in recent versions.

ADD REPLY
0
Entering edit mode

OK, I need to read the FAQ more in detail to learn how to post the larger GRanges file. But in the mean time I messed around my R and I had to reinstall it as well as bioconductor and tada! the exported file has now a size of 14.9Mb! Sorry, the problem did certainly come from my system and some mistake while updating R from 3.1 to 3.2. Sorry again and thanks for the help

 

ADD REPLY
0
Entering edit mode

OK, I need to read the FAQ more in detail to learn how to post the larger GRanges file. But in the mean time I messed around my R and I had to reinstall it as well as bioconductor and tada! the exported file has now a size of 14.9Mb! Sorry, the problem did certainly come from my system and some mistake while updating R from 3.1 to 3.2. Sorry again and thanks for the help

 

ADD REPLY

Login before adding your answer.

Traffic: 575 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6