Question: export.bw produces enormous files (10^5 times bigger than they should be)
koneill wrote (2.4 years ago, Canada):

Hi

I was trying to write out some tracks of averaged methylation data (represented as GenomicRanges objects) as bigWig files using export.bw, but for my simple 1 Mbp window data (2,700 ranges), it produced a 2 GB file!

Testing with the "Not run" example in the documentation shows the same behavior. The example object, with only 9 ranges, produces a 268 KB bigWig file:

  library(rtracklayer)

  test_path <- system.file("tests", package = "rtracklayer")
  test_bw <- file.path(test_path, "test.bw")

  ## GRanges
  ## Returns ranges with non-zero scores.
  gr <- import(test_bw)
  gr

  which <- GRanges(c("chr2", "chr2"), IRanges(c(1, 300), c(400, 1000)))
  import(test_bw, which = which)

  ## RleList
  ## Scores returned as an RleList is equivalent to the coverage.
  ## Best option when 'which' or 'selection' contain many small ranges.
  mini <- narrow(unlist(tile(which, 50)), 2)
  rle <- import(test_bw, which = mini, as = "RleList")
  rle

  ## NumericList
  ## The 'which' is stored as metadata:
  track <- import(test_bw, which = which, as = "NumericList")
  metadata(track)

## Not run:
  test_bw_out <- file.path(tempdir(), "test_out.bw")
  export(gr, test_bw_out) # Note: modified to use 'gr', since the 'test' object doesn't exist

I understand that there should be some overhead for indexing, but this seems excessive. Indeed, when I export the same object as .bedGraph, it comes out to 168 B. When I convert that file to bigWig using bedGraphToBigWig from the Kent tools, it comes to around 19 KB.
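For readers comparing against the Kent-tools route mentioned above, here is a minimal sketch of the bedGraph-to-bigWig conversion from the command line. The file names, scores, and chromosome-sizes content are illustrative, and bedGraphToBigWig must be installed separately (it is not part of rtracklayer):

```shell
# Build a tiny, position-sorted bedGraph of hypothetical
# methylation scores (bedGraph is 0-based, half-open, tab-separated).
printf 'chr1\t0\t1000000\t0.82\nchr1\t1000000\t2000000\t0.79\n' > meth.bedGraph

# bedGraphToBigWig also needs a chromosome-sizes file
# (hg19 chr1 length shown; see fetchChromSizes in the Kent tools).
printf 'chr1\t249250621\n' > hg19.chrom.sizes

# The conversion itself; commented out here because the Kent binary
# may not be installed on this machine:
# bedGraphToBigWig meth.bedGraph hg19.chrom.sizes meth.bw

wc -l meth.bedGraph   # the toy input has 2 lines
```

For a GRanges already in R, `export(gr, "file.bedGraph")` from rtracklayer produces the input for this conversion.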

Similarly, for the 2,700-range object I have, the bedGraph file is only 108 KB, while the bigWig from bedGraphToBigWig is 79 KB, not 2 GB.

I cannot imagine this is working as intended?


Ok, I tweaked things, see my answer.

(Michael Lawrence, 2.4 years ago)
Michael Lawrence wrote (2.4 years ago, United States):

It's probably just generating a huge amount of summary information. Since we typically use bigWig files for big data, the fixed summaries are relatively small. I will see whether dynamically computed summaries are more efficient in this case.

Yes, by tweaking the indexing parameters and computing the summary levels dynamically, we can achieve the same file size as the UCSC tools. That is now the default behavior. There is an argument for computing the old fixed summaries. This will be in version 1.31.6.


Thanks, Michael!

(koneill, 2.4 years ago)
laurent.lacroix wrote (2.4 years ago, France/Paris/INSERM):

I am running into a similar problem. I was using rtracklayer to export a GRanges to bigWig and used to get a file of 17 MB for a GRanges like this one:

            seqnames         ranges strand   |     score
               <Rle>      <IRanges>  <Rle>   | <integer>
        [1]     chr1   [   1, 1000]      *   |         0
        [2]     chr1   [1001, 2000]      *   |         0
        [3]     chr1   [2001, 3000]      *   |         0
        [4]     chr1   [3001, 4000]      *   |         0
        [5]     chr1   [4001, 5000]      *   |         0
        ...      ...            ...    ... ...       ...
  [3095702]     chrM [12001, 13000]      *   |         1
  [3095703]     chrM [13001, 14000]      *   |         2
  [3095704]     chrM [14001, 15000]      *   |         2
  [3095705]     chrM [15001, 16000]      *   |         1
  [3095706]     chrM [16001, 16571]      *   |         1
  -------
  seqinfo: 25 sequences (1 circular) from hg19 genome

This was back in April 2015. Since June 2015, I probably updated my R installation to 3.2 and rtracklayer to 1.30 (I don't remember exactly), but now the same bigWig is 1.4 GB...

I tried to create the same file on a cluster running R 3.1 and on an older computer where I had not updated R (still 3.1), and the files were also 17 MB. When I updated the old computer to R 3.2, the size went back up to 1.4 GB...

Should I wait for the new rtracklayer release, or could I downgrade rtracklayer to 1.26?

Thanks in advance

Laurent


I went ahead and ported a simpler version of the fix to the release branch; it will come with version 1.30.2.

(Michael Lawrence, 2.4 years ago)

Thanks. I updated rtracklayer to 1.30.2, but the issue remains the same: the output file is still 1.47 GB vs 17 MB for the same export on R 3.1.3 with rtracklayer 1.26. The procedure also takes more time than with the 1.26 version on R 3.1.3. I tried a smaller GRanges object (17 ranges); the difference is smaller, but the exported file is 283 KB in R 3.2 vs 74 KB in R 3.1.

(laurent.lacroix, 2.3 years ago)

Please post a reproducible example. The code from the original poster seems to result in a much smaller footprint in recent versions.

(Michael Lawrence, 2.3 years ago)

OK, I need to read the FAQ in more detail to learn how to post the larger GRanges file. But in the meantime I messed around with my R setup and had to reinstall it, as well as Bioconductor, and tada! The exported file now has a size of 14.9 MB! Sorry, the problem certainly came from my system and some mistake while updating R from 3.1 to 3.2. Sorry again, and thanks for the help.

(laurent.lacroix, 2.3 years ago)

Powered by Biostar version 2.2.0