Question: export.bw produces enormous files (10^5 times bigger than they should be)
koneill wrote (2.4 years ago, Canada):

Hi

I was trying to write out some tracks of averaged methylation data (represented as GenomicRanges objects) as bigWig files using export.bw, but for my simple 1 Mbp window data (2,700 ranges), it produced a 2 GB file!

Testing with the "Not run" example in the documentation shows the same behavior. The example object, with only 9 ranges, produces a 268 KB bigWig file:

  library(rtracklayer)

  test_path <- system.file("tests", package = "rtracklayer")
  test_bw <- file.path(test_path, "test.bw")

  ## GRanges
  ## Returns ranges with non-zero scores.
  gr <- import(test_bw)
  gr

  which <- GRanges(c("chr2", "chr2"), IRanges(c(1, 300), c(400, 1000)))
  import(test_bw, which = which)

  ## RleList
  ## Scores returned as an RleList is equivalent to the coverage.
  ## Best option when 'which' or 'selection' contain many small ranges.
  mini <- narrow(unlist(tile(which, 50)), 2)
  rle <- import(test_bw, which = mini, as = "RleList")
  rle

  ## NumericList
  ## The 'which' is stored as metadata:
  track <- import(test_bw, which = which, as = "NumericList")
  metadata(track)

## Not run:
  test_bw_out <- file.path(tempdir(), "test_out.bw")
  export(gr, test_bw_out) # Note: modified to use 'gr', since the 'test' object doesn't exist

I understand that there should be some overhead for indexing, but this seems excessive. Indeed, when I export the same object as .bedGraph, it comes out to 168 B. When I convert that file to bigWig using bedGraphToBigWig from the Kent tools, it comes to around 19 KB.
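For readers comparing against the Kent-tools route mentioned above, here is a minimal sketch of the bedGraph-to-bigWig conversion from the command line. The file names, scores, and chromosome-sizes content are illustrative, and bedGraphToBigWig must be installed separately (it is not part of rtracklayer):

```shell
# Build a tiny, position-sorted bedGraph of hypothetical
# methylation scores (bedGraph is 0-based, half-open, tab-separated).
printf 'chr1\t0\t1000000\t0.82\nchr1\t1000000\t2000000\t0.79\n' > meth.bedGraph

# bedGraphToBigWig also needs a chromosome-sizes file
# (hg19 chr1 length shown; see fetchChromSizes in the Kent tools).
printf 'chr1\t249250621\n' > hg19.chrom.sizes

# The conversion itself; commented out here because the Kent binary
# may not be installed on this machine:
# bedGraphToBigWig meth.bedGraph hg19.chrom.sizes meth.bw

wc -l meth.bedGraph   # the toy input has 2 lines
```

For a GRanges already in R, `export(gr, "file.bedGraph")` from rtracklayer produces the input for this conversion.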

Similarly, for the 2,700-range object I have, the bedGraph file is only 108 KB, while the bigWig from bedGraphToBigWig is 79 KB, not 2 GB.

I cannot imagine this is working as intended?


Ok, I tweaked things, see my answer.

(Michael Lawrence, 2.4 years ago)
Michael Lawrence wrote (2.4 years ago, United States):

It's probably just generating a huge amount of summary information. Since we typically use bigWig files for big data, the fixed summaries are relatively small. I will see whether dynamically computed summaries are more efficient in this case.

Yes, by tweaking the indexing parameters and computing the summary levels dynamically, we can achieve the same file size as the UCSC tools. That is now the default behavior. There is an argument for computing the old fixed summaries. This will be in version 1.31.6.


Thanks, Michael!

(koneill, 2.4 years ago)
laurent.lacroix wrote (2.4 years ago, France/Paris/INSERM):

I am running into a similar problem. I was using rtracklayer to export a GRanges to bigWig and used to get a file of 17 MB for a GRanges like this one:

            seqnames         ranges strand   |     score
               <Rle>      <IRanges>  <Rle>   | <integer>
        [1]     chr1   [   1, 1000]      *   |         0
        [2]     chr1   [1001, 2000]      *   |         0
        [3]     chr1   [2001, 3000]      *   |         0
        [4]     chr1   [3001, 4000]      *   |         0
        [5]     chr1   [4001, 5000]      *   |         0
        ...      ...            ...    ... ...       ...
  [3095702]     chrM [12001, 13000]      *   |         1
  [3095703]     chrM [13001, 14000]      *   |         2
  [3095704]     chrM [14001, 15000]      *   |         2
  [3095705]     chrM [15001, 16000]      *   |         1
  [3095706]     chrM [16001, 16571]      *   |         1
  -------
  seqinfo: 25 sequences (1 circular) from hg19 genome

This was back in April 2015. Since June 2015, I probably updated my R installation to 3.2 and rtracklayer to 1.30 (I don't remember exactly), but now the same bigWig is 1.4 GB...

I tried to create the same file on a cluster running R 3.1 and on an older computer where I had not updated R (still 3.1), and the files were also 17 MB. When I updated the old computer to R 3.2, the size went back up to 1.4 GB...

Should I wait for the new rtracklayer release, or could I downgrade rtracklayer to 1.26?

Thanks in advance

Laurent


I went ahead and ported a simpler version of the fix to the release branch; it will come with version 1.30.2.

(Michael Lawrence, 2.4 years ago)

Thanks. I updated rtracklayer to 1.30.2, but the issue remains the same: the output file is still 1.47 GB vs 17 MB for the same export on R 3.1.3 with rtracklayer 1.26. The procedure also takes more time than with the 1.26 version on R 3.1.3. I tried a smaller GRanges object (17 ranges); the difference is smaller, but the exported file is 283 KB in R 3.2 vs 74 KB in R 3.1.

(laurent.lacroix, 2.3 years ago)

Please post a reproducible example. The code from the original poster seems to result in a much smaller footprint in recent versions.

(Michael Lawrence, 2.3 years ago)

OK, I need to read the FAQ in more detail to learn how to post the larger GRanges file. But in the meantime I messed around with my R setup and had to reinstall it, as well as Bioconductor, and tada! The exported file now has a size of 14.9 MB! Sorry, the problem certainly came from my system and some mistake while updating R from 3.1 to 3.2. Sorry again, and thanks for the help.

(laurent.lacroix, 2.3 years ago)

Powered by Biostar version 2.2.0