Question

default behaviour of import.bw

Hi,

When I use import.bw from rtracklayer, two adjacent positions that have the same score are merged into a single range, as with positions 1 and 2 here:

> gr1
GRanges object with 2 ranges and 1 metadata column:
      seqnames    ranges strand |     score
         <Rle> <IRanges>  <Rle> | <numeric>
  [1]     chr2       1-2      * |         1
  [2]     chr2         4      * |         1
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

But the GRanges I would like to get back has one range per position, like this:

> gr0
GRanges object with 3 ranges and 1 metadata column:
      seqnames    ranges strand |     score
         <Rle> <IRanges>  <Rle> | <numeric>
  [1]     chr2         1      * |         1
  [2]     chr2         2      * |         1
  [3]     chr2         4      * |         1
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

The reason is that I want to sum the scores across all sites, base by base, and the two objects give different totals:

> sum(gr0$score)
[1] 3
> sum(gr1$score)
[1] 2

I looked for a parameter in the function to control this behaviour but had no luck. Maybe I am missing something.

I know that makeGRangesBRG from BRGenomics can do this, but I'm wondering whether there is a more native and direct way.

Thanks,

Yixin

score 2 · Accepted Answer · 2021-11-08

Can you clarify? I don't see that behavior. Here is a reproducible example (which ideally you would have supplied).

## Get a Wig file to test

> test_path <- system.file("tests", package = "rtracklayer")
> test_wig <- file.path(test_path, "step.wig")
> z <- import(test_wig)
> z
UCSC track 'test'
UCSCData object with 19 ranges and 1 metadata column:
       seqnames            ranges strand |     score
          <Rle>         <IRanges>  <Rle> | <numeric>
   [1]    chr19          59104701      * |      10.0
   [2]    chr19          59104901      * |      12.5
   [3]    chr19          59105401      * |      15.0
   [4]    chr19          59105601      * |      17.5
   [5]    chr19          59105901      * |      20.0
   ...      ...               ...    ... .       ...
  [15]    chr18 59109521-59109720      * |       500
  [16]    chr18 59109821-59110020      * |       400
  [17]    chr18 59110121-59110320      * |       300
  [18]    chr18 59110421-59110620      * |       200
  [19]    chr18 59110721-59110920      * |       100
  -------
  seqinfo: 2 sequences from an unspecified genome; no seqlengths

## Add adjacent position with the same score

> zz <- c(GRanges("chr19", IRanges(59104702, width = 1), strand = "*", score = 10.0), z)

## And sort

> zz <- sort(zz)
> zz
GRanges object with 20 ranges and 1 metadata column:
       seqnames            ranges strand |     score
          <Rle>         <IRanges>  <Rle> | <numeric>
   [1]    chr19          59104701      * |      10.0     <------------- This and the next are the same score and adjacent
   [2]    chr19          59104702      * |      10.0
   [3]    chr19          59104901      * |      12.5
   [4]    chr19          59105401      * |      15.0
   [5]    chr19          59105601      * |      17.5
   ...      ...               ...    ... .       ...
  [16]    chr18 59109521-59109720      * |       500
  [17]    chr18 59109821-59110020      * |       400
  [18]    chr18 59110121-59110320      * |       300
  [19]    chr18 59110421-59110620      * |       200
  [20]    chr18 59110721-59110920      * |       100
  -------
  seqinfo: 2 sequences from an unspecified genome; no seqlengths

## export and convert to BigWig

> export.wig(zz, "test.wig")

## need seqinfo

> si <- seqinfo(GRangesForUCSCGenome("hg19", paste0("chr", 18:19)))
> wigToBigWig("test.wig", si, "test.bw")

## re-import and check

> bw <- import("test.bw")
> as(bw, "data.frame")
   seqnames    start      end width strand  score
1     chr18 59108021 59108220   200      * 1000.0
2     chr18 59108321 59108520   200      *  900.0
3     chr18 59108621 59108820   200      *  800.0
4     chr18 59108921 59109120   200      *  700.0
5     chr18 59109221 59109420   200      *  600.0
6     chr18 59109521 59109720   200      *  500.0
7     chr18 59109821 59110020   200      *  400.0
8     chr18 59110121 59110320   200      *  300.0
9     chr18 59110421 59110620   200      *  200.0
10    chr18 59110721 59110920   200      *  100.0
11    chr19 59104701 59104701     1      *   10.0  <----------------- Still adjacent and same score
12    chr19 59104702 59104702     1      *   10.0
13    chr19 59104901 59104901     1      *   12.5
14    chr19 59105401 59105401     1      *   15.0
15    chr19 59105601 59105601     1      *   17.5
16    chr19 59105901 59105901     1      *   20.0
17    chr19 59106081 59106081     1      *   17.5
18    chr19 59106301 59106301     1      *   15.0
19    chr19 59106691 59106691     1      *   12.5
20    chr19 59107871 59107871     1      *   10.0
>
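
If the end goal is only the base-by-base total, one option that does not depend on how the ranges come back from import is to weight each score by the width of its range. Below is a minimal sketch, assuming the toy gr1 from the question is rebuilt by hand rather than read from a file. The names per_base_sum and per_base are made up here for illustration. The second half shows one possible way to expand merged ranges into single-base ranges with tile() from GenomicRanges, in case the one-range-per-position layout itself is needed.

```r
library(GenomicRanges)

# Toy object from the question: positions 1-2 merged, position 4 on its own.
gr1 <- GRanges("chr2",
               IRanges(start = c(1, 4), end = c(2, 4)),
               score = c(1, 1))

# Width-weighted sum: each score counts once per base its range covers.
per_base_sum <- sum(gr1$score * width(gr1))
per_base_sum

# Optional: expand every range into width-1 ranges.
# tile() returns a GRangesList; unlist() flattens it in the original order.
per_base <- unlist(tile(gr1, width = 1))

# tile() does not carry metadata columns over, so repeat each score
# once per base of the range it came from.
per_base$score <- rep(gr1$score, width(gr1))
per_base
sum(per_base$score)
```

Both sums should agree with sum(gr0$score) from the question. For a real BigWig file, import() in rtracklayer also accepts as = "RleList", which returns per-base run-length vectors instead of ranges; positions with no data are filled with zeros there, so that form suits sums but not means.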