Add columns to granges object that sum score around range
1
0
Entering edit mode
ssabri ▴ 20
@ssabri-9464
Last seen 4.0 years ago

I have a wig file that I've read into a granges-like object using a function that I wrote that calls the `rtracklayer` package:

    read_wig <- function(x, format='wig', genome='mm9') {
      suppressMessages(library(rtracklayer))  
      merged_wig <- import.wig(x, format=format, genome=genome)
      merged_wig <- keepSeqlevels(merged_wig, paste0('chr', c(seq(1,19), 'X', 'Y')), pruning.mode="coarse")
      return(merged_wig)
    }

    wig <- read_wig('~/path/to/wig')

The code above returns: 

    > wig
    UCSC track 'MEFES_K27AC.downsampled.sorted'
    UCSCData object with 13274466 ranges and 1 metadata column:
                 seqnames               ranges strand |     score
                    <Rle>            <IRanges>  <Rle> | <numeric>
             [1]     chr1          [  1,  200]      * |         0
             [2]     chr1          [201,  400]      * |         0
             [3]     chr1          [401,  600]      * |         0
             [4]     chr1          [601,  800]      * |         0
             [5]     chr1          [801, 1000]      * |         0
             ...      ...                  ...    ... .       ...
      [13274462]     chrY [15901401, 15901600]      * |         0
      [13274463]     chrY [15901601, 15901800]      * |         0
      [13274464]     chrY [15901801, 15902000]      * |         0
      [13274465]     chrY [15902001, 15902200]      * |         0
      [13274466]     chrY [15902201, 15902400]      * |         0
      -------
      seqinfo: 21 sequences from mm9 genome

Now with this object **I'd like to compute the sum of scores within a window around each range for each row in the object**. For example, I'd like to compute the sum of score between ranges 1-10000 (123 for this example) and add this entry as a column next to score. I'd like to do this for each row. 

   > expected_output
    UCSC track 'MEFES_K27AC.downsampled.sorted'
    UCSCData object with 13274466 ranges and 1 metadata column:
                 seqnames               ranges strand |     score  score_10000
                    <Rle>            <IRanges>  <Rle> | <numeric>    <numeric>
             [1]     chr1          [  1,  200]      * |         0          123
             [2]     chr1          [201,  400]      * |         0          ...
             [3]     chr1          [401,  600]      * |         0          ...
             [4]     chr1          [601,  800]      * |         0          ...
             [5]     chr1          [801, 1000]      * |         0          ...
             ...      ...                  ...    ... .       ...
      [13274462]     chrY [15901401, 15901600]      * |         0          ...
      [13274463]     chrY [15901601, 15901800]      * |         0          ...
      [13274464]     chrY [15901801, 15902000]      * |         0          ...
      [13274465]     chrY [15902001, 15902200]      * |         0          ...
      [13274466]     chrY [15902201, 15902400]      * |         0          ...
      -------
      seqinfo: 21 sequences from mm9 genome

Ideally I'd like to add columns that compute score ranges from 1-10000, 1-20000, 1-30000, etc. up to 100000.

Any help would be much appreciated! 

 

EDIT: I've added a wig file here that can be used to run the code above.  

R Grang r granges rtracklayer • 2.0k views
ADD COMMENT
0
Entering edit mode

Needs some clarification. What sort of window do you want around each range? 10kb, centered on the midpoint of the range, or what? Btw, keepStandardChromosomes() might be helpful.

ADD REPLY
0
Entering edit mode

Hi and thank you for your reply. I would like a window to begin at the start of the range and go 10000 upstream. So for the first row the range is 1-200 and I want to compute the sum of scores between 1-10000 and add this number to the metadata columns. The second row range is 201-400 so I want to compute the sum of scores between 201-10201. Ideally I'd like to add multiple columns that sum the scores for each range from 1-10000, 1-20000, 1-30000, up to 100000. Does that help out? 

ADD REPLY
0
Entering edit mode

Hi Michael -- I'm thinking more about this and I think it might be best to center on the midpoint of the range for each row and then compute a sum forward and backwards of the midpoint. For example, for a sum of scores within a 10000 window, we look 5000 upstream and 5000 downstream. Does that make sense? Any help would be greatly appreciated!

I've just added an example wig file that can be used to run the code in the original post. 

ADD REPLY
0
Entering edit mode
@michael-lawrence-3846
Last seen 2.3 years ago
United States

One way to do this fast is to take advantage of the fixed width WIG data (200bp). Then it is easy to compute how many neighboring elements to include in the sum.

window <- 10000L
width <- 200L
w <- resize(IRanges(seq_along(wig), width=1L), window/width, fix="center")
w <- restrict(w, 1L, rep(end(seqnames(wig)), width(seqnames(wig))))
ans <- sum(extractList(wig$score, w))

Note that the call to restrict() is going to truncate the windows at the edges. The important thing to realize is that an IRanges object is an IntegerList. Let me know if anything doesn't make sense.

 

ADD COMMENT
0
Entering edit mode

Excellent. I think this is working properly. I have set up my code like so: 

rolling_sum <- function(x, window=5) {
  w <- resize(IRanges(seq_along(x), width=1L), window/unique(width(x)), fix="center")
  w <- restrict(w, 1L, rep(end(seqnames(x)), width(seqnames(x))))
  sum(extractList(x$score, w))
}

mcols(tmp)$score_10000 <- rolling_sum(tmp, window=10000)
mcols(tmp)$score_20000 <- rolling_sum(tmp, window=20000)
mcols(tmp)$score_30000 <- rolling_sum(tmp, window=30000)
mcols(tmp)$score_40000 <- rolling_sum(tmp, window=40000)
mcols(tmp)$score_50000 <- rolling_sum(tmp, window=50000)
mcols(tmp)$score_60000 <- rolling_sum(tmp, window=60000)
mcols(tmp)$score_70000 <- rolling_sum(tmp, window=70000)
mcols(tmp)$score_80000 <- rolling_sum(tmp, window=80000)
mcols(tmp)$score_90000 <- rolling_sum(tmp, window=90000)
mcols(tmp)$score_100000 <- rolling_sum(tmp, window=100000)

 

ADD REPLY
0
Entering edit mode

Might want to consider a loop.

ADD REPLY

Login before adding your answer.

Traffic: 731 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6