Question: summarize scores of GRanges into bins
5 months ago
Assa Yeroslaviz (Munich, Germany) wrote:

As a third part of my subsetByOverlaps with Multimapping question, I would like to further manipulate my GRangesList object. I created the GRangesList object by subsetting my intensities to only the regions of the genes in the GTF file.

This gave me a GRangesList object that looks like this:

GRangesList object of length 7126:
GRanges object with 1649 ranges and 1 metadata column:
         seqnames       ranges strand |            score
            <Rle>    <IRanges>  <Rle> |        <numeric>
     [1]       MT [6546, 6546]      * | 48.0121515363538
     [2]       MT [6547, 6547]      * | 48.0456043815372
     [3]       MT [6548, 6548]      * | 48.0865514607159
     [4]       MT [6549, 6549]      * | 48.1089269541849
     [5]       MT [6550, 6550]      * | 48.1350452590915
     ...      ...          ...    ... .              ...
  [1645]       MT [8190, 8190]      * | 145.573432997664
  [1646]       MT [8191, 8191]      * | 145.573432997664
  [1647]       MT [8192, 8192]      * | 145.573432997664
  [1648]       MT [8193, 8193]      * | 145.573432997664
  [1649]       MT [8194, 8194]      * | 118.114351592435
<7125 more elements>
seqinfo: 17 sequences from an unspecified genome

The first 10 elements of the list can be found here (the full object is too big to post).

I would like to know if there is a way to summarize the scores into bins. The GRanges objects in the list are each of a different size:

> unlist(lapply(gr.list[1:10], length))
15S_rRNA 21S_rRNA     HRA1     ICR1     LSR1     NME1     PWR1    Q0010    Q0017    Q0032 
    1649     4439      564     3199     1175      340      941      387      162      291

I would like to know if it's possible to summarize all elements of each object in the list separately into, e.g., 100 bins, with the average score value in the metadata column.

The number of bins should be fixed and identical for all objects in the GRangesList, independent of how many ranges the original GRanges contained. In other words, I would like to create, for each gene (= GRanges object) in the list, a new GRanges object of 100 ranges. The score of each range would be calculated from the summarized scores in the original GRanges object.

I would appreciate any help in doing so.

Thanks in advance,
Assa


modified 5 months ago by Michael Lawrence • written 5 months ago by Assa Yeroslaviz
5 months ago
Michael Lawrence (United States) wrote:

I think you're aiming to generate 100 windows over the span of each gene. First, find the gene spans, then unlist them to a GRanges, and generate the tiles:

tiles <- tile(unlist(range(gr.list)), n=100)
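
To illustrate on a toy GRanges (sequence name and span invented), tile() splits each input range into n roughly equal-width pieces and returns a GRangesList, one element per input range:

```r
suppressPackageStartupMessages(library(GenomicRanges))

## tile() splits each input range into n approximately equal-width tiles.
g <- GRanges("MT", IRanges(1, 1000))
t <- tile(g, n = 100)
lengths(t)             # 100 tiles for the single input range
unique(width(t[[1]]))  # each tile is 10 bp wide here (1000 / 100)
```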

We now need to figure out which positions overlap which tile. It is simplest to first simplify the data to a GRanges where each position occurs only once.

gr.unique <- unlist(gr.list)[!duplicated(unlist(gr.list))]

Note that using unique() above would drop the score column, so we use duplicated() instead. Now, find the positions overlapping each tile:

tiles.flat <- unlist(tiles)
hits <- findOverlaps(tiles.flat, gr.unique)

We have to unlist the tiles since we are interested in overlap per tile. The Hits object implies a grouping on the positions by tile, so we can use it to aggregate the scores:

agg <- aggregate(gr.unique, hits, score=mean(score))

The result contains the aggregated scores, and a column with references back to the positions that were summarized. You could then, for example, attach the scores to the tiles and convert the tiles back to a list by gene:

tiles.flat$score <- agg$score
relist(tiles.flat, tiles)
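
Putting the steps together on a small synthetic GRangesList (gene names, coordinates, and scores invented for illustration; assumes a recent GenomicRanges/S4Vectors):

```r
suppressPackageStartupMessages(library(GenomicRanges))

## Synthetic per-base scores for two "genes", mimicking the structure
## in the question (one range per base, score in the metadata column).
gr.list <- GRangesList(
  geneA = GRanges("MT", IRanges(6546:8194, width = 1),
                  score = runif(1649, 40, 150)),
  geneB = GRanges("I",  IRanges(1001:5439, width = 1),
                  score = runif(4439, 40, 150))
)

## 1. 100 tiles over each gene's span.
tiles <- tile(unlist(range(gr.list)), n = 100)

## 2. Deduplicate positions, keeping the score column.
gr.flat <- unlist(gr.list)
gr.unique <- gr.flat[!duplicated(gr.flat)]

## 3. Match positions to tiles and average the scores per tile.
tiles.flat <- unlist(tiles)
hits <- findOverlaps(tiles.flat, gr.unique)
agg <- aggregate(gr.unique, hits, score = mean(score))

## 4. Attach the binned scores and restore the per-gene list.
tiles.flat$score <- agg$score
binned <- relist(tiles.flat, tiles)
lengths(binned)  # 100 ranges per gene, regardless of original size
```

Note this relies on every tile overlapping at least one position, which holds here because the positions cover each gene span contiguously; with gaps, the aggregated scores would need to be matched back to tiles via the grouping in hits.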



modified 5 months ago • written 5 months ago by Michael Lawrence

+1. It's amazing how you can do anything in R. Thanks a lot for the explanation.

written 5 months ago by Assa Yeroslaviz

