Change the interval of bed file
1
0
Entering edit mode
the • 0
@49a1a401
Last seen 3 months ago
Turkey

Hi,

I have a bed file containing mappability values of reference genome. It starts from 10,000th position and goes on. The interval for each line varies, but there are no jumps and overlaps. For example, the first line is between 10,000 and 10,0039, the second line is 10,039 and 10,040. I want to change the interval to 1000 base pairs for each line and take the mean of mappability value for that interval.

Let's say: Input chr1 10,000 10,039 0.4 ... 10,039 ..... 0.2

Output chr1 10,000 11,000 0.2 (mean mappability value between 10,000 and 11,000 position)

mappability bed python • 148 views
ADD COMMENT
0
Entering edit mode
@james-w-macdonald-5106
Last seen 22 minutes ago
United States

How about

> library(GenomicRanges)
## reproducibility
> set.seed(0xabeef)
## fake GRranges with scores
> gr <- GRanges(rep("chr1", 10), IRanges(c(1, 39, 450, 1081, 1195, 2021, 2378, 4500, 7600, 8989) + 1e4, width = rpois(10, 20)), score = runif(10))
## new 1000bp window GRanges
> ogr <- GRanges(rep("chr1", 10), IRanges(seq(1e4, 1.9e4, 1e3), width = 1000))
## get overlaps and compute mean score
>  fo <- findOverlaps(gr, ogr)
>  lst <- split(queryHits(fo), subjectHits(fo))
>  lstvals <- lapply(lst, function(x) mean(score(gr)[x]))
## put mean scores in new GRanges
> ogr$scores[as.numeric(names(lstvals))] <- unlist(lstvals)
## et voila
> ogr
GRanges object with 10 ranges and 1 metadata column:
       seqnames      ranges strand |    scores
          <Rle>   <IRanges>  <Rle> | <numeric>
   [1]     chr1 10000-10999      * | 0.6582870
   [2]     chr1 11000-11999      * | 0.2332371
   [3]     chr1 12000-12999      * | 0.9091612
   [4]     chr1 13000-13999      * |        NA
   [5]     chr1 14000-14999      * | 0.9697119
   [6]     chr1 15000-15999      * |        NA
   [7]     chr1 16000-16999      * |        NA
   [8]     chr1 17000-17999      * | 0.0694638
   [9]     chr1 18000-18999      * | 0.4340691
  [10]     chr1 19000-19999      * | 0.4340691
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

Login before adding your answer.

Traffic: 545 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6