GenomicRanges equivalent to bedtools cluster
Asked by kalamari (@kalamari-22562)

Given genomic coordinates in BED format:

cat A.bed
chr1  100  200
chr1  180  250
chr1  250  500
chr1  501  1000
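
BED intervals are 0-based and half-open, while GRanges uses 1-based closed coordinates, so each start needs 1 added when these records are loaded into R. rtracklayer::import("A.bed") performs this shift automatically. The sketch below does it by hand with the four records typed inline; the data frame and its column names are illustrative only.

```r
library(GenomicRanges)

## The four records of A.bed, typed inline (column names are illustrative)
bed <- data.frame(chrom = "chr1",
                  start = c(100L, 180L, 250L, 501L),
                  end   = c(200L, 250L, 500L, 1000L))

## BED start is 0-based, so shift it by one; the half-open BED end is
## already the 1-based inclusive end and is used unchanged
gr_bed <- GRanges(bed$chrom, IRanges(start = bed$start + 1L, end = bed$end))
gr_bed
```

After the conversion, the BED records 250-500 and 501-1000 become 251-500 and 502-1000, which are separated by one uncovered base (501).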

bedtools cluster adds a 'category' (a cluster ID) to each element:

bedtools cluster -i A.bed

chr1  100     200     1
chr1  180     250     1
chr1  250     500     1
chr1  501     1000    2

It also has an option to control how close two features must be in order to be clustered (for example, by adding the parameter -d 1000).
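
The grouping rule that bedtools cluster applies can be sketched in base R. This assumes, following the bedtools description of -d as the maximum distance between features, that a feature joins the current cluster when the gap between its start and the furthest end seen so far is at most d, so overlapping and book-ended features share a cluster with the default d = 0. cluster_bed() is a hypothetical helper, not part of bedtools or Bioconductor, and it works on the raw 0-based BED numbers.

```r
## Hypothetical helper: sketch of the bedtools cluster rule on BED coordinates.
## A feature starts a new cluster when it is on a new chromosome or when the
## gap between its start and the furthest end seen so far exceeds d.
cluster_bed <- function(chrom, start, end, d = 0L) {
  stopifnot(length(chrom) == length(start), length(start) == length(end))
  ord <- order(chrom, start, end)   # bedtools expects position-sorted input
  id <- integer(length(start))
  cur_id <- 0L
  cur_chrom <- NA_character_
  max_end <- -Inf
  for (i in ord) {
    if (is.na(cur_chrom) || chrom[i] != cur_chrom || start[i] - max_end > d) {
      cur_id <- cur_id + 1L         # open a new cluster
      cur_chrom <- chrom[i]
      max_end <- end[i]
    } else {
      max_end <- max(max_end, end[i])  # extend the current cluster
    }
    id[i] <- cur_id
  }
  id
}

cluster_bed(rep("chr1", 4), c(100, 180, 250, 501), c(200, 250, 500, 1000))
## [1] 1 1 1 2
cluster_bed(rep("chr1", 4), c(100, 180, 250, 501), c(200, 250, 500, 1000), d = 1)
## [1] 1 1 1 1
```

With d = 1 the one-base gap between end 500 and start 501 is close enough, so all four records fall into a single cluster.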

How can I achieve this behaviour with GenomicRanges?

Thank you.

Answer by @james-w-macdonald-5106

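You can use reduce to do the clustering, along with findOverlaps to map each original range back to its cluster. The min.gapwidth argument to reduce can be used to emulate the -d argument to bedtools cluster: reduce keeps two ranges apart only if they are separated by a gap of at least min.gapwidth positions, so min.gapwidth = 0L merges only overlapping ranges and leaves the adjacent ranges 250-500 and 501-1000 in separate clusters, as in the bedtools output.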

> library(GenomicRanges)
> gr <- GRanges(rep("chr1", 4), IRanges(c(100,180,250,501), c(200, 250, 500, 1000)))
> gr
GRanges object with 4 ranges and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]     chr1   100-200      *
  [2]     chr1   180-250      *
  [3]     chr1   250-500      *
  [4]     chr1  501-1000      *

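> ## merge overlapping ranges, then label each range with the index of the merged range it falls in
> gr$cluster <- subjectHits(findOverlaps(gr, reduce(gr, min.gapwidth = 0L)))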
> gr
GRanges object with 4 ranges and 1 metadata column:
      seqnames    ranges strand |   cluster
         <Rle> <IRanges>  <Rle> | <integer>
  [1]     chr1   100-200      * |         1
  [2]     chr1   180-250      * |         1
  [3]     chr1   250-500      * |         1
  [4]     chr1  501-1000      * |         2
  -------
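
To reproduce the -d option from the question, the reduce/findOverlaps approach above can be wrapped in a small function. This is a sketch under two assumptions: the ranges carry proper 1-based coordinates (BED start + 1), and min.gapwidth = d + 1 is the right mapping, because reduce keeps apart ranges separated by a gap of at least min.gapwidth positions while bedtools clusters features whose distance is at most d. cluster_gr() is a hypothetical name; ignore.strand = TRUE mirrors bedtools cluster, which ignores strand unless -s is given.

```r
library(GenomicRanges)

## Hypothetical helper: cluster IDs in the spirit of `bedtools cluster -d d`.
## Assumes `gr` holds 1-based coordinates (BED start + 1).
cluster_gr <- function(gr, d = 0L, ignore.strand = TRUE) {
  ## reduce() keeps ranges apart only if the gap between them is at least
  ## min.gapwidth, so gaps of up to d positions are merged
  clusters <- reduce(gr, min.gapwidth = as.integer(d) + 1L,
                     ignore.strand = ignore.strand)
  ## each input range falls inside exactly one reduced range: return its index
  findOverlaps(gr, clusters, select = "first", ignore.strand = ignore.strand)
}

## The records of A.bed with their starts shifted to 1-based
gr_bed <- GRanges(rep("chr1", 4),
                  IRanges(c(101, 181, 251, 502), c(200, 250, 500, 1000)))
gr_bed$cluster    <- cluster_gr(gr_bed)         # 1 1 1 2, like bedtools cluster
gr_bed$cluster_d1 <- cluster_gr(gr_bed, d = 1)  # 1 1 1 1, like -d 1
gr_bed
```

Passing d = 1000 corresponds to the -d 1000 example in the question. If you would rather skip findOverlaps, reduce(..., with.revmap = TRUE) adds a revmap metadata column listing, for each merged range, the indices of the input ranges it absorbed.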