Split Genomic Ranges With Reference Overlaps (Centromere)
3
0
Entering edit mode
@gaiusjaugustus-10041
Last seen 6.1 years ago
University of Arizona

Hi, I have list of ranges (that I can convert to a genomic ranges object), some of which pass over the centromere of a chromosome.  I also have a file that denotes the start and end of each chromosome arm (leaving out the centromere).

For example:

Chromosome   Start     End     Arm
1 1 120000000 p
1 145000000 249198692 q

I'm trying to split these ranges such that any range that overlaps the centromere is split:

  • If a range completely overlaps centromere, split into two ranges and remove just region in centromere
  • If range is within centromere, remove it
  • If range partially overlaps centromere, remove the region in centromere

 

I'm working on writing an R script, but figured there might be something within GenomicRanges that already does this.  My main concern is that I have many metadata columns that I'd like to keep if possible.  If not, I can work with it.

 

granges genomicranges split • 1.4k views
ADD COMMENT
2
Entering edit mode
@michael-lawrence-3846
Last seen 2.9 years ago
United States

Find the centromere for each range, and subtract it. The result is a GRangesList, because you need to group the result ranges by each input range. Metadata is carried over.

centromeres <- centromeres[match(seqnames(ranges), seqnames(centromeres))]
centromeres <- as(centromeres, "GRangesList")
ans <- psetdiff(ranges, centromeres)
mcols(ans) <- mcols(ranges)

The most subtle bit is the coercion to GRangesList. That's what tells psetdiff() to split the ranges and return a GRangesList (as an endomorphism), instead of throwing an error. Ranges that fall completely within a centromere will become zero-width ranges, which you can filter out.

ans <- ans[width(ans) > 0L]

 

ADD COMMENT
0
Entering edit mode
@herve-pages-1542
Last seen 12 hours ago
Seattle, WA, United States

Hi,

Here is a solution using restrict(). Note that it does not preserve the order of the original ranges in general. However it preserves zero-width ranges that are not within the centromere:

x <- IRanges(start=c(6:10, 10, 14, 10, 8:10),
             width=rep(c(3, 2, 7, 0), c(5, 1, 3, 2)))
mcols(x)$label <- letters[1:11]
x
# IRanges object with 11 ranges and 1 metadata column:
#            start       end     width |       label
#        <integer> <integer> <integer> | <character>
#    [1]         6         8         3 |           a
#    [2]         7         9         3 |           b
#    [3]         8        10         3 |           c
#    [4]         9        11         3 |           d
#    [5]        10        12         3 |           e
#    [6]        10        11         2 |           f
#    [7]        14        20         7 |           g
#    [8]        10        16         7 |           h
#    [9]         8        14         7 |           i
#   [10]         9         8         0 |           j
#   [11]        10         9         0 |           k

Let's say the centromere spans positions 9 to 11:

p_arm_end <- 8
q_arm_start <- 12

Then:

## Restrict ranges to the p arm:
y1 <- restrict(x, end=p_arm_end, keep.all.ranges=TRUE)
y1 <- y1[width(y1) != 0 | y1 == x]
## Restrict ranges to the q arm:
y2 <- restrict(x, start=q_arm_start, keep.all.ranges=TRUE)
y2 <- y2[width(y2) != 0 | y2 == x]
## Combine the ranges:
y <- c(y1, y2)
y
# IRanges object with 9 ranges and 1 metadata column:
#           start       end     width |       label
#       <integer> <integer> <integer> | <character>
#   [1]         6         8         3 |           a
#   [2]         7         8         2 |           b
#   [3]         8         8         1 |           c
#   [4]         8         8         1 |           i
#   [5]         9         8         0 |           j
#   [6]        12        12         1 |           e
#   [7]        14        20         7 |           g
#   [8]        12        16         5 |           h
#   [9]        12        14         3 |           i

Range i was split into 2 ranges. Ranges d, f, k were removed.

H.

ADD COMMENT

Login before adding your answer.

Traffic: 428 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6