Question: Split Genomic Ranges With Reference Overlaps (Centromere)
0
gravatar for gaiusjaugustus
2.5 years ago by
University of Arizona
gaiusjaugustus0 wrote:

Hi, I have list of ranges (that I can convert to a genomic ranges object), some of which pass over the centromere of a chromosome.  I also have a file that denotes the start and end of each chromosome arm (leaving out the centromere).

For example:

Chromosome   Start     End     Arm
1 1 120000000 p
1 145000000 249198692 q

I'm trying to split these ranges such that any range that overlaps the centromere is split:

  • If a range completely overlaps centromere, split into two ranges and remove just region in centromere
  • If range is within centromere, remove it
  • If range partially overlaps centromere, remove the region in centromere

 

I'm working on writing an R script, but figured there might be something within GenomicRanges that already does this.  My main concern is that I have many metadata columns that I'd like to keep if possible.  If not, I can work with it.

 

genomicranges granges split • 561 views
ADD COMMENTlink modified 2.5 years ago by koisingh690 • written 2.5 years ago by gaiusjaugustus0
Answer: Split Genomic Ranges With Reference Overlaps (Centromere)
2
gravatar for Michael Lawrence
2.5 years ago by
United States
Michael Lawrence11k wrote:

Find the centromere for each range, and subtract it. The result is a GRangesList, because you need to group the result ranges by each input range. Metadata is carried over.

centromeres <- centromeres[match(seqnames(ranges), seqnames(centromeres))]
centromeres <- as(centromeres, "GRangesList")
ans <- psetdiff(ranges, centromeres)
mcols(ans) <- mcols(ranges)

The most subtle bit is the coercion to GRangesList. That's what tells psetdiff() to split the ranges and return a GRangesList (as an endomorphism), instead of throwing an error. Ranges that fall completely within a centromere will become zero-width ranges, which you can filter out.

ans <- ans[width(ans) > 0L]

 

ADD COMMENTlink written 2.5 years ago by Michael Lawrence11k
Answer: Split Genomic Ranges With Reference Overlaps (Centromere)
0
gravatar for Hervé Pagès
2.5 years ago by
Hervé Pagès ♦♦ 14k
United States
Hervé Pagès ♦♦ 14k wrote:

Hi,

Here is a solution using restrict(). Note that it does not preserve the order of the original ranges in general. However it preserves zero-width ranges that are not within the centromere:

x <- IRanges(start=c(6:10, 10, 14, 10, 8:10),
             width=rep(c(3, 2, 7, 0), c(5, 1, 3, 2)))
mcols(x)$label <- letters[1:11]
x
# IRanges object with 11 ranges and 1 metadata column:
#            start       end     width |       label
#        <integer> <integer> <integer> | <character>
#    [1]         6         8         3 |           a
#    [2]         7         9         3 |           b
#    [3]         8        10         3 |           c
#    [4]         9        11         3 |           d
#    [5]        10        12         3 |           e
#    [6]        10        11         2 |           f
#    [7]        14        20         7 |           g
#    [8]        10        16         7 |           h
#    [9]         8        14         7 |           i
#   [10]         9         8         0 |           j
#   [11]        10         9         0 |           k

Let's say the centromere spans positions 9 to 11:

p_arm_end <- 8
q_arm_start <- 12

Then:

## Restrict ranges to the p arm:
y1 <- restrict(x, end=p_arm_end, keep.all.ranges=TRUE)
y1 <- y1[width(y1) != 0 | y1 == x]
## Restrict ranges to the q arm:
y2 <- restrict(x, start=q_arm_start, keep.all.ranges=TRUE)
y2 <- y2[width(y2) != 0 | y2 == x]
## Combine the ranges:
y <- c(y1, y2)
y
# IRanges object with 9 ranges and 1 metadata column:
#           start       end     width |       label
#       <integer> <integer> <integer> | <character>
#   [1]         6         8         3 |           a
#   [2]         7         8         2 |           b
#   [3]         8         8         1 |           c
#   [4]         8         8         1 |           i
#   [5]         9         8         0 |           j
#   [6]        12        12         1 |           e
#   [7]        14        20         7 |           g
#   [8]        12        16         5 |           h
#   [9]        12        14         3 |           i

Range i was split into 2 ranges. Ranges d, f, k were removed.

H.

ADD COMMENTlink modified 2.5 years ago • written 2.5 years ago by Hervé Pagès ♦♦ 14k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 220 users visited in the last hour