resize (shrink/expand) GenomicRanges
1
4
Entering edit mode
aush ▴ 40
@aush-9128
Last seen 12 months ago
United States

I have a set of GenomicRanges and all I need is to expand it by certain distance on start and another distance on end (taking into account the strand). For example, from this set:

 strand start end 1 + 100 110 2 - 200 220 3 + 300 330

expanding by 20 nt at start and by 30 nt at the end, and considering chromosome length as 350, I want to get this:

 strand start end 1 + 80 140 2 - 170 240 3 + 280 350

Note that 1) for range #2 shift values are inversed because it's negative strand, 2) for range #3 resulting end position is 350 because it can't be longer than chromosome length.

Here is the code to generate this sample data:

mygr <- GRanges(seqnames  = Rle(rep('chr1', 3)),
ranges    = IRanges(c(100, 200, 300), c(110, 220, 330)),
strand    = c('+','-','+'),
seqlengths= c(chr1=350))

So far I wrote the following function:

grexpand <- function(inputGR, befLen=0, aftLen=0){
inputGR <- resize(inputGR,  width = width(inputGR)+befLen, fix = 'end')
inputGR <- resize(inputGR,  width = width(inputGR)+aftLen, fix = 'start')
return(trim(inputGR));
}

it does the job but I'm wondering if this functionality is already included in the package and I just don't know the right way to call it...

GenomicRanges IRanges • 6.5k views
6
Entering edit mode
@herve-pages-1542
Last seen 22 hours ago
Seattle, WA, United States

Hi,

AFAIK we don't have a function in GenomicRanges to perform this particular transformation. FWIW here is an alternative to your grexpand():

extend <- function(x, upstream=0, downstream=0)
{
if (any(strand(x) == "*"))
warning("'*' ranges were treated as '+'")
on_plus <- strand(x) == "+" | strand(x) == "*"
new_start <- start(x) - ifelse(on_plus, upstream, downstream)
new_end <- end(x) + ifelse(on_plus, downstream, upstream)
ranges(x) <- IRanges(new_start, new_end)
trim(x)
}

It avoids a collision you can get with grexpand() between the start and end of the intermediate GRanges object (i.e. the one produced by your first call to resize()) when befLen is negative:

grexpand(mygr, -12, 12)
# Error in .local(x, width, fix, use.names, ...) :
#   'width' values must be non-negative

extend(mygr, -12, 12)
# GRanges object with 3 ranges and 0 metadata columns:
#       seqnames     ranges strand
#          <Rle>  <IRanges>  <Rle>
#   [1]     chr1 [112, 122]      +
#   [2]     chr1 [188, 208]      -
#   [3]     chr1 [312, 342]      +
#   -------
#   seqinfo: 1 sequence from an unspecified genome

More generally speaking, when downstream == -upstream, the transformation should be equivalent to a strand aware shift() and so should always work.

Cheers,

H.

0
Entering edit mode

I want to do the same thing, like to include promoter regions (2kb upstream) to my genes. I wonder what is the reasons for not having this feature in GenomicRanges package until now? I was also trying to use resize function for this purpose. Luckily I found this thread.

0
Entering edit mode

Do note the flank() and promoters() functions. This case is very specific to wanting a different expansion on either side of the range. Most of the complication comes down not having a strand-specific shift(). It would also be simpler if we did not have to worry about a negative upstream length. A typical use case of combining the promoter with the gene range would just be punion(promoters(mygr), mygr). One could use resize() there to expand on the other side. I could see a good argument for extend() being added for these special cases. It would be generalized to ordinary Ranges objects and use "start" and "end" as argument names instead of "upstream" and "downstream". Then, I don't think there's a reason to keep the warning about how "*" strand ranges are treated.