tileGenome for overlapping ranges?
4
3
Entering edit mode
Janet Young ▴ 740
@janet-young-2360
Last seen 5.2 years ago
Fred Hutchinson Cancer Research Center,…

Hi there,

 

A little late after the fact, I just noticed the "new" tileGenome function in GenomicRange - very nice. Thank you!  I'll start using this now instead of a much slower function I'd written myself. 

I sometimes find myself looking at overlapping sliding windows with my slow function, rather than the non-overlapping tiles that tileGenome produces. Would it be possible to add that option to the function?  I'd like to specify window size and slide amount. In a slightly ridiculous toy example, I might want 100bp windows with a slide of 20bp on a 150bp chromosome, so the windows would be at these positions:

1-100

21-120

41-140

61-150

What do you think - would this be easy for you guys to do?

thanks very much,

Janet

 

genomicranges • 3.5k views
ADD COMMENT
1
Entering edit mode
@vincent-j-carey-jr-4
Last seen 4 days ago
United States

Perhaps you can get what you want with available operations.

 

>    seqlengths <- c(chr1=150, chr2=500)

> tt = tileGenome(seqlengths, tilewidth=20)

> utt = unlist(tt)

> trim(resize(utt, width=100))

GRanges object with 34 ranges and 0 metadata columns:

       seqnames     ranges strand

          <Rle>  <IRanges>  <Rle>

   [1]     chr1  [ 1, 100]      *

   [2]     chr1  [21, 120]      *

   [3]     chr1  [41, 140]      *

   [4]     chr1  [61, 150]      *

   [5]     chr1  [80, 150]      *

   ...      ...        ...    ...

  [30]     chr2 [403, 500]      *

  [31]     chr2 [423, 500]      *

  [32]     chr2 [442, 500]      *

  [33]     chr2 [462, 500]      *

  [34]     chr2 [482, 500]      *

  -------

  seqinfo: 2 sequences from an unspecified genome
0
Entering edit mode

For a simple GRanges like that from

tt = tileGenome(seqlengths, tilewidth=20, cut.last.tile.in.chrom=TRUE)

I guess 

end(tt) = pmin(start(tt) + 100, seqlengths(tt)[as.character(seqnames(tt))])

keeps the ends within chromosome bounds (avoiding the warning message from resize()) and for the more complicated GRangesList the 'unlist / relist` trick does the same

tt0 = tileGenome(seqlengths, tilewidth=20)
tt = unlist(tt0, use.names=FALSE)
end(tt) = pmin(start(tt) + 100, seqlengths(tt)[as.character(seqnames(tt))])
relist(tt, tt0)

 

ADD REPLY
1
Entering edit mode

Hi,

I think there would definitely be some value in enhancing tileGenome() to allow overlap or spacing between the tiles. This could be achieved via a spacing arg that would be 0 by default. When spacing is positive, say 2, and with tilewidth=20 one would get the following ranges:

 1-20
23-42
45-64
etc...

When spacing is negative, say -2, one would get overlapping ranges:

 1-20
19-38
37-56
etc...

So to get the tiles she wants, Janet would need to specify tilewidth=100 and spacing=-80.

Even though I anticipate that most of the time people will use a negative spacing, I prefer this to having the extra argument be called overlap that would be interpreted as the opposite of spacing (i.e. overlap=N means spacing=-N). Does this sound reasonable?

Thanks,

H. 

 

ADD REPLY
0
Entering edit mode
+1 On Sat, Feb 7, 2015 at 8:37 PM, Hervé Pagès [bioc] <noreply@bioconductor.org> wrote: > Activity on a post you are following on support.bioconductor.org > > User Hervé Pagès <https: support.bioconductor.org="" u="" 1542=""/> wrote Comment: > tileGenome for overlapping ranges? > <https: support.bioconductor.org="" p="" 64708="" #64715="">: > > Hi, > > I think there would definitely be some value in enhancing tileGenome() to > allow overlap or spacing between the tiles. This could be achieved via a > spacing arg that would be 0 by default. When spacing is positive, say 2, > and with tilewidth=20 one would get the following ranges: > > 1-20 > 23-42 > 45-64 > etc... > > When spacing is negative, say -2, one would get overlapping ranges: > > 1-20 > 19-38 > 37-56 > etc... > > So to get the tiles she wants, Janet would need to specify tilewidth=100 > and spacing=-80. > > Even though I anticipate that most of the time people will use a negative > spacing, I prefer this to having the extra argument be called overlap that > would interpreted as the opposite of spacing (i.e. overlap=N means > spacing=-N). Does this sound reasonable? > > Thanks, > > H. > > > > ------------------------------ > > You may reply via email or visit > C: tileGenome for overlapping ranges? >
ADD REPLY
0
Entering edit mode

I think I'd actually prefer the original suggestion of options for tilewidth and step, where step defaults to tilewidth. (Mathematically, step = spacing + tilewidth.) Or maybe support providing either spacing or step, in a similar manner to how Ranges support any two of start, end, and width? This is based on my experience that usually I say something like "I want windows of width X tiled every Y bp across the genome" and not "I want windows of width X with an overlap of Z tiled across the genome".

ADD REPLY
0
Entering edit mode

Hi Ryan,

Interesting.

Given that tileGenome() allows the user to specify the number of tiles (ntile arg) instead of the tile width (tilewidth arg), s/he might also want to say "I want N windows with an overlap of Z tiled across the genome". In that case it can be hard for him/her to figure out what step to use. And vice-versa: if someone wants to say "I want N windows tiled every Y bp across the genome" it can be hard to express this in terms of spacing.

So I think we should probably have both spacing and step, as 2 exclusive args.

Thanks for your feedback,

H.

ADD REPLY
0
Entering edit mode
Janet Young ▴ 740
@janet-young-2360
Last seen 5.2 years ago
Fred Hutchinson Cancer Research Center,…

Thanks very much, all - it's nice to have a way to do it with existing code, and also great to see that it could be a built-in option for the function at some point soon.  The built-in option will be much more intuitive for us naive biologists than the clever resizing method, which isn't immediately obvious.

Herve: yes, that idea does sound very reasonable.  Perhaps it'd help people search for and intuitively understand the new option if the help page includes the phrases "sliding window" and "overlap", even if the option is called spacing - I think those might be the more commonly used names in publications, etc. 

thanks again,

Janet

 

ADD COMMENT
0
Entering edit mode

OK good to know. I've put this on my TODO list and will post again here when it's ready. Thanks for your feedback.

H.

ADD REPLY
0
Entering edit mode
Janet Young ▴ 740
@janet-young-2360
Last seen 5.2 years ago
Fred Hutchinson Cancer Research Center,…

Thanks, Herve - that's great!

Janet

ADD COMMENT
0
Entering edit mode
@biocyberman-8632
Last seen 9.4 years ago
Denmark

Just want to continue to illustrate the approach @Martin showed. The 6th range, which is at the end of the following snippet, may not be desirable. It is completely within the second last range.
 

> seqlengths <- c(chr1=60, chr2=20, chr3=25)
> tt <- tileGenome(seqlengths, tilewidth = 20, cut.last.tile.in.chrom = TRUE)
> tt
GRanges object with 6 ranges and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]     chr1  [ 1, 20]      *
  [2]     chr1  [21, 40]      *
  [3]     chr1  [41, 60]      *
  [4]     chr2  [ 1, 20]      *
  [5]     chr3  [ 1, 20]      *
  [6]     chr3  [21, 25]      *
  -------
  seqinfo: 3 sequences from an unspecified genome
>seqlengths(tt)[as.character(seqnames(tt))]
chr1 chr1 chr1 chr2 chr3 chr3
  60   60   60   20   25   25
> seqnames(tt)
factor-Rle of length 6 with 3 runs
  Lengths:    3    1    2
  Values : chr1 chr2 chr3
Levels(3): chr1 chr2 chr3
> as.character(seqnames(tt))
[1] "chr1" "chr1" "chr1" "chr2" "chr3" "chr3"
>  
>
> end(tt) <- pmin(start(tt) + 30, seqlengths(tt)[as.character(seqnames(tt))])
> tt
GRanges object with 6 ranges and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]     chr1  [ 1, 31]      *
  [2]     chr1  [21, 51]      *
  [3]     chr1  [41, 60]      *
  [4]     chr2  [ 1, 20]      *
  [5]     chr3  [ 1, 25]      *
  [6]     chr3  [21, 25]      * # Should this range be here??!
  -------
  seqinfo: 3 sequences from an unspecified genome
>
 
ADD COMMENT

Login before adding your answer.

Traffic: 624 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6