finding neigbouring intervals for 2 GRange objects

0

Entering edit mode

Hermann Norpois ▴ 170

@hermann-norpois-5726

Last seen 9.0 years ago

Germany

Hello, I have to grange-objects - gr1 and gr2. And I wish to know: 1) What are the neigbouring intervals of gr1 in gr2 - upstream and downstream? 2) What is the distance in bp between the neighbouring intervals? Could anybody give me a hint how this is done? Thanks Hermann > dput (gr1) new("GRanges" , seqnames = new("Rle" , values = structure(1L, .Label = c("chr1", "chr10", "chr11", "chr12", "chr13", "chr14", "chr15", "chr16", "chr17", "chr18", "chr19", "chr2", "chr20", "chr21", "chr22", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8", "chr9", "chrX", "chrY"), class = "factor") , lengths = 4L , elementMetadata = NULL , metadata = list() ) , ranges = new("IRanges" , start = c(540823L, 752809L, 771015L, 773361L) , width = c(221L, 230L, 520L, 247L) , NAMES = NULL , elementType = "integer" , elementMetadata = NULL , metadata = list() ) , strand = new("Rle" , values = structure(3L, .Label = c("+", "-", "*"), class = "factor") , lengths = 4L , elementMetadata = NULL , metadata = list() ) , elementMetadata = new("DataFrame" , rownames = NULL , nrows = 4L , listData = structure(list(edensity = c(589L, 574L, 578L, 571L), epeak = c(111L, 89L, 234L, 136L), over = c(0L, 1L, 1L, 1L)), .Names = c("edensity", "epeak", "over")) , elementType = "ANY" , elementMetadata = NULL , metadata = list() ) , seqinfo = new("Seqinfo" , seqnames = c("chr1", "chr10", "chr11", "chr12", "chr13", "chr14", "chr15", "chr16", "chr17", "chr18", "chr19", "chr2", "chr20", "chr21", "chr22", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8", "chr9", "chrX", "chrY") , seqlengths = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_) , is_circular = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA) , genome = c(NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_ ) ) , metadata = list() ) > dput (gr1) new("GRanges" , seqnames = new("Rle" , values = structure(1L, .Label = c("chr1", "chr10", "chr11", "chr12", "chr13", "chr14", "chr15", "chr16", "chr17", "chr18", "chr19", "chr2", "chr20", "chr21", "chr22", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8", "chr9", "chrX", "chrY"), class = "factor") , lengths = 4L , elementMetadata = NULL , metadata = list() ) , ranges = new("IRanges" , start = c(540823L, 752809L, 771015L, 773361L) , width = c(221L, 230L, 520L, 247L) , NAMES = NULL , elementType = "integer" , elementMetadata = NULL , metadata = list() ) , strand = new("Rle" , values = structure(3L, .Label = c("+", "-", "*"), class = "factor") , lengths = 4L , elementMetadata = NULL , metadata = list() ) , elementMetadata = new("DataFrame" , rownames = NULL , nrows = 4L , listData = structure(list(edensity = c(589L, 574L, 578L, 571L), epeak = c(111L, 89L, 234L, 136L), over = c(0L, 1L, 1L, 1L)), .Names = c("edensity", "epeak", "over")) , elementType = "ANY" , elementMetadata = NULL , metadata = list() ) , seqinfo = new("Seqinfo" , seqnames = c("chr1", "chr10", "chr11", "chr12", "chr13", "chr14", "chr15", "chr16", "chr17", "chr18", "chr19", "chr2", "chr20", "chr21", "chr22", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8", "chr9", "chrX", "chrY") , seqlengths = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_) , is_circular = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA) , genome = c(NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_ ) ) , metadata = list() ) > gr1 GRanges with 4 ranges and 3 metadata columns: seqnames ranges strand | edensity epeak over <rle> <iranges> <rle> | <integer> <integer> <integer> [1] chr1 [540823, 541043] * | 589 111 0 [2] chr1 [752809, 753038] * | 574 89 1 [3] chr1 [771015, 771534] * | 578 234 1 [4] chr1 [773361, 773607] * | 571 136 1 --- seqlengths: chr1 chr10 chr11 chr12 chr13 chr14 ... chr6 chr7 chr8 chr9 chrX chrY NA NA NA NA NA NA ... NA NA NA NA NA NA > gr2 GRanges with 3 ranges and 3 metadata columns: seqnames ranges strand | edensity epeak over <rle> <iranges> <rle> | <integer> <integer> <integer> [1] chr1 [ 53141, 53685] * | 601 212 0 [2] chr1 [521536, 521622] * | 568 37 1 [3] chr1 [805002, 805650] * | 1000 326 1 --- seqlengths: chr1 chr10 chr11 chr12 chr13 chr14 ... chr6 chr7 chr8 chr9 chrX chrY NA NA NA NA NA NA ... NA NA NA NA NA NA [[alternative HTML version deleted]]

• 1.4k views

ADD COMMENT • link updated 11.1 years ago by Harris A. Jaffee ▴ 590 • written 11.1 years ago by Hermann Norpois ▴ 170

0

Entering edit mode

Harris A. Jaffee ▴ 590

@harris-a-jaffee-3972

Last seen 9.5 years ago

United States

You want precede() and follow(), and then distance(), but tread somewhat carefully with strands of "*": > x=IRanges(2, 2) > x IRanges of length 1 start end width [1] 2 2 1 > y=IRanges(c(1,3), c(1,3)) > y IRanges of length 2 start end width [1] 1 1 1 [2] 3 3 1 > precede(x, y) [1] 2 > follow(x, y) [1] 1 > X = GRanges(ranges=x, seqnames="chr1") > Y = GRanges(ranges=y, seqnames="chr1") > X GRanges with 1 range and 0 metadata columns: seqnames ranges strand <rle> <iranges> <rle> [1] chr1 [2, 2] * --- seqlengths: chr1 NA > Y GRanges with 2 ranges and 0 metadata columns: seqnames ranges strand <rle> <iranges> <rle> [1] chr1 [1, 1] * [2] chr1 [3, 3] * --- seqlengths: chr1 NA > precede(X, Y) [1] 1 > follow(X, Y) [1] 1 > precede(X, Y, ignore.strand=TRUE) [1] 2 > follow(X, Y, ignore.strand=TRUE) [1] 1 On Mar 3, 2013, at 4:10 PM, Hermann Norpois wrote: > Hello, > > I have to grange-objects - gr1 and gr2. And I wish to know: > 1) What are the neigbouring intervals of gr1 in gr2 - upstream and > downstream? > 2) What is the distance in bp between the neighbouring intervals? > > Could anybody give me a hint how this is done? > Thanks > Hermann > >> dput (gr1) > new("GRanges" > , seqnames = new("Rle" > , values = structure(1L, .Label = c("chr1", "chr10", "chr11", "chr12", > "chr13", > "chr14", "chr15", "chr16", "chr17", "chr18", "chr19", "chr2", > "chr20", "chr21", "chr22", "chr3", "chr4", "chr5", "chr6", "chr7", > "chr8", "chr9", "chrX", "chrY"), class = "factor") > , lengths = 4L > , elementMetadata = NULL > , metadata = list() > ) > , ranges = new("IRanges" > , start = c(540823L, 752809L, 771015L, 773361L) > , width = c(221L, 230L, 520L, 247L) > , NAMES = NULL > , elementType = "integer" > , elementMetadata = NULL > , metadata = list() > ) > , strand = new("Rle" > , values = structure(3L, .Label = c("+", "-", "*"), class = "factor") > , lengths = 4L > , elementMetadata = NULL > , metadata = list() > ) > , elementMetadata = new("DataFrame" > , rownames = NULL > , nrows = 4L > , listData = structure(list(edensity = c(589L, 574L, 578L, 571L), epeak > = c(111L, > 89L, 234L, 136L), over = c(0L, 1L, 1L, 1L)), .Names = c("edensity", > "epeak", "over")) > , elementType = "ANY" > , elementMetadata = NULL > , metadata = list() > ) > , seqinfo = new("Seqinfo" > , seqnames = c("chr1", "chr10", "chr11", "chr12", "chr13", "chr14", > "chr15", > "chr16", "chr17", "chr18", "chr19", "chr2", "chr20", "chr21", > "chr22", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8", "chr9", > "chrX", "chrY") > , seqlengths = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_, > NA_integer_, > NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, > NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, > NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, > NA_integer_, NA_integer_, NA_integer_, NA_integer_) > , is_circular = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, > NA, NA, > NA, NA, NA, NA, NA, NA, NA, NA, NA) > , genome = c(NA_character_, NA_character_, NA_character_, > NA_character_, > NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, > NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, > NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, > NA_character_, NA_character_, NA_character_, NA_character_, NA_character_ > ) > ) > , metadata = list() > ) >> dput (gr1) > new("GRanges" > , seqnames = new("Rle" > , values = structure(1L, .Label = c("chr1", "chr10", "chr11", "chr12", > "chr13", > "chr14", "chr15", "chr16", "chr17", "chr18", "chr19", "chr2", > "chr20", "chr21", "chr22", "chr3", "chr4", "chr5", "chr6", "chr7", > "chr8", "chr9", "chrX", "chrY"), class = "factor") > , lengths = 4L > , elementMetadata = NULL > , metadata = list() > ) > , ranges = new("IRanges" > , start = c(540823L, 752809L, 771015L, 773361L) > , width = c(221L, 230L, 520L, 247L) > , NAMES = NULL > , elementType = "integer" > , elementMetadata = NULL > , metadata = list() > ) > , strand = new("Rle" > , values = structure(3L, .Label = c("+", "-", "*"), class = "factor") > , lengths = 4L > , elementMetadata = NULL > , metadata = list() > ) > , elementMetadata = new("DataFrame" > , rownames = NULL > , nrows = 4L > , listData = structure(list(edensity = c(589L, 574L, 578L, 571L), epeak > = c(111L, > 89L, 234L, 136L), over = c(0L, 1L, 1L, 1L)), .Names = c("edensity", > "epeak", "over")) > , elementType = "ANY" > , elementMetadata = NULL > , metadata = list() > ) > , seqinfo = new("Seqinfo" > , seqnames = c("chr1", "chr10", "chr11", "chr12", "chr13", "chr14", > "chr15", > "chr16", "chr17", "chr18", "chr19", "chr2", "chr20", "chr21", > "chr22", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8", "chr9", > "chrX", "chrY") > , seqlengths = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_, > NA_integer_, > NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, > NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, > NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, > NA_integer_, NA_integer_, NA_integer_, NA_integer_) > , is_circular = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, > NA, NA, > NA, NA, NA, NA, NA, NA, NA, NA, NA) > , genome = c(NA_character_, NA_character_, NA_character_, > NA_character_, > NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, > NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, > NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, > NA_character_, NA_character_, NA_character_, NA_character_, NA_character_ > ) > ) > , metadata = list() > ) > >> gr1 > GRanges with 4 ranges and 3 metadata columns: > seqnames ranges strand | edensity epeak over > <rle> <iranges> <rle> | <integer> <integer> <integer> > [1] chr1 [540823, 541043] * | 589 111 0 > [2] chr1 [752809, 753038] * | 574 89 1 > [3] chr1 [771015, 771534] * | 578 234 1 > [4] chr1 [773361, 773607] * | 571 136 1 > --- > seqlengths: > chr1 chr10 chr11 chr12 chr13 chr14 ... chr6 chr7 chr8 chr9 chrX > chrY > NA NA NA NA NA NA ... NA NA NA NA NA > NA >> gr2 > GRanges with 3 ranges and 3 metadata columns: > seqnames ranges strand | edensity epeak over > <rle> <iranges> <rle> | <integer> <integer> <integer> > [1] chr1 [ 53141, 53685] * | 601 212 0 > [2] chr1 [521536, 521622] * | 568 37 1 > [3] chr1 [805002, 805650] * | 1000 326 1 > --- > seqlengths: > chr1 chr10 chr11 chr12 chr13 chr14 ... chr6 chr7 chr8 chr9 chrX > chrY > NA NA NA NA NA NA ... NA NA NA NA NA > NA > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD COMMENT • link 11.1 years ago Harris A. Jaffee ▴ 590

0

Entering edit mode

Hi Hermann, Harris, I wanted to give some background on the behavior of these functions for GRanges vs IRanges. > gr <- GRanges("chr1", IRanges(c(1,2,3), width=1)) > gr GRanges with 3 ranges and 0 metadata columns: seqnames ranges strand <rle> <iranges> <rle> [1] chr1 [1, 1] * [2] chr1 [2, 2] * [3] chr1 [3, 3] * ----------------------------- "+" strand behavior: ----------------------------- IRanges objects have no strand so you can expect them to behave as "+" strand GRanges do. If this is ever not the case please let me know. That would be a bug. > strand(gr) <- "+" > precede(gr) [1] 2 3 NA > precede(ranges(gr)) [1] 2 3 NA ----------------------------- "-" strand behavior: ----------------------------- I think it makes intuitive sense that precede() and follow() on "-" strand GRanges will give the opposite results from precede() and follow() on "+" strand GRanges. > strand(gr) <- "-" > precede(gr) [1] NA 1 2 ----------------------------- "*" strand behavior: ----------------------------- For "*" strand GRanges maybe we tried to be too clever. When ignore.strand=TRUE strand is treated as "+": > strand(gr) <- "*" > precede(gr, ignore.strand=TRUE) [1] 2 3 NA When ignore.strand=FALSE for a "*" strand range, results are computed for both "+' and "-". If one result is NA, the other is taken. This is how we end up with a '2' in positions 1 and 3 (position 3 was NA for "+" and position 1 was NA for "-"). If there is a tie, the first in order is taken. This is how we get the '1' in position 2 (tie was between '1' and '3' and '1' comes before '3' in order). > precede(gr, ignore.strand=FALSE) [1] 2 1 2 It was a challenge to decide how "*" strand with ignore.strand=FALSE should behave. If others have opinions on this I'd be interested in hearing them. Valerie On 03/03/13 16:38, Harris A. Jaffee wrote: > You want precede() and follow(), and then distance(), but tread somewhat carefully with strands of "*": > >> x=IRanges(2, 2) >> x > IRanges of length 1 > start end width > [1] 2 2 1 > >> y=IRanges(c(1,3), c(1,3)) >> y > IRanges of length 2 > start end width > [1] 1 1 1 > [2] 3 3 1 > >> precede(x, y) > [1] 2 >> follow(x, y) > [1] 1 > >> X = GRanges(ranges=x, seqnames="chr1") >> Y = GRanges(ranges=y, seqnames="chr1") >> X > GRanges with 1 range and 0 metadata columns: > seqnames ranges strand > <rle> <iranges> <rle> > [1] chr1 [2, 2] * > --- > seqlengths: > chr1 > NA >> Y > GRanges with 2 ranges and 0 metadata columns: > seqnames ranges strand > <rle> <iranges> <rle> > [1] chr1 [1, 1] * > [2] chr1 [3, 3] * > --- > seqlengths: > chr1 > NA > >> precede(X, Y) > [1] 1 >> follow(X, Y) > [1] 1 > >> precede(X, Y, ignore.strand=TRUE) > [1] 2 >> follow(X, Y, ignore.strand=TRUE) > [1] 1 > > > On Mar 3, 2013, at 4:10 PM, Hermann Norpois wrote: > >> Hello, >> >> I have to grange-objects - gr1 and gr2. And I wish to know: >> 1) What are the neigbouring intervals of gr1 in gr2 - upstream and >> downstream? >> 2) What is the distance in bp between the neighbouring intervals? >> >> Could anybody give me a hint how this is done? >> Thanks >> Hermann >> >>> dput (gr1) >> new("GRanges" >> , seqnames = new("Rle" >> , values = structure(1L, .Label = c("chr1", "chr10", "chr11", "chr12", >> "chr13", >> "chr14", "chr15", "chr16", "chr17", "chr18", "chr19", "chr2", >> "chr20", "chr21", "chr22", "chr3", "chr4", "chr5", "chr6", "chr7", >> "chr8", "chr9", "chrX", "chrY"), class = "factor") >> , lengths = 4L >> , elementMetadata = NULL >> , metadata = list() >> ) >> , ranges = new("IRanges" >> , start = c(540823L, 752809L, 771015L, 773361L) >> , width = c(221L, 230L, 520L, 247L) >> , NAMES = NULL >> , elementType = "integer" >> , elementMetadata = NULL >> , metadata = list() >> ) >> , strand = new("Rle" >> , values = structure(3L, .Label = c("+", "-", "*"), class = "factor") >> , lengths = 4L >> , elementMetadata = NULL >> , metadata = list() >> ) >> , elementMetadata = new("DataFrame" >> , rownames = NULL >> , nrows = 4L >> , listData = structure(list(edensity = c(589L, 574L, 578L, 571L), epeak >> = c(111L, >> 89L, 234L, 136L), over = c(0L, 1L, 1L, 1L)), .Names = c("edensity", >> "epeak", "over")) >> , elementType = "ANY" >> , elementMetadata = NULL >> , metadata = list() >> ) >> , seqinfo = new("Seqinfo" >> , seqnames = c("chr1", "chr10", "chr11", "chr12", "chr13", "chr14", >> "chr15", >> "chr16", "chr17", "chr18", "chr19", "chr2", "chr20", "chr21", >> "chr22", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8", "chr9", >> "chrX", "chrY") >> , seqlengths = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_, >> NA_integer_, >> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, >> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, >> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, >> NA_integer_, NA_integer_, NA_integer_, NA_integer_) >> , is_circular = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, >> NA, NA, >> NA, NA, NA, NA, NA, NA, NA, NA, NA) >> , genome = c(NA_character_, NA_character_, NA_character_, >> NA_character_, >> NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, >> NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, >> NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, >> NA_character_, NA_character_, NA_character_, NA_character_, NA_character_ >> ) >> ) >> , metadata = list() >> ) >>> dput (gr1) >> new("GRanges" >> , seqnames = new("Rle" >> , values = structure(1L, .Label = c("chr1", "chr10", "chr11", "chr12", >> "chr13", >> "chr14", "chr15", "chr16", "chr17", "chr18", "chr19", "chr2", >> "chr20", "chr21", "chr22", "chr3", "chr4", "chr5", "chr6", "chr7", >> "chr8", "chr9", "chrX", "chrY"), class = "factor") >> , lengths = 4L >> , elementMetadata = NULL >> , metadata = list() >> ) >> , ranges = new("IRanges" >> , start = c(540823L, 752809L, 771015L, 773361L) >> , width = c(221L, 230L, 520L, 247L) >> , NAMES = NULL >> , elementType = "integer" >> , elementMetadata = NULL >> , metadata = list() >> ) >> , strand = new("Rle" >> , values = structure(3L, .Label = c("+", "-", "*"), class = "factor") >> , lengths = 4L >> , elementMetadata = NULL >> , metadata = list() >> ) >> , elementMetadata = new("DataFrame" >> , rownames = NULL >> , nrows = 4L >> , listData = structure(list(edensity = c(589L, 574L, 578L, 571L), epeak >> = c(111L, >> 89L, 234L, 136L), over = c(0L, 1L, 1L, 1L)), .Names = c("edensity", >> "epeak", "over")) >> , elementType = "ANY" >> , elementMetadata = NULL >> , metadata = list() >> ) >> , seqinfo = new("Seqinfo" >> , seqnames = c("chr1", "chr10", "chr11", "chr12", "chr13", "chr14", >> "chr15", >> "chr16", "chr17", "chr18", "chr19", "chr2", "chr20", "chr21", >> "chr22", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8", "chr9", >> "chrX", "chrY") >> , seqlengths = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_, >> NA_integer_, >> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, >> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, >> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, >> NA_integer_, NA_integer_, NA_integer_, NA_integer_) >> , is_circular = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, >> NA, NA, >> NA, NA, NA, NA, NA, NA, NA, NA, NA) >> , genome = c(NA_character_, NA_character_, NA_character_, >> NA_character_, >> NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, >> NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, >> NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, >> NA_character_, NA_character_, NA_character_, NA_character_, NA_character_ >> ) >> ) >> , metadata = list() >> ) >> >>> gr1 >> GRanges with 4 ranges and 3 metadata columns: >> seqnames ranges strand | edensity epeak over >> <rle> <iranges> <rle> | <integer> <integer> <integer> >> [1] chr1 [540823, 541043] * | 589 111 0 >> [2] chr1 [752809, 753038] * | 574 89 1 >> [3] chr1 [771015, 771534] * | 578 234 1 >> [4] chr1 [773361, 773607] * | 571 136 1 >> --- >> seqlengths: >> chr1 chr10 chr11 chr12 chr13 chr14 ... chr6 chr7 chr8 chr9 chrX >> chrY >> NA NA NA NA NA NA ... NA NA NA NA NA >> NA >>> gr2 >> GRanges with 3 ranges and 3 metadata columns: >> seqnames ranges strand | edensity epeak over >> <rle> <iranges> <rle> | <integer> <integer> <integer> >> [1] chr1 [ 53141, 53685] * | 601 212 0 >> [2] chr1 [521536, 521622] * | 568 37 1 >> [3] chr1 [805002, 805650] * | 1000 326 1 >> --- >> seqlengths: >> chr1 chr10 chr11 chr12 chr13 chr14 ... chr6 chr7 chr8 chr9 chrX >> chrY >> NA NA NA NA NA NA ... NA NA NA NA NA >> NA >> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives:http://news.gmane.org/gmane.science.biology.inf ormatics.conductor > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives:http://news.gmane.org/gmane.science.biology.info rmatics.conductor

ADD REPLY • link 11.1 years ago Valerie Obenchain ★ 6.8k

0

Entering edit mode

On Mon, Mar 4, 2013 at 9:17 AM, Valerie Obenchain <vobencha@fhcrc.org>wrote: > Hi Hermann, Harris, > > I wanted to give some background on the behavior of these functions for > GRanges vs IRanges. > > > gr <- GRanges("chr1", IRanges(c(1,2,3), width=1)) > > gr > > GRanges with 3 ranges and 0 metadata columns: > seqnames ranges strand > <rle> <iranges> <rle> > [1] chr1 [1, 1] * > [2] chr1 [2, 2] * > [3] chr1 [3, 3] * > > ----------------------------- > "+" strand behavior: > ----------------------------- > IRanges objects have no strand so you can expect them to behave as "+" > strand GRanges do. If this is ever not the case please let me know. That > would be a bug. > > > strand(gr) <- "+" > > precede(gr) > [1] 2 3 NA > > precede(ranges(gr)) > [1] 2 3 NA > > ----------------------------- > "-" strand behavior: > ----------------------------- > I think it makes intuitive sense that precede() and follow() on "-" strand > GRanges will give the opposite results from precede() and follow() on "+" > strand GRanges. > > > strand(gr) <- "-" > > precede(gr) > [1] NA 1 2 > > ----------------------------- > "*" strand behavior: > ----------------------------- > For "*" strand GRanges maybe we tried to be too clever. When > ignore.strand=TRUE strand is treated as "+": > > strand(gr) <- "*" > > precede(gr, ignore.strand=TRUE) > [1] 2 3 NA > > When ignore.strand=FALSE for a "*" strand range, results are computed for > both "+' and "-". If one result is NA, the other is taken. This is how we > end up with a '2' in positions 1 and 3 (position 3 was NA for "+" and > position 1 was NA for "-"). If there is a tie, the first in order is taken. > This is how we get the '1' in position 2 (tie was between '1' and '3' and > '1' comes before '3' in order). > > precede(gr, ignore.strand=FALSE) > [1] 2 1 2 > > It was a challenge to decide how "*" strand with ignore.strand=FALSE > should behave. If others have opinions on this I'd be interested in hearing > them. > > So it seems that "*" in GenomicRanges can mean at least two things: "either" or "missing". I think at one point these were represented by different values, but now the meanings have been merged for simplicity. Instead, we need to use an extra parameter, in this case ignore.strand, to distinguish the two meanings during an operation. It's often convenient to be able to change the meaning on a per-operation basis, but there is also a desire to keep the meaning in the data structure. Anyway, all this has been decided long ago and it probably does not make sense to change anything now. There is some concern though about the usability of the API. Usually, a "*" is used to select the natural/IRanges behavior, but the default of ignore.strand is FALSE, so the "either" meaning is assumed. This is going to surprise users. The same is true of ignore.strand in findOverlaps: many users are surprised to find that by default 'strand' separates the ranges into two coordinate systems, instead of just indicating direction. Michael > Valerie > > > > > > > On 03/03/13 16:38, Harris A. Jaffee wrote: > >> You want precede() and follow(), and then distance(), but tread somewhat >> carefully with strands of "*": >> >> x=IRanges(2, 2) >>> x >>> >> IRanges of length 1 >> start end width >> [1] 2 2 1 >> >> y=IRanges(c(1,3), c(1,3)) >>> y >>> >> IRanges of length 2 >> start end width >> [1] 1 1 1 >> [2] 3 3 1 >> >> precede(x, y) >>> >> [1] 2 >> >>> follow(x, y) >>> >> [1] 1 >> >> X = GRanges(ranges=x, seqnames="chr1") >>> Y = GRanges(ranges=y, seqnames="chr1") >>> X >>> >> GRanges with 1 range and 0 metadata columns: >> seqnames ranges strand >> <rle> <iranges> <rle> >> [1] chr1 [2, 2] * >> --- >> seqlengths: >> chr1 >> NA >> >>> Y >>> >> GRanges with 2 ranges and 0 metadata columns: >> seqnames ranges strand >> <rle> <iranges> <rle> >> [1] chr1 [1, 1] * >> [2] chr1 [3, 3] * >> --- >> seqlengths: >> chr1 >> NA >> >> precede(X, Y) >>> >> [1] 1 >> >>> follow(X, Y) >>> >> [1] 1 >> >> precede(X, Y, ignore.strand=TRUE) >>> >> [1] 2 >> >>> follow(X, Y, ignore.strand=TRUE) >>> >> [1] 1 >> >> >> On Mar 3, 2013, at 4:10 PM, Hermann Norpois wrote: >> >> Hello, >>> >>> I have to grange-objects - gr1 and gr2. And I wish to know: >>> 1) What are the neigbouring intervals of gr1 in gr2 - upstream and >>> downstream? >>> 2) What is the distance in bp between the neighbouring intervals? >>> >>> Could anybody give me a hint how this is done? >>> Thanks >>> Hermann >>> >>> dput (gr1) >>>> >>> new("GRanges" >>> , seqnames = new("Rle" >>> , values = structure(1L, .Label = c("chr1", "chr10", "chr11", >>> "chr12", >>> "chr13", >>> "chr14", "chr15", "chr16", "chr17", "chr18", "chr19", "chr2", >>> "chr20", "chr21", "chr22", "chr3", "chr4", "chr5", "chr6", "chr7", >>> "chr8", "chr9", "chrX", "chrY"), class = "factor") >>> , lengths = 4L >>> , elementMetadata = NULL >>> , metadata = list() >>> ) >>> , ranges = new("IRanges" >>> , start = c(540823L, 752809L, 771015L, 773361L) >>> , width = c(221L, 230L, 520L, 247L) >>> , NAMES = NULL >>> , elementType = "integer" >>> , elementMetadata = NULL >>> , metadata = list() >>> ) >>> , strand = new("Rle" >>> , values = structure(3L, .Label = c("+", "-", "*"), class = "factor") >>> , lengths = 4L >>> , elementMetadata = NULL >>> , metadata = list() >>> ) >>> , elementMetadata = new("DataFrame" >>> , rownames = NULL >>> , nrows = 4L >>> , listData = structure(list(edensity = c(589L, 574L, 578L, 571L), >>> epeak >>> = c(111L, >>> 89L, 234L, 136L), over = c(0L, 1L, 1L, 1L)), .Names = c("edensity", >>> "epeak", "over")) >>> , elementType = "ANY" >>> , elementMetadata = NULL >>> , metadata = list() >>> ) >>> , seqinfo = new("Seqinfo" >>> , seqnames = c("chr1", "chr10", "chr11", "chr12", "chr13", "chr14", >>> "chr15", >>> "chr16", "chr17", "chr18", "chr19", "chr2", "chr20", "chr21", >>> "chr22", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8", "chr9", >>> "chrX", "chrY") >>> , seqlengths = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_, >>> NA_integer_, >>> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, >>> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, >>> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, >>> NA_integer_, NA_integer_, NA_integer_, NA_integer_) >>> , is_circular = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, >>> NA, NA, >>> NA, NA, NA, NA, NA, NA, NA, NA, NA) >>> , genome = c(NA_character_, NA_character_, NA_character_, >>> NA_character_, >>> NA_character_, NA_character_, NA_character_, NA_character_, >>> NA_character_, >>> NA_character_, NA_character_, NA_character_, NA_character_, >>> NA_character_, >>> NA_character_, NA_character_, NA_character_, NA_character_, >>> NA_character_, >>> NA_character_, NA_character_, NA_character_, NA_character_, NA_character_ >>> ) >>> ) >>> , metadata = list() >>> ) >>> >>>> dput (gr1) >>>> >>> new("GRanges" >>> , seqnames = new("Rle" >>> , values = structure(1L, .Label = c("chr1", "chr10", "chr11", >>> "chr12", >>> "chr13", >>> "chr14", "chr15", "chr16", "chr17", "chr18", "chr19", "chr2", >>> "chr20", "chr21", "chr22", "chr3", "chr4", "chr5", "chr6", "chr7", >>> "chr8", "chr9", "chrX", "chrY"), class = "factor") >>> , lengths = 4L >>> , elementMetadata = NULL >>> , metadata = list() >>> ) >>> , ranges = new("IRanges" >>> , start = c(540823L, 752809L, 771015L, 773361L) >>> , width = c(221L, 230L, 520L, 247L) >>> , NAMES = NULL >>> , elementType = "integer" >>> , elementMetadata = NULL >>> , metadata = list() >>> ) >>> , strand = new("Rle" >>> , values = structure(3L, .Label = c("+", "-", "*"), class = "factor") >>> , lengths = 4L >>> , elementMetadata = NULL >>> , metadata = list() >>> ) >>> , elementMetadata = new("DataFrame" >>> , rownames = NULL >>> , nrows = 4L >>> , listData = structure(list(edensity = c(589L, 574L, 578L, 571L), >>> epeak >>> = c(111L, >>> 89L, 234L, 136L), over = c(0L, 1L, 1L, 1L)), .Names = c("edensity", >>> "epeak", "over")) >>> , elementType = "ANY" >>> , elementMetadata = NULL >>> , metadata = list() >>> ) >>> , seqinfo = new("Seqinfo" >>> , seqnames = c("chr1", "chr10", "chr11", "chr12", "chr13", "chr14", >>> "chr15", >>> "chr16", "chr17", "chr18", "chr19", "chr2", "chr20", "chr21", >>> "chr22", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8", "chr9", >>> "chrX", "chrY") >>> , seqlengths = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_, >>> NA_integer_, >>> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, >>> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, >>> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, >>> NA_integer_, NA_integer_, NA_integer_, NA_integer_) >>> , is_circular = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, >>> NA, NA, >>> NA, NA, NA, NA, NA, NA, NA, NA, NA) >>> , genome = c(NA_character_, NA_character_, NA_character_, >>> NA_character_, >>> NA_character_, NA_character_, NA_character_, NA_character_, >>> NA_character_, >>> NA_character_, NA_character_, NA_character_, NA_character_, >>> NA_character_, >>> NA_character_, NA_character_, NA_character_, NA_character_, >>> NA_character_, >>> NA_character_, NA_character_, NA_character_, NA_character_, NA_character_ >>> ) >>> ) >>> , metadata = list() >>> ) >>> >>> gr1 >>>> >>> GRanges with 4 ranges and 3 metadata columns: >>> seqnames ranges strand | edensity epeak over >>> <rle> <iranges> <rle> | <integer> <integer> <integer> >>> [1] chr1 [540823, 541043] * | 589 111 0 >>> [2] chr1 [752809, 753038] * | 574 89 1 >>> [3] chr1 [771015, 771534] * | 578 234 1 >>> [4] chr1 [773361, 773607] * | 571 136 1 >>> --- >>> seqlengths: >>> chr1 chr10 chr11 chr12 chr13 chr14 ... chr6 chr7 chr8 chr9 chrX >>> chrY >>> NA NA NA NA NA NA ... NA NA NA NA NA >>> NA >>> >>>> gr2 >>>> >>> GRanges with 3 ranges and 3 metadata columns: >>> seqnames ranges strand | edensity epeak over >>> <rle> <iranges> <rle> | <integer> <integer> <integer> >>> [1] chr1 [ 53141, 53685] * | 601 212 0 >>> [2] chr1 [521536, 521622] * | 568 37 1 >>> [3] chr1 [805002, 805650] * | 1000 326 1 >>> --- >>> seqlengths: >>> chr1 chr10 chr11 chr12 chr13 chr14 ... chr6 chr7 chr8 chr9 chrX >>> chrY >>> NA NA NA NA NA NA ... NA NA NA NA NA >>> NA >>> >>> [[alternative HTML version deleted]] >>> >>> ______________________________**_________________ >>> Bioconductor mailing list >>> Bioconductor@r-project.org >>> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.="" ethz.ch="" mailman="" listinfo="" bioconductor=""> >>> Search the archives:http://news.gmane.**org/gmane.science.biology.** >>> informatics.conductor<http: news.gmane.org="" gmane.science.biology.="" informatics.conductor=""> >>> >> ______________________________**_________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.e="" thz.ch="" mailman="" listinfo="" bioconductor=""> >> Search the archives:http://news.gmane.**org/gmane.science.biology.** >> informatics.conductor<http: news.gmane.org="" gmane.science.biology.i="" nformatics.conductor=""> >> > > ______________________________**_________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.et="" hz.ch="" mailman="" listinfo="" bioconductor=""> > Search the archives: http://news.gmane.org/gmane.** > science.biology.informatics.**conductor<http: news.gmane.org="" gmane.="" science.biology.informatics.conductor=""> > [[alternative HTML version deleted]]

ADD REPLY • link 11.1 years ago Michael Lawrence ★ 11k

0

Entering edit mode

Hi, FWIW I also find it confusing that (d) gives something different: subject <- GRanges("chr1", IRanges(1:3, width=1)) x <- subject[2] (a): + / + strand(x) <- strand(subject) <- "+" > precede(x, subject) [1] 3 > follow(x, subject) [1] 1 (b): + / * strand(x) <- "+" strand(subject) <- "*" > precede(x, subject) [1] 3 > follow(x, subject) [1] 1 (c): * / + strand(x) <- "*" strand(subject) <- "+" > precede(x, subject) [1] 3 > follow(x, subject) [1] 1 (d): * / * strand(x) <- strand(subject) <- "*" > precede(x, subject) [1] 1 > follow(x, subject) [1] 1 What could be the motivation behind this? Wouldn't things be more consistent and less confusing if (a), (b), (c), and (d) all behaved consistently with (e): > precede(ranges(x), ranges(subject)) [1] 3 > follow(ranges(x), ranges(subject)) [1] 1 Thanks, H. On 03/04/2013 09:57 AM, Michael Lawrence wrote: > On Mon, Mar 4, 2013 at 9:17 AM, Valerie Obenchain <vobencha at="" fhcrc.org="">wrote: > >> Hi Hermann, Harris, >> >> I wanted to give some background on the behavior of these functions for >> GRanges vs IRanges. >> >>> gr <- GRanges("chr1", IRanges(c(1,2,3), width=1)) >>> gr >> >> GRanges with 3 ranges and 0 metadata columns: >> seqnames ranges strand >> <rle> <iranges> <rle> >> [1] chr1 [1, 1] * >> [2] chr1 [2, 2] * >> [3] chr1 [3, 3] * >> >> ----------------------------- >> "+" strand behavior: >> ----------------------------- >> IRanges objects have no strand so you can expect them to behave as "+" >> strand GRanges do. If this is ever not the case please let me know. That >> would be a bug. >> >>> strand(gr) <- "+" >>> precede(gr) >> [1] 2 3 NA >>> precede(ranges(gr)) >> [1] 2 3 NA >> >> ----------------------------- >> "-" strand behavior: >> ----------------------------- >> I think it makes intuitive sense that precede() and follow() on "-" strand >> GRanges will give the opposite results from precede() and follow() on "+" >> strand GRanges. >> >>> strand(gr) <- "-" >>> precede(gr) >> [1] NA 1 2 >> >> ----------------------------- >> "*" strand behavior: >> ----------------------------- >> For "*" strand GRanges maybe we tried to be too clever. When >> ignore.strand=TRUE strand is treated as "+": >>> strand(gr) <- "*" >>> precede(gr, ignore.strand=TRUE) >> [1] 2 3 NA >> >> When ignore.strand=FALSE for a "*" strand range, results are computed for >> both "+' and "-". If one result is NA, the other is taken. This is how we >> end up with a '2' in positions 1 and 3 (position 3 was NA for "+" and >> position 1 was NA for "-"). If there is a tie, the first in order is taken. >> This is how we get the '1' in position 2 (tie was between '1' and '3' and >> '1' comes before '3' in order). >>> precede(gr, ignore.strand=FALSE) >> [1] 2 1 2 >> >> It was a challenge to decide how "*" strand with ignore.strand=FALSE >> should behave. If others have opinions on this I'd be interested in hearing >> them. >> >> > > So it seems that "*" in GenomicRanges can mean at least two things: > "either" or "missing". I think at one point these were represented by > different values, but now the meanings have been merged for simplicity. > Instead, we need to use an extra parameter, in this case ignore.strand, to > distinguish the two meanings during an operation. It's often convenient to > be able to change the meaning on a per-operation basis, but there is also a > desire to keep the meaning in the data structure. Anyway, all this has been > decided long ago and it probably does not make sense to change anything now. > > There is some concern though about the usability of the API. Usually, a "*" > is used to select the natural/IRanges behavior, but the default of > ignore.strand is FALSE, so the "either" meaning is assumed. This is going > to surprise users. The same is true of ignore.strand in findOverlaps: many > users are surprised to find that by default 'strand' separates the ranges > into two coordinate systems, instead of just indicating direction. > > Michael > > >> Valerie >> >> >> >> >> >> >> On 03/03/13 16:38, Harris A. Jaffee wrote: >> >>> You want precede() and follow(), and then distance(), but tread somewhat >>> carefully with strands of "*": >>> >>> x=IRanges(2, 2) >>>> x >>>> >>> IRanges of length 1 >>> start end width >>> [1] 2 2 1 >>> >>> y=IRanges(c(1,3), c(1,3)) >>>> y >>>> >>> IRanges of length 2 >>> start end width >>> [1] 1 1 1 >>> [2] 3 3 1 >>> >>> precede(x, y) >>>> >>> [1] 2 >>> >>>> follow(x, y) >>>> >>> [1] 1 >>> >>> X = GRanges(ranges=x, seqnames="chr1") >>>> Y = GRanges(ranges=y, seqnames="chr1") >>>> X >>>> >>> GRanges with 1 range and 0 metadata columns: >>> seqnames ranges strand >>> <rle> <iranges> <rle> >>> [1] chr1 [2, 2] * >>> --- >>> seqlengths: >>> chr1 >>> NA >>> >>>> Y >>>> >>> GRanges with 2 ranges and 0 metadata columns: >>> seqnames ranges strand >>> <rle> <iranges> <rle> >>> [1] chr1 [1, 1] * >>> [2] chr1 [3, 3] * >>> --- >>> seqlengths: >>> chr1 >>> NA >>> >>> precede(X, Y) >>>> >>> [1] 1 >>> >>>> follow(X, Y) >>>> >>> [1] 1 >>> >>> precede(X, Y, ignore.strand=TRUE) >>>> >>> [1] 2 >>> >>>> follow(X, Y, ignore.strand=TRUE) >>>> >>> [1] 1 >>> >>> >>> On Mar 3, 2013, at 4:10 PM, Hermann Norpois wrote: >>> >>> Hello, >>>> >>>> I have to grange-objects - gr1 and gr2. And I wish to know: >>>> 1) What are the neigbouring intervals of gr1 in gr2 - upstream and >>>> downstream? >>>> 2) What is the distance in bp between the neighbouring intervals? >>>> >>>> Could anybody give me a hint how this is done? >>>> Thanks >>>> Hermann >>>> >>>> dput (gr1) >>>>> >>>> new("GRanges" >>>> , seqnames = new("Rle" >>>> , values = structure(1L, .Label = c("chr1", "chr10", "chr11", >>>> "chr12", >>>> "chr13", >>>> "chr14", "chr15", "chr16", "chr17", "chr18", "chr19", "chr2", >>>> "chr20", "chr21", "chr22", "chr3", "chr4", "chr5", "chr6", "chr7", >>>> "chr8", "chr9", "chrX", "chrY"), class = "factor") >>>> , lengths = 4L >>>> , elementMetadata = NULL >>>> , metadata = list() >>>> ) >>>> , ranges = new("IRanges" >>>> , start = c(540823L, 752809L, 771015L, 773361L) >>>> , width = c(221L, 230L, 520L, 247L) >>>> , NAMES = NULL >>>> , elementType = "integer" >>>> , elementMetadata = NULL >>>> , metadata = list() >>>> ) >>>> , strand = new("Rle" >>>> , values = structure(3L, .Label = c("+", "-", "*"), class = "factor") >>>> , lengths = 4L >>>> , elementMetadata = NULL >>>> , metadata = list() >>>> ) >>>> , elementMetadata = new("DataFrame" >>>> , rownames = NULL >>>> , nrows = 4L >>>> , listData = structure(list(edensity = c(589L, 574L, 578L, 571L), >>>> epeak >>>> = c(111L, >>>> 89L, 234L, 136L), over = c(0L, 1L, 1L, 1L)), .Names = c("edensity", >>>> "epeak", "over")) >>>> , elementType = "ANY" >>>> , elementMetadata = NULL >>>> , metadata = list() >>>> ) >>>> , seqinfo = new("Seqinfo" >>>> , seqnames = c("chr1", "chr10", "chr11", "chr12", "chr13", "chr14", >>>> "chr15", >>>> "chr16", "chr17", "chr18", "chr19", "chr2", "chr20", "chr21", >>>> "chr22", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8", "chr9", >>>> "chrX", "chrY") >>>> , seqlengths = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_, >>>> NA_integer_, >>>> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, >>>> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, >>>> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, >>>> NA_integer_, NA_integer_, NA_integer_, NA_integer_) >>>> , is_circular = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, >>>> NA, NA, >>>> NA, NA, NA, NA, NA, NA, NA, NA, NA) >>>> , genome = c(NA_character_, NA_character_, NA_character_, >>>> NA_character_, >>>> NA_character_, NA_character_, NA_character_, NA_character_, >>>> NA_character_, >>>> NA_character_, NA_character_, NA_character_, NA_character_, >>>> NA_character_, >>>> NA_character_, NA_character_, NA_character_, NA_character_, >>>> NA_character_, >>>> NA_character_, NA_character_, NA_character_, NA_character_, NA_character_ >>>> ) >>>> ) >>>> , metadata = list() >>>> ) >>>> >>>>> dput (gr1) >>>>> >>>> new("GRanges" >>>> , seqnames = new("Rle" >>>> , values = structure(1L, .Label = c("chr1", "chr10", "chr11", >>>> "chr12", >>>> "chr13", >>>> "chr14", "chr15", "chr16", "chr17", "chr18", "chr19", "chr2", >>>> "chr20", "chr21", "chr22", "chr3", "chr4", "chr5", "chr6", "chr7", >>>> "chr8", "chr9", "chrX", "chrY"), class = "factor") >>>> , lengths = 4L >>>> , elementMetadata = NULL >>>> , metadata = list() >>>> ) >>>> , ranges = new("IRanges" >>>> , start = c(540823L, 752809L, 771015L, 773361L) >>>> , width = c(221L, 230L, 520L, 247L) >>>> , NAMES = NULL >>>> , elementType = "integer" >>>> , elementMetadata = NULL >>>> , metadata = list() >>>> ) >>>> , strand = new("Rle" >>>> , values = structure(3L, .Label = c("+", "-", "*"), class = "factor") >>>> , lengths = 4L >>>> , elementMetadata = NULL >>>> , metadata = list() >>>> ) >>>> , elementMetadata = new("DataFrame" >>>> , rownames = NULL >>>> , nrows = 4L >>>> , listData = structure(list(edensity = c(589L, 574L, 578L, 571L), >>>> epeak >>>> = c(111L, >>>> 89L, 234L, 136L), over = c(0L, 1L, 1L, 1L)), .Names = c("edensity", >>>> "epeak", "over")) >>>> , elementType = "ANY" >>>> , elementMetadata = NULL >>>> , metadata = list() >>>> ) >>>> , seqinfo = new("Seqinfo" >>>> , seqnames = c("chr1", "chr10", "chr11", "chr12", "chr13", "chr14", >>>> "chr15", >>>> "chr16", "chr17", "chr18", "chr19", "chr2", "chr20", "chr21", >>>> "chr22", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8", "chr9", >>>> "chrX", "chrY") >>>> , seqlengths = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_, >>>> NA_integer_, >>>> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, >>>> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, >>>> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, >>>> NA_integer_, NA_integer_, NA_integer_, NA_integer_) >>>> , is_circular = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, >>>> NA, NA, >>>> NA, NA, NA, NA, NA, NA, NA, NA, NA) >>>> , genome = c(NA_character_, NA_character_, NA_character_, >>>> NA_character_, >>>> NA_character_, NA_character_, NA_character_, NA_character_, >>>> NA_character_, >>>> NA_character_, NA_character_, NA_character_, NA_character_, >>>> NA_character_, >>>> NA_character_, NA_character_, NA_character_, NA_character_, >>>> NA_character_, >>>> NA_character_, NA_character_, NA_character_, NA_character_, NA_character_ >>>> ) >>>> ) >>>> , metadata = list() >>>> ) >>>> >>>> gr1 >>>>> >>>> GRanges with 4 ranges and 3 metadata columns: >>>> seqnames ranges strand | edensity epeak over >>>> <rle> <iranges> <rle> | <integer> <integer> <integer> >>>> [1] chr1 [540823, 541043] * | 589 111 0 >>>> [2] chr1 [752809, 753038] * | 574 89 1 >>>> [3] chr1 [771015, 771534] * | 578 234 1 >>>> [4] chr1 [773361, 773607] * | 571 136 1 >>>> --- >>>> seqlengths: >>>> chr1 chr10 chr11 chr12 chr13 chr14 ... chr6 chr7 chr8 chr9 chrX >>>> chrY >>>> NA NA NA NA NA NA ... NA NA NA NA NA >>>> NA >>>> >>>>> gr2 >>>>> >>>> GRanges with 3 ranges and 3 metadata columns: >>>> seqnames ranges strand | edensity epeak over >>>> <rle> <iranges> <rle> | <integer> <integer> <integer> >>>> [1] chr1 [ 53141, 53685] * | 601 212 0 >>>> [2] chr1 [521536, 521622] * | 568 37 1 >>>> [3] chr1 [805002, 805650] * | 1000 326 1 >>>> --- >>>> seqlengths: >>>> chr1 chr10 chr11 chr12 chr13 chr14 ... chr6 chr7 chr8 chr9 chrX >>>> chrY >>>> NA NA NA NA NA NA ... NA NA NA NA NA >>>> NA >>>> >>>> [[alternative HTML version deleted]] >>>> >>>> ______________________________**_________________ >>>> Bioconductor mailing list >>>> Bioconductor at r-project.org >>>> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat="" .ethz.ch="" mailman="" listinfo="" bioconductor=""> >>>> Search the archives:http://news.gmane.**org/gmane.science.biology.** >>>> informatics.conductor<http: news.gmane.org="" gmane.science.biology="" .informatics.conductor=""> >>>> >>> ______________________________**_________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.="" ethz.ch="" mailman="" listinfo="" bioconductor=""> >>> Search the archives:http://news.gmane.**org/gmane.science.biology.** >>> informatics.conductor<http: news.gmane.org="" gmane.science.biology.="" informatics.conductor=""> >>> >> >> ______________________________**_________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.e="" thz.ch="" mailman="" listinfo="" bioconductor=""> >> Search the archives: http://news.gmane.org/gmane.** >> science.biology.informatics.**conductor<http: news.gmane.org="" gmane="" .science.biology.informatics.conductor=""> >> > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319

ADD REPLY • link 11.1 years ago Hervé Pagès 16k

0

Entering edit mode

Hi Herve, On 03/04/13 15:50, Hervé Pagès wrote: > Hi, > > FWIW I also find it confusing that (d) gives something different: > > subject <- GRanges("chr1", IRanges(1:3, width=1)) > x <- subject[2] > > (a): + / + > > strand(x) <- strand(subject) <- "+" > > > precede(x, subject) > [1] 3 > > follow(x, subject) > [1] 1 > > > (b): + / * > > strand(x) <- "+" > strand(subject) <- "*" > > > precede(x, subject) > [1] 3 > > follow(x, subject) > [1] 1 > > (c): * / + > > strand(x) <- "*" > strand(subject) <- "+" > > > precede(x, subject) > [1] 3 > > follow(x, subject) > [1] 1 > > (d): * / * > > strand(x) <- strand(subject) <- "*" > > > precede(x, subject) > [1] 1 > > follow(x, subject) > [1] 1 > > What could be the motivation behind this? Wouldn't things be more > consistent and less confusing if (a), (b), (c), and (d) all behaved > consistently with (e): This is the "maybe we tried to be too clever" part. (d) is consistent with (e) when ignore.strand=TRUE. By default ignore.strand=FALSE and these methods look for solutions on both/either strand. Val > > > precede(ranges(x), ranges(subject)) > [1] 3 > > follow(ranges(x), ranges(subject)) > [1] 1 > > Thanks, > H. > > > On 03/04/2013 09:57 AM, Michael Lawrence wrote: >> On Mon, Mar 4, 2013 at 9:17 AM, Valerie Obenchain >> <vobencha at="" fhcrc.org="">wrote: >> >>> Hi Hermann, Harris, >>> >>> I wanted to give some background on the behavior of these functions for >>> GRanges vs IRanges. >>> >>>> gr <- GRanges("chr1", IRanges(c(1,2,3), width=1)) >>>> gr >>> >>> GRanges with 3 ranges and 0 metadata columns: >>> seqnames ranges strand >>> <rle> <iranges> <rle> >>> [1] chr1 [1, 1] * >>> [2] chr1 [2, 2] * >>> [3] chr1 [3, 3] * >>> >>> ----------------------------- >>> "+" strand behavior: >>> ----------------------------- >>> IRanges objects have no strand so you can expect them to behave as "+" >>> strand GRanges do. If this is ever not the case please let me know. >>> That >>> would be a bug. >>> >>>> strand(gr) <- "+" >>>> precede(gr) >>> [1] 2 3 NA >>>> precede(ranges(gr)) >>> [1] 2 3 NA >>> >>> ----------------------------- >>> "-" strand behavior: >>> ----------------------------- >>> I think it makes intuitive sense that precede() and follow() on "-" >>> strand >>> GRanges will give the opposite results from precede() and follow() >>> on "+" >>> strand GRanges. >>> >>>> strand(gr) <- "-" >>>> precede(gr) >>> [1] NA 1 2 >>> >>> ----------------------------- >>> "*" strand behavior: >>> ----------------------------- >>> For "*" strand GRanges maybe we tried to be too clever. When >>> ignore.strand=TRUE strand is treated as "+": >>>> strand(gr) <- "*" >>>> precede(gr, ignore.strand=TRUE) >>> [1] 2 3 NA >>> >>> When ignore.strand=FALSE for a "*" strand range, results are >>> computed for >>> both "+' and "-". If one result is NA, the other is taken. This is >>> how we >>> end up with a '2' in positions 1 and 3 (position 3 was NA for "+" and >>> position 1 was NA for "-"). If there is a tie, the first in order is >>> taken. >>> This is how we get the '1' in position 2 (tie was between '1' and >>> '3' and >>> '1' comes before '3' in order). >>>> precede(gr, ignore.strand=FALSE) >>> [1] 2 1 2 >>> >>> It was a challenge to decide how "*" strand with ignore.strand=FALSE >>> should behave. If others have opinions on this I'd be interested in >>> hearing >>> them. >>> >>> >> >> So it seems that "*" in GenomicRanges can mean at least two things: >> "either" or "missing". I think at one point these were represented by >> different values, but now the meanings have been merged for simplicity. >> Instead, we need to use an extra parameter, in this case >> ignore.strand, to >> distinguish the two meanings during an operation. It's often >> convenient to >> be able to change the meaning on a per-operation basis, but there is >> also a >> desire to keep the meaning in the data structure. Anyway, all this >> has been >> decided long ago and it probably does not make sense to change >> anything now. >> >> There is some concern though about the usability of the API. Usually, >> a "*" >> is used to select the natural/IRanges behavior, but the default of >> ignore.strand is FALSE, so the "either" meaning is assumed. This is >> going >> to surprise users. The same is true of ignore.strand in findOverlaps: >> many >> users are surprised to find that by default 'strand' separates the >> ranges >> into two coordinate systems, instead of just indicating direction. >> >> Michael >> >> >>> Valerie >>> >>> >>> >>> >>> >>> >>> On 03/03/13 16:38, Harris A. Jaffee wrote: >>> >>>> You want precede() and follow(), and then distance(), but tread >>>> somewhat >>>> carefully with strands of "*": >>>> >>>> x=IRanges(2, 2) >>>>> x >>>>> >>>> IRanges of length 1 >>>> start end width >>>> [1] 2 2 1 >>>> >>>> y=IRanges(c(1,3), c(1,3)) >>>>> y >>>>> >>>> IRanges of length 2 >>>> start end width >>>> [1] 1 1 1 >>>> [2] 3 3 1 >>>> >>>> precede(x, y) >>>>> >>>> [1] 2 >>>> >>>>> follow(x, y) >>>>> >>>> [1] 1 >>>> >>>> X = GRanges(ranges=x, seqnames="chr1") >>>>> Y = GRanges(ranges=y, seqnames="chr1") >>>>> X >>>>> >>>> GRanges with 1 range and 0 metadata columns: >>>> seqnames ranges strand >>>> <rle> <iranges> <rle> >>>> [1] chr1 [2, 2] * >>>> --- >>>> seqlengths: >>>> chr1 >>>> NA >>>> >>>>> Y >>>>> >>>> GRanges with 2 ranges and 0 metadata columns: >>>> seqnames ranges strand >>>> <rle> <iranges> <rle> >>>> [1] chr1 [1, 1] * >>>> [2] chr1 [3, 3] * >>>> --- >>>> seqlengths: >>>> chr1 >>>> NA >>>> >>>> precede(X, Y) >>>>> >>>> [1] 1 >>>> >>>>> follow(X, Y) >>>>> >>>> [1] 1 >>>> >>>> precede(X, Y, ignore.strand=TRUE) >>>>> >>>> [1] 2 >>>> >>>>> follow(X, Y, ignore.strand=TRUE) >>>>> >>>> [1] 1 >>>> >>>> >>>> On Mar 3, 2013, at 4:10 PM, Hermann Norpois wrote: >>>> >>>> Hello, >>>>> >>>>> I have to grange-objects - gr1 and gr2. And I wish to know: >>>>> 1) What are the neigbouring intervals of gr1 in gr2 - upstream and >>>>> downstream? >>>>> 2) What is the distance in bp between the neighbouring intervals? >>>>> >>>>> Could anybody give me a hint how this is done? >>>>> Thanks >>>>> Hermann >>>>> >>>>> dput (gr1) >>>>>> >>>>> new("GRanges" >>>>> , seqnames = new("Rle" >>>>> , values = structure(1L, .Label = c("chr1", "chr10", "chr11", >>>>> "chr12", >>>>> "chr13", >>>>> "chr14", "chr15", "chr16", "chr17", "chr18", "chr19", "chr2", >>>>> "chr20", "chr21", "chr22", "chr3", "chr4", "chr5", "chr6", "chr7", >>>>> "chr8", "chr9", "chrX", "chrY"), class = "factor") >>>>> , lengths = 4L >>>>> , elementMetadata = NULL >>>>> , metadata = list() >>>>> ) >>>>> , ranges = new("IRanges" >>>>> , start = c(540823L, 752809L, 771015L, 773361L) >>>>> , width = c(221L, 230L, 520L, 247L) >>>>> , NAMES = NULL >>>>> , elementType = "integer" >>>>> , elementMetadata = NULL >>>>> , metadata = list() >>>>> ) >>>>> , strand = new("Rle" >>>>> , values = structure(3L, .Label = c("+", "-", "*"), class = >>>>> "factor") >>>>> , lengths = 4L >>>>> , elementMetadata = NULL >>>>> , metadata = list() >>>>> ) >>>>> , elementMetadata = new("DataFrame" >>>>> , rownames = NULL >>>>> , nrows = 4L >>>>> , listData = structure(list(edensity = c(589L, 574L, 578L, >>>>> 571L), >>>>> epeak >>>>> = c(111L, >>>>> 89L, 234L, 136L), over = c(0L, 1L, 1L, 1L)), .Names = c("edensity", >>>>> "epeak", "over")) >>>>> , elementType = "ANY" >>>>> , elementMetadata = NULL >>>>> , metadata = list() >>>>> ) >>>>> , seqinfo = new("Seqinfo" >>>>> , seqnames = c("chr1", "chr10", "chr11", "chr12", "chr13", >>>>> "chr14", >>>>> "chr15", >>>>> "chr16", "chr17", "chr18", "chr19", "chr2", "chr20", "chr21", >>>>> "chr22", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8", "chr9", >>>>> "chrX", "chrY") >>>>> , seqlengths = c(NA_integer_, NA_integer_, NA_integer_, >>>>> NA_integer_, >>>>> NA_integer_, >>>>> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, >>>>> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, >>>>> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, >>>>> NA_integer_, NA_integer_, NA_integer_, NA_integer_) >>>>> , is_circular = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, >>>>> NA, NA, >>>>> NA, NA, >>>>> NA, NA, NA, NA, NA, NA, NA, NA, NA) >>>>> , genome = c(NA_character_, NA_character_, NA_character_, >>>>> NA_character_, >>>>> NA_character_, NA_character_, NA_character_, NA_character_, >>>>> NA_character_, >>>>> NA_character_, NA_character_, NA_character_, NA_character_, >>>>> NA_character_, >>>>> NA_character_, NA_character_, NA_character_, NA_character_, >>>>> NA_character_, >>>>> NA_character_, NA_character_, NA_character_, NA_character_, >>>>> NA_character_ >>>>> ) >>>>> ) >>>>> , metadata = list() >>>>> ) >>>>> >>>>>> dput (gr1) >>>>>> >>>>> new("GRanges" >>>>> , seqnames = new("Rle" >>>>> , values = structure(1L, .Label = c("chr1", "chr10", "chr11", >>>>> "chr12", >>>>> "chr13", >>>>> "chr14", "chr15", "chr16", "chr17", "chr18", "chr19", "chr2", >>>>> "chr20", "chr21", "chr22", "chr3", "chr4", "chr5", "chr6", "chr7", >>>>> "chr8", "chr9", "chrX", "chrY"), class = "factor") >>>>> , lengths = 4L >>>>> , elementMetadata = NULL >>>>> , metadata = list() >>>>> ) >>>>> , ranges = new("IRanges" >>>>> , start = c(540823L, 752809L, 771015L, 773361L) >>>>> , width = c(221L, 230L, 520L, 247L) >>>>> , NAMES = NULL >>>>> , elementType = "integer" >>>>> , elementMetadata = NULL >>>>> , metadata = list() >>>>> ) >>>>> , strand = new("Rle" >>>>> , values = structure(3L, .Label = c("+", "-", "*"), class = >>>>> "factor") >>>>> , lengths = 4L >>>>> , elementMetadata = NULL >>>>> , metadata = list() >>>>> ) >>>>> , elementMetadata = new("DataFrame" >>>>> , rownames = NULL >>>>> , nrows = 4L >>>>> , listData = structure(list(edensity = c(589L, 574L, 578L, >>>>> 571L), >>>>> epeak >>>>> = c(111L, >>>>> 89L, 234L, 136L), over = c(0L, 1L, 1L, 1L)), .Names = c("edensity", >>>>> "epeak", "over")) >>>>> , elementType = "ANY" >>>>> , elementMetadata = NULL >>>>> , metadata = list() >>>>> ) >>>>> , seqinfo = new("Seqinfo" >>>>> , seqnames = c("chr1", "chr10", "chr11", "chr12", "chr13", >>>>> "chr14", >>>>> "chr15", >>>>> "chr16", "chr17", "chr18", "chr19", "chr2", "chr20", "chr21", >>>>> "chr22", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8", "chr9", >>>>> "chrX", "chrY") >>>>> , seqlengths = c(NA_integer_, NA_integer_, NA_integer_, >>>>> NA_integer_, >>>>> NA_integer_, >>>>> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, >>>>> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, >>>>> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, >>>>> NA_integer_, NA_integer_, NA_integer_, NA_integer_) >>>>> , is_circular = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, >>>>> NA, NA, >>>>> NA, NA, >>>>> NA, NA, NA, NA, NA, NA, NA, NA, NA) >>>>> , genome = c(NA_character_, NA_character_, NA_character_, >>>>> NA_character_, >>>>> NA_character_, NA_character_, NA_character_, NA_character_, >>>>> NA_character_, >>>>> NA_character_, NA_character_, NA_character_, NA_character_, >>>>> NA_character_, >>>>> NA_character_, NA_character_, NA_character_, NA_character_, >>>>> NA_character_, >>>>> NA_character_, NA_character_, NA_character_, NA_character_, >>>>> NA_character_ >>>>> ) >>>>> ) >>>>> , metadata = list() >>>>> ) >>>>> >>>>> gr1 >>>>>> >>>>> GRanges with 4 ranges and 3 metadata columns: >>>>> seqnames ranges strand | edensity epeak over >>>>> <rle> <iranges> <rle> | <integer> <integer> >>>>> <integer> >>>>> [1] chr1 [540823, 541043] * | 589 111 0 >>>>> [2] chr1 [752809, 753038] * | 574 89 1 >>>>> [3] chr1 [771015, 771534] * | 578 234 1 >>>>> [4] chr1 [773361, 773607] * | 571 136 1 >>>>> --- >>>>> seqlengths: >>>>> chr1 chr10 chr11 chr12 chr13 chr14 ... chr6 chr7 chr8 >>>>> chr9 chrX >>>>> chrY >>>>> NA NA NA NA NA NA ... NA NA NA >>>>> NA NA >>>>> NA >>>>> >>>>>> gr2 >>>>>> >>>>> GRanges with 3 ranges and 3 metadata columns: >>>>> seqnames ranges strand | edensity epeak over >>>>> <rle> <iranges> <rle> | <integer> <integer> >>>>> <integer> >>>>> [1] chr1 [ 53141, 53685] * | 601 212 0 >>>>> [2] chr1 [521536, 521622] * | 568 37 1 >>>>> [3] chr1 [805002, 805650] * | 1000 326 1 >>>>> --- >>>>> seqlengths: >>>>> chr1 chr10 chr11 chr12 chr13 chr14 ... chr6 chr7 chr8 >>>>> chr9 chrX >>>>> chrY >>>>> NA NA NA NA NA NA ... NA NA NA >>>>> NA NA >>>>> NA >>>>> >>>>> [[alternative HTML version deleted]] >>>>> >>>>> ______________________________**_________________ >>>>> Bioconductor mailing list >>>>> Bioconductor at r-project.org >>>>> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: sta="" t.ethz.ch="" mailman="" listinfo="" bioconductor=""> >>>>> >>>>> Search the archives:http://news.gmane.**org/gmane.science.biology.** >>>>> informatics.conductor<http: news.gmane.org="" gmane.science.biolog="" y.informatics.conductor=""> >>>>> >>>>> >>>> ______________________________**_________________ >>>> Bioconductor mailing list >>>> Bioconductor at r-project.org >>>> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat="" .ethz.ch="" mailman="" listinfo="" bioconductor=""> >>>> >>>> Search the archives:http://news.gmane.**org/gmane.science.biology.** >>>> informatics.conductor<http: news.gmane.org="" gmane.science.biology="" .informatics.conductor=""> >>>> >>>> >>> >>> ______________________________**_________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.="" ethz.ch="" mailman="" listinfo="" bioconductor=""> >>> >>> Search the archives: http://news.gmane.org/gmane.** >>> science.biology.informatics.**conductor<http: news.gmane.org="" gman="" e.science.biology.informatics.conductor=""> >>> >>> >> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >

ADD REPLY • link 11.1 years ago Valerie Obenchain ★ 6.8k

Login before adding your answer.