bug in pintersect in GenomicRanges
1
0
Entering edit mode
@kasper-daniel-hansen-2979
Last seen 9 months ago
United States
Example library(GenomicRanges) gr1 = GRanges(seqnames = c("chr1", "chr1"), ranges = IRanges(start = c(1,5), width = 7)) gr2 = GRanges(seqnames = "chr1", ranges = IRanges(start = 1, end = 4)) pintersect(gr1, rep(gr2, 2)) GRanges with 2 ranges and 0 metadata columns: seqnames ranges strand <rle> <iranges> <rle> [1] chr1 [1, 4] * [2] chr1 [5, 4] * --- seqlengths: chr1 NA The second element in the return GRanges should be empty. I would therefore expect that the return length of the pintersect would have length 1, despite the fact that the first object has length 2. Kasper
• 1.3k views
ADD COMMENT
0
Entering edit mode
@vincent-j-carey-jr-4
Last seen 5 weeks ago
United States
does this contradict the doc in some way? the second range of your result has width zero, which i think is correct. On Sun, Nov 11, 2012 at 12:16 PM, Kasper Daniel Hansen < kasperdanielhansen@gmail.com> wrote: > Example > > library(GenomicRanges) > gr1 = GRanges(seqnames = c("chr1", "chr1"), ranges = IRanges(start = > c(1,5), width = 7)) > gr2 = GRanges(seqnames = "chr1", ranges = IRanges(start = 1, end = 4)) > pintersect(gr1, rep(gr2, 2)) > > GRanges with 2 ranges and 0 metadata columns: > seqnames ranges strand > <rle> <iranges> <rle> > [1] chr1 [1, 4] * > [2] chr1 [5, 4] * > --- > seqlengths: > chr1 > NA > > The second element in the return GRanges should be empty. I would > therefore expect that the return length of the pintersect would have > length 1, despite the fact that the first object has length 2. > > Kasper > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]
ADD COMMENT
0
Entering edit mode
Well, I am surprised by the zero width IRanges with start and end. The choice of start = 5 and end = 4 is - as far as I can see - completely arbitrary. I also don't like the fact that start() and end() just reports 5 and 4 - I could see myself making mistakes in my code, because I believe I assume start <= end. But perhaps this is because it is now allowed to have zero-width IRanges? Kasper On Mon, Nov 12, 2012 at 6:20 AM, Vincent Carey <stvjc at="" channing.harvard.edu=""> wrote: > does this contradict the doc in some way? the second range of your result > has width zero, which i think is correct. > > On Sun, Nov 11, 2012 at 12:16 PM, Kasper Daniel Hansen > <kasperdanielhansen at="" gmail.com=""> wrote: >> >> Example >> >> library(GenomicRanges) >> gr1 = GRanges(seqnames = c("chr1", "chr1"), ranges = IRanges(start = >> c(1,5), width = 7)) >> gr2 = GRanges(seqnames = "chr1", ranges = IRanges(start = 1, end = 4)) >> pintersect(gr1, rep(gr2, 2)) >> >> GRanges with 2 ranges and 0 metadata columns: >> seqnames ranges strand >> <rle> <iranges> <rle> >> [1] chr1 [1, 4] * >> [2] chr1 [5, 4] * >> --- >> seqlengths: >> chr1 >> NA >> >> The second element in the return GRanges should be empty. I would >> therefore expect that the return length of the pintersect would have >> length 1, despite the fact that the first object has length 2. >> >> Kasper >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor > >
ADD REPLY
0
Entering edit mode
The end = start - 1 encoding for zero-width Ranges objects has existed as long as the IRanges package. Having end < start is obviously not ideal, but there are no clearly superior alternatives. Michael On Mon, Nov 12, 2012 at 8:58 AM, Kasper Daniel Hansen < kasperdanielhansen@gmail.com> wrote: > Well, I am surprised by the zero width IRanges with start and end. > The choice of start = 5 and end = 4 is - as far as I can see - > completely arbitrary. I also don't like the fact that start() and > end() just reports 5 and 4 - I could see myself making mistakes in my > code, because I believe I assume start <= end. > > But perhaps this is because it is now allowed to have zero-width IRanges? > > Kasper > > > On Mon, Nov 12, 2012 at 6:20 AM, Vincent Carey > <stvjc@channing.harvard.edu> wrote: > > does this contradict the doc in some way? the second range of your > result > > has width zero, which i think is correct. > > > > On Sun, Nov 11, 2012 at 12:16 PM, Kasper Daniel Hansen > > <kasperdanielhansen@gmail.com> wrote: > >> > >> Example > >> > >> library(GenomicRanges) > >> gr1 = GRanges(seqnames = c("chr1", "chr1"), ranges = IRanges(start = > >> c(1,5), width = 7)) > >> gr2 = GRanges(seqnames = "chr1", ranges = IRanges(start = 1, end = 4)) > >> pintersect(gr1, rep(gr2, 2)) > >> > >> GRanges with 2 ranges and 0 metadata columns: > >> seqnames ranges strand > >> <rle> <iranges> <rle> > >> [1] chr1 [1, 4] * > >> [2] chr1 [5, 4] * > >> --- > >> seqlengths: > >> chr1 > >> NA > >> > >> The second element in the return GRanges should be empty. I would > >> therefore expect that the return length of the pintersect would have > >> length 1, despite the fact that the first object has length 2. > >> > >> Kasper > >> > >> _______________________________________________ > >> Bioconductor mailing list > >> Bioconductor@r-project.org > >> https://stat.ethz.ch/mailman/listinfo/bioconductor > >> Search the archives: > >> http://news.gmane.org/gmane.science.biology.informatics.conductor > > > > > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
> The end = start - 1 encoding for zero-width Ranges objects has existed as > long as the IRanges package. Having end < start is obviously not ideal, but > there are no clearly superior alternatives. Hi, Was it ever considered allowing two out of three start end width equal to NA, to mean 'unknown' or 'unspecified'? We could have IRanges of a known width, but an unknown start or stop. Or a GRanges of a known chomosome and start and strand, but unknown stop or width. Or a known width but unknown start or stop. I'm sure allowing such would wreck havoc in a million other places. Just wondering if the conversation was had and how it went.... ;) ~Malcolm > > Michael > > On Mon, Nov 12, 2012 at 8:58 AM, Kasper Daniel Hansen < > kasperdanielhansen at gmail.com> wrote: > > > Well, I am surprised by the zero width IRanges with start and end. > > The choice of start = 5 and end = 4 is - as far as I can see - > > completely arbitrary. I also don't like the fact that start() and > > end() just reports 5 and 4 - I could see myself making mistakes in my > > code, because I believe I assume start <= end. > > > > But perhaps this is because it is now allowed to have zero-width IRanges? > > > > Kasper > > > > > > On Mon, Nov 12, 2012 at 6:20 AM, Vincent Carey > > <stvjc at="" channing.harvard.edu=""> wrote: > > > does this contradict the doc in some way? the second range of your > > result > > > has width zero, which i think is correct. > > > > > > On Sun, Nov 11, 2012 at 12:16 PM, Kasper Daniel Hansen > > > <kasperdanielhansen at="" gmail.com=""> wrote: > > >> > > >> Example > > >> > > >> library(GenomicRanges) > > >> gr1 = GRanges(seqnames = c("chr1", "chr1"), ranges = IRanges(start = > > >> c(1,5), width = 7)) > > >> gr2 = GRanges(seqnames = "chr1", ranges = IRanges(start = 1, end = 4)) > > >> pintersect(gr1, rep(gr2, 2)) > > >> > > >> GRanges with 2 ranges and 0 metadata columns: > > >> seqnames ranges strand > > >> <rle> <iranges> <rle> > > >> [1] chr1 [1, 4] * > > >> [2] chr1 [5, 4] * > > >> --- > > >> seqlengths: > > >> chr1 > > >> NA > > >> > > >> The second element in the return GRanges should be empty. I would > > >> therefore expect that the return length of the pintersect would have > > >> length 1, despite the fact that the first object has length 2. > > >> > > >> Kasper > > >> > > >> _______________________________________________ > > >> Bioconductor mailing list > > >> Bioconductor at r-project.org > > >> https://stat.ethz.ch/mailman/listinfo/bioconductor > > >> Search the archives: > > >> http://news.gmane.org/gmane.science.biology.informatics.conductor > > > > > > > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor at r-project.org > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > Search the archives: > > http://news.gmane.org/gmane.science.biology.informatics.conductor > > > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD REPLY
0
Entering edit mode
On Mon, Nov 12, 2012 at 10:54 AM, Cook, Malcolm <mec@stowers.org> wrote: > > The end = start - 1 encoding for zero-width Ranges objects has existed as > > long as the IRanges package. Having end < start is obviously not ideal, > but > > there are no clearly superior alternatives. > > Hi, > > Was it ever considered allowing two out of three start end width equal to > NA, to mean 'unknown' or 'unspecified'? > > We could have IRanges of a known width, but an unknown start or stop. > > Or a GRanges of a known chomosome and start and strand, but unknown stop > or width. > > Or a known width but unknown start or stop. > > I'm sure allowing such would wreck havoc in a million other places. > > Just wondering if the conversation was had and how it went.... > > It was discussed, it was probably punted until there was a clear use case. This is a different issue from zero-width ranges, of course. Michael > ;) > > ~Malcolm > > > > > > Michael > > > > On Mon, Nov 12, 2012 at 8:58 AM, Kasper Daniel Hansen < > > kasperdanielhansen@gmail.com> wrote: > > > > > Well, I am surprised by the zero width IRanges with start and end. > > > The choice of start = 5 and end = 4 is - as far as I can see - > > > completely arbitrary. I also don't like the fact that start() and > > > end() just reports 5 and 4 - I could see myself making mistakes in my > > > code, because I believe I assume start <= end. > > > > > > But perhaps this is because it is now allowed to have zero-width > IRanges? > > > > > > Kasper > > > > > > > > > On Mon, Nov 12, 2012 at 6:20 AM, Vincent Carey > > > <stvjc@channing.harvard.edu> wrote: > > > > does this contradict the doc in some way? the second range of your > > > result > > > > has width zero, which i think is correct. > > > > > > > > On Sun, Nov 11, 2012 at 12:16 PM, Kasper Daniel Hansen > > > > <kasperdanielhansen@gmail.com> wrote: > > > >> > > > >> Example > > > >> > > > >> library(GenomicRanges) > > > >> gr1 = GRanges(seqnames = c("chr1", "chr1"), ranges = IRanges(start = > > > >> c(1,5), width = 7)) > > > >> gr2 = GRanges(seqnames = "chr1", ranges = IRanges(start = 1, end = > 4)) > > > >> pintersect(gr1, rep(gr2, 2)) > > > >> > > > >> GRanges with 2 ranges and 0 metadata columns: > > > >> seqnames ranges strand > > > >> <rle> <iranges> <rle> > > > >> [1] chr1 [1, 4] * > > > >> [2] chr1 [5, 4] * > > > >> --- > > > >> seqlengths: > > > >> chr1 > > > >> NA > > > >> > > > >> The second element in the return GRanges should be empty. I would > > > >> therefore expect that the return length of the pintersect would have > > > >> length 1, despite the fact that the first object has length 2. > > > >> > > > >> Kasper > > > >> > > > >> _______________________________________________ > > > >> Bioconductor mailing list > > > >> Bioconductor@r-project.org > > > >> https://stat.ethz.ch/mailman/listinfo/bioconductor > > > >> Search the archives: > > > >> http://news.gmane.org/gmane.science.biology.informatics.conductor > > > > > > > > > > > > > > _______________________________________________ > > > Bioconductor mailing list > > > Bioconductor@r-project.org > > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > > Search the archives: > > > http://news.gmane.org/gmane.science.biology.informatics.conductor > > > > > > > [[alternative HTML version deleted]] > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor@r-project.org > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
Hmm, ok. Well, an alternative is to let the output of pintersect be shorter than the two inputs. Also, compare to intersect(gr1[2], gr2) which is empty. Kasper On Mon, Nov 12, 2012 at 1:46 PM, Michael Lawrence <lawrence.michael at="" gene.com=""> wrote: > The end = start - 1 encoding for zero-width Ranges objects has existed as > long as the IRanges package. Having end < start is obviously not ideal, but > there are no clearly superior alternatives. > > Michael > > > On Mon, Nov 12, 2012 at 8:58 AM, Kasper Daniel Hansen > <kasperdanielhansen at="" gmail.com=""> wrote: >> >> Well, I am surprised by the zero width IRanges with start and end. >> The choice of start = 5 and end = 4 is - as far as I can see - >> completely arbitrary. I also don't like the fact that start() and >> end() just reports 5 and 4 - I could see myself making mistakes in my >> code, because I believe I assume start <= end. >> >> But perhaps this is because it is now allowed to have zero-width IRanges? >> >> Kasper >> >> >> On Mon, Nov 12, 2012 at 6:20 AM, Vincent Carey >> <stvjc at="" channing.harvard.edu=""> wrote: >> > does this contradict the doc in some way? the second range of your >> > result >> > has width zero, which i think is correct. >> > >> > On Sun, Nov 11, 2012 at 12:16 PM, Kasper Daniel Hansen >> > <kasperdanielhansen at="" gmail.com=""> wrote: >> >> >> >> Example >> >> >> >> library(GenomicRanges) >> >> gr1 = GRanges(seqnames = c("chr1", "chr1"), ranges = IRanges(start = >> >> c(1,5), width = 7)) >> >> gr2 = GRanges(seqnames = "chr1", ranges = IRanges(start = 1, end = 4)) >> >> pintersect(gr1, rep(gr2, 2)) >> >> >> >> GRanges with 2 ranges and 0 metadata columns: >> >> seqnames ranges strand >> >> <rle> <iranges> <rle> >> >> [1] chr1 [1, 4] * >> >> [2] chr1 [5, 4] * >> >> --- >> >> seqlengths: >> >> chr1 >> >> NA >> >> >> >> The second element in the return GRanges should be empty. I would >> >> therefore expect that the return length of the pintersect would have >> >> length 1, despite the fact that the first object has length 2. >> >> >> >> Kasper >> >> >> >> _______________________________________________ >> >> Bioconductor mailing list >> >> Bioconductor at r-project.org >> >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> >> Search the archives: >> >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > >> > >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor > >
ADD REPLY
0
Entering edit mode
On Mon, Nov 12, 2012 at 11:00 AM, Kasper Daniel Hansen < kasperdanielhansen@gmail.com> wrote: > Hmm, ok. > > Well, an alternative is to let the output of pintersect be shorter > than the two inputs. I would really prefer not to violate that contract of the p* functions. If there are any empty ranges, it's not easy to figure out how the results match up with the input. Michael > Also, compare to > intersect(gr1[2], gr2) > which is empty. > > Kasper > > > > On Mon, Nov 12, 2012 at 1:46 PM, Michael Lawrence > <lawrence.michael@gene.com> wrote: > > The end = start - 1 encoding for zero-width Ranges objects has existed as > > long as the IRanges package. Having end < start is obviously not ideal, > but > > there are no clearly superior alternatives. > > > > Michael > > > > > > On Mon, Nov 12, 2012 at 8:58 AM, Kasper Daniel Hansen > > <kasperdanielhansen@gmail.com> wrote: > >> > >> Well, I am surprised by the zero width IRanges with start and end. > >> The choice of start = 5 and end = 4 is - as far as I can see - > >> completely arbitrary. I also don't like the fact that start() and > >> end() just reports 5 and 4 - I could see myself making mistakes in my > >> code, because I believe I assume start <= end. > >> > >> But perhaps this is because it is now allowed to have zero-width > IRanges? > >> > >> Kasper > >> > >> > >> On Mon, Nov 12, 2012 at 6:20 AM, Vincent Carey > >> <stvjc@channing.harvard.edu> wrote: > >> > does this contradict the doc in some way? the second range of your > >> > result > >> > has width zero, which i think is correct. > >> > > >> > On Sun, Nov 11, 2012 at 12:16 PM, Kasper Daniel Hansen > >> > <kasperdanielhansen@gmail.com> wrote: > >> >> > >> >> Example > >> >> > >> >> library(GenomicRanges) > >> >> gr1 = GRanges(seqnames = c("chr1", "chr1"), ranges = IRanges(start = > >> >> c(1,5), width = 7)) > >> >> gr2 = GRanges(seqnames = "chr1", ranges = IRanges(start = 1, end = > 4)) > >> >> pintersect(gr1, rep(gr2, 2)) > >> >> > >> >> GRanges with 2 ranges and 0 metadata columns: > >> >> seqnames ranges strand > >> >> <rle> <iranges> <rle> > >> >> [1] chr1 [1, 4] * > >> >> [2] chr1 [5, 4] * > >> >> --- > >> >> seqlengths: > >> >> chr1 > >> >> NA > >> >> > >> >> The second element in the return GRanges should be empty. I would > >> >> therefore expect that the return length of the pintersect would have > >> >> length 1, despite the fact that the first object has length 2. > >> >> > >> >> Kasper > >> >> > >> >> _______________________________________________ > >> >> Bioconductor mailing list > >> >> Bioconductor@r-project.org > >> >> https://stat.ethz.ch/mailman/listinfo/bioconductor > >> >> Search the archives: > >> >> http://news.gmane.org/gmane.science.biology.informatics.conductor > >> > > >> > > >> > >> _______________________________________________ > >> Bioconductor mailing list > >> Bioconductor@r-project.org > >> https://stat.ethz.ch/mailman/listinfo/bioconductor > >> Search the archives: > >> http://news.gmane.org/gmane.science.biology.informatics.conductor > > > > > [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
On Mon, Nov 12, 2012 at 4:28 PM, Michael Lawrence <lawrence.michael at="" gene.com=""> wrote: > > > On Mon, Nov 12, 2012 at 11:00 AM, Kasper Daniel Hansen > <kasperdanielhansen at="" gmail.com=""> wrote: >> >> Hmm, ok. >> >> Well, an alternative is to let the output of pintersect be shorter >> than the two inputs. > > > I would really prefer not to violate that contract of the p* functions. If > there are any empty ranges, it's not easy to figure out how the results > match up with the input. I can see this point of view. But one could use findOverlaps. I am just advocating for my position which is (1) inconsistency with intersect() is not great and (2) I believe I could easily be caught by not noticing a zero-width IRanges. Anyway, reading ?pintersect very carefully, it does discuss these things in some detail, although the paragraph is not consistent with itself (as I read it, it start by saying that an error is thrown if an empty intersect is being produced, but then it contradicts itself later on). Reflecting a bit, I guess I would like a new option for resolve.empty="remove" Kasper > > Michael > >> >> Also, compare to >> intersect(gr1[2], gr2) >> which is empty. >> >> Kasper >> >> >> >> On Mon, Nov 12, 2012 at 1:46 PM, Michael Lawrence >> <lawrence.michael at="" gene.com=""> wrote: >> > The end = start - 1 encoding for zero-width Ranges objects has existed >> > as >> > long as the IRanges package. Having end < start is obviously not ideal, >> > but >> > there are no clearly superior alternatives. >> > >> > Michael >> > >> > >> > On Mon, Nov 12, 2012 at 8:58 AM, Kasper Daniel Hansen >> > <kasperdanielhansen at="" gmail.com=""> wrote: >> >> >> >> Well, I am surprised by the zero width IRanges with start and end. >> >> The choice of start = 5 and end = 4 is - as far as I can see - >> >> completely arbitrary. I also don't like the fact that start() and >> >> end() just reports 5 and 4 - I could see myself making mistakes in my >> >> code, because I believe I assume start <= end. >> >> >> >> But perhaps this is because it is now allowed to have zero-width >> >> IRanges? >> >> >> >> Kasper >> >> >> >> >> >> On Mon, Nov 12, 2012 at 6:20 AM, Vincent Carey >> >> <stvjc at="" channing.harvard.edu=""> wrote: >> >> > does this contradict the doc in some way? the second range of your >> >> > result >> >> > has width zero, which i think is correct. >> >> > >> >> > On Sun, Nov 11, 2012 at 12:16 PM, Kasper Daniel Hansen >> >> > <kasperdanielhansen at="" gmail.com=""> wrote: >> >> >> >> >> >> Example >> >> >> >> >> >> library(GenomicRanges) >> >> >> gr1 = GRanges(seqnames = c("chr1", "chr1"), ranges = IRanges(start = >> >> >> c(1,5), width = 7)) >> >> >> gr2 = GRanges(seqnames = "chr1", ranges = IRanges(start = 1, end = >> >> >> 4)) >> >> >> pintersect(gr1, rep(gr2, 2)) >> >> >> >> >> >> GRanges with 2 ranges and 0 metadata columns: >> >> >> seqnames ranges strand >> >> >> <rle> <iranges> <rle> >> >> >> [1] chr1 [1, 4] * >> >> >> [2] chr1 [5, 4] * >> >> >> --- >> >> >> seqlengths: >> >> >> chr1 >> >> >> NA >> >> >> >> >> >> The second element in the return GRanges should be empty. I would >> >> >> therefore expect that the return length of the pintersect would have >> >> >> length 1, despite the fact that the first object has length 2. >> >> >> >> >> >> Kasper >> >> >> >> >> >> _______________________________________________ >> >> >> Bioconductor mailing list >> >> >> Bioconductor at r-project.org >> >> >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> >> >> Search the archives: >> >> >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> > >> >> > >> >> >> >> _______________________________________________ >> >> Bioconductor mailing list >> >> Bioconductor at r-project.org >> >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> >> Search the archives: >> >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > >> > > >
ADD REPLY

Login before adding your answer.

Traffic: 801 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6