GenomicRanges::findOverlaps unexpected behaviour with 0 width ranges
Guest User ★ 13k
@guest-user-4897
Last seen 9.6 years ago
The hits returned by findOverlaps seem strange when a zero-width range is involved. By default, findOverlaps does not return any hits with a zero-width range, even if the other range contains the zero-width range. Presumably this is due to the requirement of a positive, non-zero minoverlap. If a maxgap is set, a hit is returned only if the zero-width range is the query.

According to the documentation, "all matches must additionally satisfy the minoverlap constraint". This constraint is violated when a zero-width range is passed as the query parameter with a non-zero maxgap. I presume this is a bug, is it not?

rangeZero <- IRanges(5, 4)
rangeContains <- IRanges(3, 6)
rangeStartMatch <- IRanges(5, 5)
findOverlaps(rangeZero, rangeContains) # no match
findOverlaps(rangeZero, rangeStartMatch) # no match
findOverlaps(rangeContains, rangeZero) # no match
findOverlaps(rangeStartMatch, rangeZero) # no match
findOverlaps(rangeZero, rangeContains, maxgap=1) # match
findOverlaps(rangeZero, rangeStartMatch, maxgap=1) # match
findOverlaps(rangeContains, rangeZero, maxgap=1) # no match
findOverlaps(rangeStartMatch, rangeZero, maxgap=1) # no match

-- output of sessionInfo():

R version 3.0.1 (2013-05-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_Australia.1252 LC_CTYPE=English_Australia.1252 LC_MONETARY=English_Australia.1252 LC_NUMERIC=C
[5] LC_TIME=English_Australia.1252

attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base

other attached packages:
[1] GenomicFeatures_1.12.3 AnnotationDbi_1.22.6 Biobase_2.20.1 GenomicRanges_1.12.5 IRanges_1.18.3 BiocGenerics_0.6.0
[7] BiocInstaller_1.10.3

loaded via a namespace (and not attached):
[1] biomaRt_2.16.0 Biostrings_2.28.0 bitops_1.0-6 BSgenome_1.28.0 DBI_0.2-7 RCurl_1.95-4.1 Rsamtools_1.12.4 RSQLite_0.11.4
[9] rtracklayer_1.20.4 stats4_3.0.1 tools_3.0.1 XML_3.98-1.1 zlibbioc_1.6.0

--
Sent via the guest posting facility at bioconductor.org.
@valerie-obenchain-4275
Last seen 2.3 years ago
United States
Hello,

Thanks for reporting the bug. The behavior is fixed in IRanges 1.19.38 and 1.18.4. A zero-width range in 'query' no longer registers as a hit, regardless of 'maxgap' and 'minoverlap' values.

> findOverlaps(rangeZero, rangeContains, maxgap=1) # match
Hits of length 0
queryLength: 1
subjectLength: 1

> findOverlaps(rangeZero, rangeStartMatch, maxgap=1) # match
Hits of length 0
queryLength: 1
subjectLength: 1

Valerie

On 09/19/2013 10:54 PM, Maintainer wrote:
> The hits returned when using findOverlaps seem strange when a zero width range is involved. [...]
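The width arithmetic behind this thread can be sketched in base R, without loading IRanges. This is only an illustration: `range_width` and `overlap_width` are hypothetical helper names, not IRanges functions, and the code makes no claim about how findOverlaps is implemented internally. It assumes closed integer ranges, so IRanges(5, 4) corresponds to start 5, end 4.

```r
# Hypothetical helper (not part of IRanges): width of a closed integer
# range [s, e]. A range with e == s - 1 has width 0.
range_width <- function(s, e) e - s + 1L

# Hypothetical helper (not part of IRanges): number of positions shared
# by two closed integer ranges [s1, e1] and [s2, e2].
overlap_width <- function(s1, e1, s2, e2) {
  max(0L, min(e1, e2) - max(s1, s2) + 1L)
}

range_width(5L, 4L)            # rangeZero from the report
overlap_width(5L, 4L, 3L, 6L)  # rangeZero vs. rangeContains
overlap_width(5L, 4L, 5L, 5L)  # rangeZero vs. rangeStartMatch
overlap_width(3L, 6L, 5L, 5L)  # two non-empty ranges, for comparison
```

Under these assumptions a zero-width range never shares a position with any other range, so it cannot meet a minoverlap of 1 or more, whichever argument it is passed as.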