Question

Can GenomicRanges findOverlaps ignore seqnames?

O. William McClung • 0

findOverlaps in the GenomicRanges package has an argument, ignore.strand=TRUE, which makes the overlap computation use only the pair (seqnames, ranges), essentially ignoring the strand. Is there a way to make findOverlaps ignore seqnames instead, so that only the pair (ranges, strand) is used to compute an overlap? If not, is there another way to compute overlaps using only (ranges, strand)?

Any pointers will be gratefully received.

> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
[1] GenomicRanges_1.36.1 GenomeInfoDb_1.20.0  IRanges_2.18.2      
[4] S4Vectors_0.22.1     BiocGenerics_0.30.0 

loaded via a namespace (and not attached):
[1] zlibbioc_1.30.0        compiler_3.6.1         XVector_0.24.0        
[4] GenomeInfoDbData_1.2.1 RCurl_1.95-4.12        bitops_1.0-6

GenomicRanges findOverlaps • 1.7k views

Would you please provide more details on the use case?

Michael Lawrence ★ 11k • 6.3 years ago

O. William McClung • 0

@Hervé: Many thanks. This solution clearly works.
@Hervé and Michael: Thanks for pointing out that this use case should never occur. I need to go back and rethink my pipeline.

score 2 · Accepted Answer · 2019-10-10

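Just set the seqnames of all the ranges in both the query and the subject to the same value before calling findOverlaps. This can be done with something like:

library(GenomicRanges)
example(GRanges)  # runs the GRanges help examples, which define the object `gr`
# Rebuild gr with the single dummy seqname "A", keeping its ranges and strand
GRanges("A", ranges(gr), strand(gr))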
# GRanges object with 10 ranges and 0 metadata columns:
#     seqnames    ranges strand
#        <Rle> <IRanges>  <Rle>
#   a        A      1-10      -
#   b        A      2-10      +
#   c        A      3-10      +
#   d        A      4-10      *
#   e        A      5-10      *
#   f        A      6-10      +
#   g        A      7-10      +
#   h        A      8-10      +
#   i        A      9-10      -
#   j        A        10      -
#   -------
#   seqinfo: 1 sequence from an unspecified genome; no seqlengths

However, I can't think of any real-world situation where doing something like this would actually be meaningful.
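
For completeness, here is a minimal sketch of how the suggestion above could be carried through to the overlap computation itself. The helper `drop_seqnames()` and the toy `query` and `subject` objects are hypothetical names invented for illustration, not part of GenomicRanges. The helper only wraps the `GRanges("A", ranges(gr), strand(gr))` call shown above.

```r
library(GenomicRanges)

# Hypothetical helper (not part of GenomicRanges): rebuild a GRanges with a
# single dummy seqname, keeping only the ranges and the strand.
drop_seqnames <- function(gr, dummy = "A") {
  GRanges(dummy, ranges(gr), strand(gr))
}

# Toy data, invented for illustration.
query <- GRanges(c("chr1", "chr2", "chr1"),
                 IRanges(start = c(1, 20, 40), end = c(10, 30, 50)),
                 strand = c("+", "-", "+"))
subject <- GRanges(c("chr1", "chr2", "chr2"),
                   IRanges(start = c(25, 5, 45), end = c(35, 15, 55)),
                   strand = c("-", "+", "-"))

# Default behaviour: seqnames, ranges and strand must all be compatible.
# Every pair with overlapping ranges sits on different sequences here.
hits_by_seqname <- findOverlaps(query, subject)

# After neutralizing the seqnames, only (ranges, strand) decide the result.
# The third pair overlaps in ranges but has opposite strands, so it is
# still excluded.
hits <- findOverlaps(drop_seqnames(query), drop_seqnames(subject))

length(hits_by_seqname)
hits
```

In this toy example the first call finds no overlaps, while the second pairs query 1 with subject 2 and query 2 with subject 1. The indices in `hits` refer to positions in the original `query` and `subject`, because `drop_seqnames()` preserves the order of the ranges.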