Search
Question: Element-wise setdiff when psetdiff fails for GenomicRanges
1
gravatar for Kipper Fletez-Brant
2.7 years ago by
United States
Kipper Fletez-Brant140 wrote:

I have 2 GRanges objects of equal length.  For each individual pair of rows from the 2 objects, I want to get:

    a) the intersection of the ranges

    b) the ranges unique to the first range

    c) the ranges unique the second range

See my code with comments for specifics, here I will present an overview of my problem.  I have no problem with intersect() - the seems to work.  My trouble is getting the asymmetric differences between paired ranges.  I can do this one row at a time using setdiff() on same row from both elements, but the runtime explodes.  If I pass both GRanges objects, ranges that happen to be overlapping are collapsed into 1 range, which is not what I want.  Because some ranges contain their counterpart from the other GRanges object, psetdiff() crashes.  What is the fastest way to get what I want?

 

# test data

 > test.a <- GRanges(seqnames=rep("chr1", times=5), ranges=IRanges(start=c(1000, 2000, 4000, 6000, 7000), end=c(2000, 3000, 5000, 7000, 8000)))
 > test.b <- GRanges(seqnames=rep("chr1", times=5), ranges=IRanges(start=c(1000, 2100, 4500, 5900, 7700), end=c(1500, 2900, 5000, 6300, 8200)))

#  the output I am looking for

# note that the results for 1 and 2 share a base in common

# I still count these as 2 separate results

 setdiff(test.a[1], test.b[1])
 GRanges object with 1 range and 0 metadata columns:
       seqnames       ranges strand
          <Rle>    <IRanges>  <Rle>
   [1]     chr1 [1501, 2000]      *
   -------
   seqinfo: 1 sequence from an unspecified genome; no seqlengths

setdiff(test.a[2], test.b[2])
 GRanges object with 2 ranges and 0 metadata columns:
       seqnames       ranges strand
          <Rle>    <IRanges>  <Rle>
   [1]     chr1 [2000, 2099]      *
   [2]     chr1 [2901, 3000]      *
   -------
   seqinfo: 1 sequence from an unspecified genome; no seqlengths



# when i pass the full GRanges objects

# note that 1 and 2 are now combined

# this is not what i want

setdiff(test.a, test.b, resolve.empty="none")
 GRanges object with 4 ranges and 0 metadata columns:
       seqnames       ranges strand
          <Rle>    <IRanges>  <Rle>
   [1]     chr1 [1501, 2099]      *
   [2]     chr1 [2901, 3000]      *
   [3]     chr1 [4000, 4499]      *
   [4]     chr1 [6301, 7699]      *
   -------
   seqinfo: 1 sequence from an unspecified genome; no seqlengths



# calling psetdiff(), which i understand is supposed to be the element-wise setdiff()

# it crashes

# this is not what i want

psetdiff(test.a, test.b, resolve.empty="none")
 Error in start(value) :
   error in evaluating the argument 'x' in selecting a method for function 'start': Error in psetdiff(extractROWS(ranges(x), ok), extractROWS(ranges(y), ok)) :
   some ranges in 'y' have their end points strictly inside
   the range in 'x' that they need to be subtracted from.
   Cannot subtract them.



 

ADD COMMENTlink modified 2.7 years ago by Michael Lawrence60 • written 2.7 years ago by Kipper Fletez-Brant140
2
gravatar for Michael Lawrence
2.7 years ago by
United States
Michael Lawrence60 wrote:

The psetdif() operation is indeed what you want; however, it is what we call an "endomorphism", i.e., it returns an object of the same type as its inputs. A GRanges is obviously incapable of representing the result of a setdiff when "x" contains "y".  Thus, we need a data structure that is capable of representing that complexity. That data structure is GRangesList.

The solution is to convert your GRanges objects to GRangesList objects and then call psetdiff().

grl.a <- as(test.a, "List")
grl.b <- as(test.b, "List")
psetdiff(grl.a, grl.b)

Hope that helps.

ADD COMMENTlink written 2.7 years ago by Michael Lawrence60

Thank you for the explanation.  Perhaps I am misunderstanding you.  When I enter your code I get the error below.

grl.a <- as(test.a, "List")
Error in as(from, selectListClassName(element.type)) : 
  no method or default for coercing “GRanges” to “GRangesList”
ADD REPLYlink written 2.7 years ago by Kipper Fletez-Brant140

Works for me:

> library(GenomicRanges)
> test.a <- GRanges(seqnames=rep("chr1", times=5), ranges=IRanges(start=c(1000, 2000, 4000, 6000, 7000), end=c(2000, 3000, 5000, 7000, 8000)))
> test.b <- GRanges(seqnames=rep("chr1", times=5), ranges=IRanges(start=c(1000, 2100, 4500, 5900, 7700), end=c(1500, 2900, 5000, 6300, 8200)))
> t.a <- as(test.a, "List")
> t.b <- as(test.b, "List")
> psetdiff(t.a, t.b)
GRangesList object of length 5:
[[1]]
GRanges object with 1 range and 0 metadata columns:
      seqnames       ranges strand
         <Rle>    <IRanges>  <Rle>
  [1]     chr1 [1501, 2000]      *

[[2]]
GRanges object with 2 ranges and 0 metadata columns:
      seqnames       ranges strand
  [1]     chr1 [2000, 2099]      *
  [2]     chr1 [2901, 3000]      *

[[3]]
GRanges object with 1 range and 0 metadata columns:
      seqnames       ranges strand
  [1]     chr1 [4000, 4499]      *

...
<2 more elements>
-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths

> session_info()
Session info -------------------------------------------------------------------
 setting  value                       
 version  R version 3.1.1 (2014-07-10)
 system   x86_64, linux-gnu           
 ui       X11                         
 language (EN)                        
 collate  en_US.UTF-8                 
 tz       <NA>                        

Packages -----------------------------------------------------------------------
 package       * version date       source        
 BiocGenerics    0.12.1  2014-11-17 Bioconductor  
 devtools        1.7.0   2015-01-17 CRAN (R 3.1.1)
 GenomeInfoDb    1.2.4   2015-01-05 Bioconductor  
 GenomicRanges   1.18.4  2015-01-20 Bioconductor  
 IRanges         2.0.1   2014-12-15 Bioconductor  
 rstudioapi    * 0.2     2014-12-31 CRAN (R 3.1.1)
 S4Vectors       0.4.0   2014-10-15 Bioconductor  
 XVector       * 0.6.0   2014-10-15 Bioconductor  

 

ADD REPLYlink written 2.7 years ago by James W. MacDonald44k

EDIT: This did ultimately work

 

That's weird.  I looked at your session info and saw that I did not have S4Vector loaded.  I loaded S4Vector and XVector and now I get the following:

> psetdiff(t.a, t.b)
GRangesList of length 5:
[[1]] 
Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘getListElement’ for signature ‘"GRangesList"’
> sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
[1] XVector_0.6.0        S4Vectors_0.4.0      BiocInstaller_1.16.1
[4] GenomicRanges_1.18.4 GenomeInfoDb_1.2.4   IRanges_2.0.1       
[7] BiocGenerics_0.12.1 

loaded via a namespace (and not attached):
[1] compiler_3.1.2 tools_3.1.2   

 

ADD REPLYlink modified 2.7 years ago • written 2.7 years ago by Kipper Fletez-Brant140
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 119 users visited in the last hour