Search
Question: Splitting a GRanges object into List of GRange objects by position.
0
gravatar for rjactonspsfcf
9 months ago by
rjactonspsfcf0 wrote:

I have a GRanges object and I would like to get a list of GRange objects which are subsets of the original object, each of the the objects in this list should have the same position (same chr, start and end values).

 

library(GenomicRanges)
# input:
gr=GRanges(seqnames=c("chr1","chr1","chr2","chr2"),
           ranges=IRanges(start=c(50,50,200,200),end=c(100,100,300,300)),
           strand=c("+","+","-","-"),
           scores=c(0.5,0.5,0.5,0.5)
)
gr
>GRanges object with 4 ranges and 1 metadata column:
>      seqnames     ranges strand |    scores
>              | 
>  [1]     chr1 [ 50, 100]      + |       0.5
>  [2]     chr1 [ 50, 100]      + |       0.5
>  [3]     chr2 [200, 300]      - |       0.5
>  [4]     chr2 [200, 300]      - |       0.5
>  -------
>  seqinfo: 2 sequences from an unspecified genome; no seqlengths

# the desired output is an object of the form of ls
gr1=GRanges(seqnames=c("chr1","chr1"),
           ranges=IRanges(start=c(50,50),end=c(100,100)),
           strand=c("+","+")
)
gr1
>GRanges object with 2 ranges and 0 metadata columns:
>      seqnames    ranges strand
>            
>  [1]     chr1 [50, 100]      +
>  [2]     chr1 [50, 100]      +
>  -------
>  seqinfo: 1 sequence from an unspecified genome; no seqlengths

gr2=GRanges(seqnames=c("chr2","chr2"),
           ranges=IRanges(start=c(200,200),end=c(300,300)),
           strand=c("-","-"),
           scores=c(0.5,0.5)
)
gr2
>GRanges object with 2 ranges and 1 metadata column:
>      seqnames     ranges strand |    scores
>              | 
>  [1]     chr2 [200, 300]      - |       0.5
>  [2]     chr2 [200, 300]      - |       0.5
>  -------
>  seqinfo: 1 sequence from an unspecified genome; no seqlengths

ls <- list(gr1,gr2)
ls
>[[1]]
>GRanges object with 2 ranges and 0 metadata columns:
>      seqnames    ranges strand
>            
>  [1]     chr1 [50, 100]      +
>  [2]     chr1 [50, 100]      +
>  -------
>  seqinfo: 1 sequence from an unspecified genome; no seqlengths
>
>[[2]]
>GRanges object with 2 ranges and 1 metadata column:
>      seqnames     ranges strand |    scores
>              | 
>  [1]     chr2 [200, 300]      - |       0.5
>  [2]     chr2 [200, 300]      - |       0.5
>  -------
>  seqinfo: 1 sequence from an unspecified genome; no seqlengths

 

For example in the excerpt from the GRanges object above the first 2 lines should be in a new GRanges object in a list of GRange objects and the last 2 lines in a second GRanges object in that list.

Any suggestions?

ADD COMMENTlink modified 9 months ago by Mike Smith2.1k • written 9 months ago by rjactonspsfcf0
2
gravatar for Mike Smith
9 months ago by
Mike Smith2.1k
EMBL Heidelberg / de.NBI
Mike Smith2.1k wrote:

How about this:

split(gr, as.factor(gr))
>
GRangesList object of length 2:
$chr1:50-100:+ 
GRanges object with 2 ranges and 1 metadata column:
      seqnames    ranges strand |    scores
         <Rle> <IRanges>  <Rle> | <numeric>
  [1]     chr1 [50, 100]      + |       0.5
  [2]     chr1 [50, 100]      + |       0.5

$chr2:200-300:- 
GRanges object with 2 ranges and 1 metadata column:
      seqnames     ranges strand | scores
  [1]     chr2 [200, 300]      - |    0.5
  [2]     chr2 [200, 300]      - |    0.5

-------
seqinfo: 2 sequences from an unspecified genome; no seqlengths

You can even do it without the as.factor() part and it seems to work, but I get a warning so maybe it's safer to use the first version.

ADD COMMENTlink written 9 months ago by Mike Smith2.1k

Yea I noticed the warning and I'm working on it. In theory, we could convert GRanges to a grouping more efficiently than the character coercion, using the 4-way integer hash.

ADD REPLYlink written 9 months ago by Michael Lawrence9.8k

Thanks Mike,

That seems to do exactly what I wanted.

I'm not sure I understand how though.

looking at the output of `as.factor(gr)` I can see why it works with as.factor but not why it would work without?

I had not expected that behaviour from as.factor as applied to a GRanges object, thought the presence of metadata might cause issues but it seems to be working on the IRanges bit, which makes some sense in retrospect. This is a very useful behaviour to be aware of.

ADD REPLYlink written 9 months ago by rjactonspsfcf0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 213 users visited in the last hour