Question: Splitting a GRanges object into List of GRange objects by position.
0
2.6 years ago by
RichardJActon30 wrote:

I have a GRanges object and I would like to get a list of GRange objects which are subsets of the original object, each of the the objects in this list should have the same position (same chr, start and end values).

library(GenomicRanges)
# input:
gr=GRanges(seqnames=c("chr1","chr1","chr2","chr2"),
ranges=IRanges(start=c(50,50,200,200),end=c(100,100,300,300)),
strand=c("+","+","-","-"),
scores=c(0.5,0.5,0.5,0.5)
)
gr
>GRanges object with 4 ranges and 1 metadata column:
>      seqnames     ranges strand |    scores
>              |
>  [1]     chr1 [ 50, 100]      + |       0.5
>  [2]     chr1 [ 50, 100]      + |       0.5
>  [3]     chr2 [200, 300]      - |       0.5
>  [4]     chr2 [200, 300]      - |       0.5
>  -------
>  seqinfo: 2 sequences from an unspecified genome; no seqlengths

# the desired output is an object of the form of ls
gr1=GRanges(seqnames=c("chr1","chr1"),
ranges=IRanges(start=c(50,50),end=c(100,100)),
strand=c("+","+")
)
gr1
>GRanges object with 2 ranges and 0 metadata columns:
>      seqnames    ranges strand
>
>  [1]     chr1 [50, 100]      +
>  [2]     chr1 [50, 100]      +
>  -------
>  seqinfo: 1 sequence from an unspecified genome; no seqlengths

gr2=GRanges(seqnames=c("chr2","chr2"),
ranges=IRanges(start=c(200,200),end=c(300,300)),
strand=c("-","-"),
scores=c(0.5,0.5)
)
gr2
>GRanges object with 2 ranges and 1 metadata column:
>      seqnames     ranges strand |    scores
>              |
>  [1]     chr2 [200, 300]      - |       0.5
>  [2]     chr2 [200, 300]      - |       0.5
>  -------
>  seqinfo: 1 sequence from an unspecified genome; no seqlengths

ls <- list(gr1,gr2)
ls
>[[1]]
>GRanges object with 2 ranges and 0 metadata columns:
>      seqnames    ranges strand
>
>  [1]     chr1 [50, 100]      +
>  [2]     chr1 [50, 100]      +
>  -------
>  seqinfo: 1 sequence from an unspecified genome; no seqlengths
>
>[[2]]
>GRanges object with 2 ranges and 1 metadata column:
>      seqnames     ranges strand |    scores
>              |
>  [1]     chr2 [200, 300]      - |       0.5
>  [2]     chr2 [200, 300]      - |       0.5
>  -------
>  seqinfo: 1 sequence from an unspecified genome; no seqlengths


For example in the excerpt from the GRanges object above the first 2 lines should be in a new GRanges object in a list of GRange objects and the last 2 lines in a second GRanges object in that list.

Any suggestions?

granges grangeslist R group subsett • 1.8k views
modified 2.6 years ago by Mike Smith3.9k • written 2.6 years ago by RichardJActon30
Answer: Splitting a GRanges object into List of GRange objects by position.
2
2.6 years ago by
Mike Smith3.9k
EMBL Heidelberg / de.NBI
Mike Smith3.9k wrote:

split(gr, as.factor(gr))
>
GRangesList object of length 2:
$chr1:50-100:+ GRanges object with 2 ranges and 1 metadata column: seqnames ranges strand | scores <Rle> <IRanges> <Rle> | <numeric> [1] chr1 [50, 100] + | 0.5 [2] chr1 [50, 100] + | 0.5$chr2:200-300:-
GRanges object with 2 ranges and 1 metadata column:
seqnames     ranges strand | scores
[1]     chr2 [200, 300]      - |    0.5
[2]     chr2 [200, 300]      - |    0.5

-------
seqinfo: 2 sequences from an unspecified genome; no seqlengths

You can even do it without the as.factor() part and it seems to work, but I get a warning so maybe it's safer to use the first version.

Yea I noticed the warning and I'm working on it. In theory, we could convert GRanges to a grouping more efficiently than the character coercion, using the 4-way integer hash.

Thanks Mike,

That seems to do exactly what I wanted.

I'm not sure I understand how though.

looking at the output of as.factor(gr) I can see why it works with as.factor but not why it would work without?

I had not expected that behaviour from as.factor as applied to a GRanges object, thought the presence of metadata might cause issues but it seems to be working on the IRanges bit, which makes some sense in retrospect. This is a very useful behaviour to be aware of.