Question: How do I split GRange object (set of genomic ranges with 2 metadata column) by chromosome?
1
3.8 years ago by
Chicago, IL, USA
Jurat Shahidin70 wrote:

Dear all:

When I am gonna see overlapped regions between two set of GRanges objects, and I want to split theses regions into sub regions by order of chromosome. Intuitively, take all genomic regions from each chromosome and iterate over. Any hint to do this in R ? Thanks a lot

granges R split • 3.3k views
modified 3.8 years ago • written 3.8 years ago by Jurat Shahidin70
Answer: How do I split GRange object (set of genomic ranges with 2 metadata column) by c
4
3.8 years ago by
Julian Gehring1.3k
Julian Gehring1.3k wrote:

This will split a GRanges object by chromosome, which gives you a GRangesList. Then you can use lapply, sapply and friends to process the data for each chromosome individually:

## some example data
gr0 <- GRanges(Rle(c("chr2", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)), IRanges(1:10, width=10:1))

## optional, if you want a genomic order of the chromosomes
gr0 = sortSeqlevels(gr0)

## split into a GRangesList
## where each element has all ranges for one chromosome
grl = split(gr0, seqnames(gr0))

## apply a function to the ranges of each chromosome
res = lapply(grl, your_function)

Hi,

Thanks a lot for your reply. I have tried your solution but it gave me error. such as:

 split(gr0, seqnames(gr0)) GRangesList object of length 3: Error in nchar(nms) :    could not find symbol "keepNA" in environment of the generic function

But, I have checked other related post and I figured out now. Thank you

Out of curiosity, how did you manage to solve it in the end?

It would be good to understand why this throws an error for you. Does this fail just with your data, or also with the example data from above? Can you paste here how the object 'gr0' looks like before trying to split it?

Hi,

this how it looks like when use your example:

gr0 <- GRanges(Rle(c("chr2", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)), IRanges(1:10, width=10:1))
> gr0
GRanges object with 10 ranges and 0 metadata columns:
seqnames    ranges strand
<Rle> <IRanges>  <Rle>
[1]     chr2  [ 1, 10]      *
[2]     chr2  [ 2, 10]      *
[3]     chr2  [ 3, 10]      *
[4]     chr2  [ 4, 10]      *
[5]     chr1  [ 5, 10]      *
[6]     chr1  [ 6, 10]      *
[7]     chr3  [ 7, 10]      *
[8]     chr3  [ 8, 10]      *
[9]     chr3  [ 9, 10]      *
[10]     chr3  [10, 10]      *
-------
seqinfo: 3 sequences from an unspecified genome; no seqlengths
> split(gr0, seqnames(gr0))
GRangesList object of length 3:
Error in nchar(nms) :
could not find symbol "keepNA" in environment of the generic function

Okay, that seems strange, because the same is working fine for me. Can you check if you are running an outdated version of the GenomicRanges package? My sessionInfo():

R version 3.2.3 (2015-12-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.3 LTS

locale:
[1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
[3] LC_TIME=de_DE.UTF-8        LC_COLLATE=en_GB.UTF-8
[5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=de_DE.UTF-8       LC_NAME=C
[11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
[1] GenomicRanges_1.22.4 GenomeInfoDb_1.6.3   IRanges_2.4.6
[4] S4Vectors_0.8.11     BiocGenerics_0.16.1

loaded via a namespace (and not attached):
[1] zlibbioc_1.16.0 XVector_0.10.0

Hi,

oops, may be I have to updated the packages. I will try.

FYI,  I am gonna iterate this one by one. Do you know any easy way to do this? Instead I have to write function to handle this case. Thank you !!

I have updated my initial answer which now also explains how to apply a function to the entries of each chromosome.

You can also have a look at a recent discussion (looping through a GRangesList object) why calling a function for each entry in a GRanges object may be slow.

Answer: How do I split GRange object (set of genomic ranges with 2 metadata column) by c
0
3.8 years ago by
Chicago, IL, USA
Jurat Shahidin70 wrote:

Hi,

there was one posting that bit of related about what I asked. They did like this:

bed_gr <- as(import.bed(bed.1), "GRanges")

#then , select specific chromosome (i.e., chr1 )

bed_gr\$chr1 <- bed_gr[seqnames(bed_gr)=="chr1"]

but, in my case, I am gonna iterate this one by one. Do you know any easy way to do this? Instead I have to write function to handle this case. Thanks a lot Julian !!