Question: How do I split GRange object (set of genomic ranges with 2 metadata column) by chromosome?
1
gravatar for Jurat Shahidin
3.8 years ago by
Chicago, IL, USA
Jurat Shahidin70 wrote:

Dear all:

When I am gonna see overlapped regions between two set of GRanges objects, and I want to split theses regions into sub regions by order of chromosome. Intuitively, take all genomic regions from each chromosome and iterate over. Any hint to do this in R ? Thanks a lot

granges R split • 3.3k views
ADD COMMENTlink modified 3.8 years ago • written 3.8 years ago by Jurat Shahidin70
Answer: How do I split GRange object (set of genomic ranges with 2 metadata column) by c
4
gravatar for Julian Gehring
3.8 years ago by
Julian Gehring1.3k
Julian Gehring1.3k wrote:

This will split a GRanges object by chromosome, which gives you a GRangesList. Then you can use lapply, sapply and friends to process the data for each chromosome individually:

## some example data
gr0 <- GRanges(Rle(c("chr2", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)), IRanges(1:10, width=10:1))

## optional, if you want a genomic order of the chromosomes
gr0 = sortSeqlevels(gr0)

## split into a GRangesList
## where each element has all ranges for one chromosome
grl = split(gr0, seqnames(gr0))

## apply a function to the ranges of each chromosome
res = lapply(grl, your_function)

 

ADD COMMENTlink modified 3.8 years ago by Martin Morgan ♦♦ 24k • written 3.8 years ago by Julian Gehring1.3k

Hi,

Thanks a lot for your reply. I have tried your solution but it gave me error. such as:

 split(gr0, seqnames(gr0))
GRangesList object of length 3:
Error in nchar(nms) : 
  could not find symbol "keepNA" in environment of the generic function

 

But, I have checked other related post and I figured out now. Thank you

 

ADD REPLYlink modified 3.8 years ago • written 3.8 years ago by Jurat Shahidin70

Out of curiosity, how did you manage to solve it in the end?

ADD REPLYlink written 3.8 years ago by Julian Gehring1.3k

It would be good to understand why this throws an error for you. Does this fail just with your data, or also with the example data from above? Can you paste here how the object 'gr0' looks like before trying to split it?

ADD REPLYlink written 3.8 years ago by Julian Gehring1.3k

Hi,

this how it looks like when use your example:

gr0 <- GRanges(Rle(c("chr2", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)), IRanges(1:10, width=10:1))
> gr0
GRanges object with 10 ranges and 0 metadata columns:
       seqnames    ranges strand
          <Rle> <IRanges>  <Rle>
   [1]     chr2  [ 1, 10]      *
   [2]     chr2  [ 2, 10]      *
   [3]     chr2  [ 3, 10]      *
   [4]     chr2  [ 4, 10]      *
   [5]     chr1  [ 5, 10]      *
   [6]     chr1  [ 6, 10]      *
   [7]     chr3  [ 7, 10]      *
   [8]     chr3  [ 8, 10]      *
   [9]     chr3  [ 9, 10]      *
  [10]     chr3  [10, 10]      *
  -------
  seqinfo: 3 sequences from an unspecified genome; no seqlengths
> split(gr0, seqnames(gr0))
GRangesList object of length 3:
Error in nchar(nms) :
  could not find symbol "keepNA" in environment of the generic function
ADD REPLYlink written 3.8 years ago by Jurat Shahidin70

Okay, that seems strange, because the same is working fine for me. Can you check if you are running an outdated version of the GenomicRanges package? My sessionInfo():

R version 3.2.3 (2015-12-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.3 LTS

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=de_DE.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
[1] GenomicRanges_1.22.4 GenomeInfoDb_1.6.3   IRanges_2.4.6       
[4] S4Vectors_0.8.11     BiocGenerics_0.16.1 

loaded via a namespace (and not attached):
[1] zlibbioc_1.16.0 XVector_0.10.0
ADD REPLYlink written 3.8 years ago by Julian Gehring1.3k

Hi,

oops, may be I have to updated the packages. I will try.

FYI,  I am gonna iterate this one by one. Do you know any easy way to do this? Instead I have to write function to handle this case. Thank you !!

ADD REPLYlink written 3.8 years ago by Jurat Shahidin70

I have updated my initial answer which now also explains how to apply a function to the entries of each chromosome.

You can also have a look at a recent discussion (looping through a GRangesList object) why calling a function for each entry in a GRanges object may be slow.

ADD REPLYlink written 3.8 years ago by Julian Gehring1.3k
Answer: How do I split GRange object (set of genomic ranges with 2 metadata column) by c
0
gravatar for Jurat Shahidin
3.8 years ago by
Chicago, IL, USA
Jurat Shahidin70 wrote:

Hi,

there was one posting that bit of related about what I asked. They did like this:

# read bed files

bed_gr <- as(import.bed(bed.1), "GRanges")

#then , select specific chromosome (i.e., chr1 )

bed_gr$chr1 <- bed_gr[seqnames(bed_gr)=="chr1"]

 

but, in my case, I am gonna iterate this one by one. Do you know any easy way to do this? Instead I have to write function to handle this case. Thanks a lot Julian !!

 

 

ADD COMMENTlink modified 3.8 years ago • written 3.8 years ago by Jurat Shahidin70

I'd suggest going with Julian's solution.

ADD REPLYlink written 3.8 years ago by Sean Davis21k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 225 users visited in the last hour