How do I split GRange object (set of genomic ranges with 2 metadata column) by chromosome?
2
1
Entering edit mode
@jurat-shahidin-9488
Last seen 4.6 years ago
Chicago, IL, USA

Dear all:

When I am gonna see overlapped regions between two set of GRanges objects, and I want to split theses regions into sub regions by order of chromosome. Intuitively, take all genomic regions from each chromosome and iterate over. Any hint to do this in R ? Thanks a lot

granges split r • 8.0k views
ADD COMMENT
5
Entering edit mode
Julian Gehring ★ 1.3k
@julian-gehring-5818
Last seen 5.5 years ago

This will split a GRanges object by chromosome, which gives you a GRangesList. Then you can use lapply, sapply and friends to process the data for each chromosome individually:

## some example data
gr0 <- GRanges(Rle(c("chr2", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)), IRanges(1:10, width=10:1))

## optional, if you want a genomic order of the chromosomes
gr0 = sortSeqlevels(gr0)

## split into a GRangesList
## where each element has all ranges for one chromosome
grl = split(gr0, seqnames(gr0))

## apply a function to the ranges of each chromosome
res = lapply(grl, your_function)

 

ADD COMMENT
0
Entering edit mode

Hi,

Thanks a lot for your reply. I have tried your solution but it gave me error. such as:

 split(gr0, seqnames(gr0))
GRangesList object of length 3:
Error in nchar(nms) : 
  could not find symbol "keepNA" in environment of the generic function

 

But, I have checked other related post and I figured out now. Thank you

 

ADD REPLY
0
Entering edit mode

Out of curiosity, how did you manage to solve it in the end?

ADD REPLY
0
Entering edit mode

It would be good to understand why this throws an error for you. Does this fail just with your data, or also with the example data from above? Can you paste here how the object 'gr0' looks like before trying to split it?

ADD REPLY
0
Entering edit mode

Hi,

this how it looks like when use your example:

gr0 <- GRanges(Rle(c("chr2", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)), IRanges(1:10, width=10:1))
> gr0
GRanges object with 10 ranges and 0 metadata columns:
       seqnames    ranges strand
          <Rle> <IRanges>  <Rle>
   [1]     chr2  [ 1, 10]      *
   [2]     chr2  [ 2, 10]      *
   [3]     chr2  [ 3, 10]      *
   [4]     chr2  [ 4, 10]      *
   [5]     chr1  [ 5, 10]      *
   [6]     chr1  [ 6, 10]      *
   [7]     chr3  [ 7, 10]      *
   [8]     chr3  [ 8, 10]      *
   [9]     chr3  [ 9, 10]      *
  [10]     chr3  [10, 10]      *
  -------
  seqinfo: 3 sequences from an unspecified genome; no seqlengths
> split(gr0, seqnames(gr0))
GRangesList object of length 3:
Error in nchar(nms) :
  could not find symbol "keepNA" in environment of the generic function
ADD REPLY
0
Entering edit mode

Okay, that seems strange, because the same is working fine for me. Can you check if you are running an outdated version of the GenomicRanges package? My sessionInfo():

R version 3.2.3 (2015-12-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.3 LTS

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=de_DE.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
[1] GenomicRanges_1.22.4 GenomeInfoDb_1.6.3   IRanges_2.4.6       
[4] S4Vectors_0.8.11     BiocGenerics_0.16.1 

loaded via a namespace (and not attached):
[1] zlibbioc_1.16.0 XVector_0.10.0
ADD REPLY
0
Entering edit mode

Hi,

oops, may be I have to updated the packages. I will try.

FYI,  I am gonna iterate this one by one. Do you know any easy way to do this? Instead I have to write function to handle this case. Thank you !!

ADD REPLY
0
Entering edit mode

I have updated my initial answer which now also explains how to apply a function to the entries of each chromosome.

You can also have a look at a recent discussion (looping through a GRangesList object) why calling a function for each entry in a GRanges object may be slow.

ADD REPLY
0
Entering edit mode

The split command does not work. It split the data but only took the first 4562 transcripts from each chromosome. I assume this is because the first chromosome had that many transcripts.

ADD REPLY
0
Entering edit mode
@jurat-shahidin-9488
Last seen 4.6 years ago
Chicago, IL, USA

Hi,

there was one posting that bit of related about what I asked. They did like this:

# read bed files

bed_gr <- as(import.bed(bed.1), "GRanges")

#then , select specific chromosome (i.e., chr1 )

bed_gr$chr1 <- bed_gr[seqnames(bed_gr)=="chr1"]

 

but, in my case, I am gonna iterate this one by one. Do you know any easy way to do this? Instead I have to write function to handle this case. Thanks a lot Julian !!

 

 

ADD COMMENT
0
Entering edit mode

I'd suggest going with Julian's solution.

ADD REPLY

Login before adding your answer.

Traffic: 428 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6