Dear all:
I want to find the overlapping regions between two sets of GRanges objects and then split these regions into subregions by chromosome. Intuitively, I would take all genomic regions from each chromosome and iterate over them. Any hints on how to do this in R? Thanks a lot.
Hi,
Thanks a lot for your reply. I tried your solution, but it gave me an error:
split(gr0, seqnames(gr0))
GRangesList object of length 3:
Error in nchar(nms) :
could not find symbol "keepNA" in environment of the generic function
However, I checked another related post and have figured it out now. Thank you.
Out of curiosity, how did you manage to solve it in the end?
It would be good to understand why this throws an error for you. Does it fail only with your data, or also with the example data from above? Can you paste here what the object 'gr0' looks like before you try to split it?
Hi,
This is how it looks when I use your example:
> gr0 <- GRanges(Rle(c("chr2", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)), IRanges(1:10, width=10:1))
> gr0
GRanges object with 10 ranges and 0 metadata columns:
       seqnames    ranges strand
          <Rle> <IRanges>  <Rle>
   [1]     chr2  [ 1, 10]      *
   [2]     chr2  [ 2, 10]      *
   [3]     chr2  [ 3, 10]      *
   [4]     chr2  [ 4, 10]      *
   [5]     chr1  [ 5, 10]      *
   [6]     chr1  [ 6, 10]      *
   [7]     chr3  [ 7, 10]      *
   [8]     chr3  [ 8, 10]      *
   [9]     chr3  [ 9, 10]      *
  [10]     chr3  [10, 10]      *
  -------
  seqinfo: 3 sequences from an unspecified genome; no seqlengths
> split(gr0, seqnames(gr0))
GRangesList object of length 3:
Error in nchar(nms) :
  could not find symbol "keepNA" in environment of the generic function
Okay, that seems strange, because the same is working fine for me. Can you check if you are running an outdated version of the GenomicRanges package? My sessionInfo():
Hi,
Oops, maybe I have to update the packages. I will try.
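For anyone following along, here is a minimal sketch of how one might check which versions are installed before updating. The list of packages to inspect is my own assumption about what is relevant here, and the commented-out update call assumes BiocManager is the installer in use; neither comes from the posts above.

```r
# Report the running R version and the installed versions of a few
# Bioconductor packages (the package list is an assumption, not from the thread).
r_version <- R.version.string
pkgs <- c("GenomicRanges", "IRanges", "S4Vectors", "BiocGenerics")

installed <- vapply(
  pkgs,
  function(p) {
    # Return NA for packages that are not installed instead of failing
    if (requireNamespace(p, quietly = TRUE)) {
      as.character(packageVersion(p))
    } else {
      NA_character_
    }
  },
  character(1)
)

print(r_version)
print(installed)

# Assumption: BiocManager is installed; uncomment to update installed packages.
# BiocManager::install()
```

Comparing this output with another user's sessionInfo() should show whether the two setups differ.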
FYI, I am going to iterate over the chromosomes one by one. Do you know of an easy way to do this? Otherwise I will have to write a function to handle this case. Thank you!
I have updated my initial answer which now also explains how to apply a function to the entries of each chromosome.
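As a rough sketch of the idea (not necessarily identical to the code in the updated answer), splitting by chromosome and applying a function to each piece could look like this. It uses the example 'gr0' from this thread; the helper 'summarise_chr' is a hypothetical placeholder for whatever per-chromosome computation is needed.

```r
library(GenomicRanges)

# Example data from this thread
gr0 <- GRanges(Rle(c("chr2", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
               IRanges(1:10, width = 10:1))

# GRangesList with one GRanges element per chromosome
by_chr <- split(gr0, seqnames(gr0))

# Hypothetical per-chromosome function: number of ranges and their summed width
summarise_chr <- function(gr) {
  c(n = length(gr), total_width = sum(width(gr)))
}

# Apply the function to the ranges of each chromosome in turn
res <- lapply(by_chr, summarise_chr)
print(res)
```

The result is a plain list named by chromosome, so individual entries can be retrieved with, for example, res[["chr1"]].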
You can also have a look at a recent discussion (looping through a GRangesList object) on why calling a function for each entry in a GRanges object may be slow.
The split command does not work. It split the data but only took the first 4562 transcripts from each chromosome. I assume this is because the first chromosome had that many transcripts.
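One way to check whether split() really dropped ranges is to compare the number of ranges per chromosome before and after splitting. The sketch below uses the small example object from this thread rather than the transcript data, so it only illustrates the check and does not diagnose the cause of the behaviour described above.

```r
library(GenomicRanges)

# Example data from this thread (a stand-in for the real transcript object)
gr0 <- GRanges(Rle(c("chr2", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
               IRanges(1:10, width = 10:1))

by_chr <- split(gr0, seqnames(gr0))

# Number of ranges in each list element after the split
per_chr_split <- elementNROWS(by_chr)

# Number of ranges per chromosome in the original object
per_chr_orig <- table(as.character(seqnames(gr0)))

# TRUE if every chromosome kept all of its ranges
all_match <- all(per_chr_split[names(per_chr_orig)] == as.integer(per_chr_orig))

print(per_chr_split)
print(per_chr_orig)
print(all_match)
```

If 'all_match' is FALSE on the real data, posting both count vectors together with sessionInfo() would make the problem easier to reproduce.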