Any way to detach or simplify nested GRangesList ?
0
1
Entering edit mode
@jurat-shahidin-9488
Last seen 16 months ago
Chicago, IL, USA

Hi :

I have set of genomic interval in GRanges objects, and I did filter out each GRanges object into two distinctive set based on score column. I expect each GRanges objects must have Confirm, Discard set after filtration. My approach works for me, turns out its output format is bit of undesired and need to do simplification. I bet there must be better way to achieve efficient output format for filtering on big GRanges objects. Can anyone point me how to solve this issue easily? Thanks a lot !

Note: toy data only explain how my real data looks like, so it is simulated based on structure of my dataset.

# toy data

grs <- GRangesList(
foo = GRanges( seqnames=Rle("chr1", 3),ranges=IRanges(c(2,7,16), c(5,14,20)),
rangeName=c("a1", "a2", "a3"), score=c(4, 6,9)),
bar = GRanges(seqnames=Rle("chr1", 3),ranges=IRanges(c(4,13,26), c(11,17,28)),
rangeName=c("b1", "b2", "b3"), score=c(11, 7, 8)),
bleh = GRanges(seqnames=Rle("chr1", 4),ranges=IRanges(c(1,4,10, 23), c(3,8,14, 29)),
rangeName=c("c1", "c2", "c3", "c4"), score= c(4, 6, 3, 8))
)

so I come up this, turns out it is bit of difficult form, and I am stuck with its simplification :

res <- lapply(grs, function(x) split(x, c("Confirm", "Discard")[(x$score > 6)+1])) I want to simplify because of this reason: > res[1] I want to compare res[[1]]$Confirm with res[[1]]$Discard, for example, assume that one regions both existed in Confirm,and Discard set, then I am gonna remove this instances. I think it's better to get out nested list first, detach nested list as individual list and access its subset respectively How can perform this simplification? Does anyone knows any trick of doing this manipulation? Best regards: Jurat grangeslist filtering r • 632 views ADD COMMENT 1 Entering edit mode Does calling unlist within lapply() return what you want? res <- lapply(grs, function(x) unlist(split(x, c("Confirm", "Discard")[(x$score > 6)+1])))

> res
$foo GRanges object with 3 ranges and 2 metadata columns: seqnames ranges strand | rangeName score | Confirm chr1 [ 2, 5] * | a1 4 Confirm chr1 [ 7, 14] * | a2 6 Discard chr1 [16, 20] * | a3 9 ------- seqinfo: 1 sequence from an unspecified genome; no seqlengths$bar
GRanges object with 3 ranges and 2 metadata columns:
seqnames    ranges strand |   rangeName     score
|
Discard     chr1  [ 4, 11]      * |          b1        11
Discard     chr1  [13, 17]      * |          b2         7
Discard     chr1  [26, 28]      * |          b3         8
-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths

$bleh GRanges object with 4 ranges and 2 metadata columns: seqnames ranges strand | rangeName score | Confirm chr1 [ 1, 3] * | c1 4 Confirm chr1 [ 4, 8] * | c2 6 Confirm chr1 [10, 14] * | c3 3 Discard chr1 [23, 29] * | c4 8 ------- seqinfo: 1 sequence from an unspecified genome; no seqlengths  ADD REPLY 0 Entering edit mode Hi Diego: Thanks your respond on my post. Yes, your answer is good move and quite close to my expected output format. I think it would be more clean to add Confirm/Discard as new metadata column on next to score column. How can I make this happen? What if I try to extract out Confirmed regions, and Discarded Regions in each GRanges and compare each other, How can I get done this? Thanks your kind help ! Best regards: Jurat ADD REPLY 1 Entering edit mode Below is an attempt to do what I understood you want. For some reason I can't reassign a mcols object in GRanges with <-, making the code more complicated than I was hoping. I use cut to classify scores based on whether they are between 0 and 6 or larger. You can tune those values to your liking. This is highly likely not the best way to do it. Hopefully someone else will point out a better option: lapply(grs, function(x) { clas <- cut(score(x), c(0, 6, Inf), labels = c("Discard", "Confirm")) mc <- DataFrame(mcols(x), decision = clas) GRanges( ranges = ranges(x), strand = strand(x), seqnames = seqnames(x), mc ) })$foo
GRanges object with 3 ranges and 3 metadata columns:
seqnames    ranges strand |   rangeName     score decision
|
[1]     chr1  [ 2,  5]      * |          a1         4  Discard
[2]     chr1  [ 7, 14]      * |          a2         6  Discard
[3]     chr1  [16, 20]      * |          a3         9  Confirm
-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths

$bar GRanges object with 3 ranges and 3 metadata columns: seqnames ranges strand | rangeName score decision | [1] chr1 [ 4, 11] * | b1 11 Confirm [2] chr1 [13, 17] * | b2 7 Confirm [3] chr1 [26, 28] * | b3 8 Confirm ------- seqinfo: 1 sequence from an unspecified genome; no seqlengths$bleh
GRanges object with 4 ranges and 3 metadata columns:
seqnames    ranges strand |   rangeName     score decision
|
[1]     chr1  [ 1,  3]      * |          c1         4  Discard
[2]     chr1  [ 4,  8]      * |          c2         6  Discard
[3]     chr1  [10, 14]      * |          c3         3  Discard
[4]     chr1  [23, 29]      * |          c4         8  Confirm
-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths


0
Entering edit mode

Daer Diego :

Thanks again for your effort on this question. So far, your approach at least give me hope to get over this question. Thank you so much !

Jurat