Any way to detach or simplify nested GRangesList ?
0
1
Entering edit mode
@jurat-shahidin-9488
Last seen 4.7 years ago
Chicago, IL, USA

Hi :

I have set of genomic interval in GRanges objects, and I did filter out each GRanges object into two distinctive set based on score column. I expect each GRanges objects must have Confirm, Discard set after filtration. My approach works for me, turns out its output format is bit of undesired and need to do simplification. I bet there must be better way to achieve efficient output format for filtering on big GRanges objects. Can anyone point me how to solve this issue easily? Thanks a lot !

Note: toy data only explain how my real data looks like, so it is simulated based on structure of my dataset. 

# toy data

grs <- GRangesList(
  foo = GRanges( seqnames=Rle("chr1", 3),ranges=IRanges(c(2,7,16), c(5,14,20)),
                 rangeName=c("a1", "a2", "a3"), score=c(4, 6,9)),
  bar = GRanges(seqnames=Rle("chr1", 3),ranges=IRanges(c(4,13,26), c(11,17,28)),
                rangeName=c("b1", "b2", "b3"), score=c(11, 7, 8)),
  bleh = GRanges(seqnames=Rle("chr1", 4),ranges=IRanges(c(1,4,10, 23), c(3,8,14, 29)),
                 rangeName=c("c1", "c2", "c3", "c4"), score= c(4, 6, 3, 8))
)

so I come up this, turns out it is bit of difficult form, and I am stuck with its simplification :

res <- lapply(grs, function(x) split(x, c("Confirm", "Discard")[(x$score > 6)+1]))

I want to simplify because of this reason: 

> res[1]

I want to compare res[[1]]$Confirm with res[[1]]$Discard, for example, assume that one regions both existed in Confirm,and Discard set, then I am gonna remove this instances. I think it's better to get out nested list first, detach nested list as individual list and access its subset respectively

How can perform this simplification? Does anyone knows any trick of doing this manipulation? 

Best regards:

Jurat

 

 

 

grangeslist filtering r • 1.5k views
ADD COMMENT
1
Entering edit mode

Does calling unlist within lapply() return what you want?

res <- lapply(grs, function(x) unlist(split(x, c("Confirm", "Discard")[(x$score > 6)+1])))

> res
$foo
GRanges object with 3 ranges and 2 metadata columns:
          seqnames    ranges strand |   rangeName     score
                 |  
  Confirm     chr1  [ 2,  5]      * |          a1         4
  Confirm     chr1  [ 7, 14]      * |          a2         6
  Discard     chr1  [16, 20]      * |          a3         9
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

$bar
GRanges object with 3 ranges and 2 metadata columns:
          seqnames    ranges strand |   rangeName     score
                 |  
  Discard     chr1  [ 4, 11]      * |          b1        11
  Discard     chr1  [13, 17]      * |          b2         7
  Discard     chr1  [26, 28]      * |          b3         8
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

$bleh
GRanges object with 4 ranges and 2 metadata columns:
          seqnames    ranges strand |   rangeName     score
                 |  
  Confirm     chr1  [ 1,  3]      * |          c1         4
  Confirm     chr1  [ 4,  8]      * |          c2         6
  Confirm     chr1  [10, 14]      * |          c3         3
  Discard     chr1  [23, 29]      * |          c4         8
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

ADD REPLY
0
Entering edit mode

Hi Diego:

Thanks your respond on my post. Yes, your answer is good move and quite close to my expected output format. I think it would be more clean to add Confirm/Discard as new metadata column on next to score column. How can I make this happen? What if I try to extract out Confirmed regions, and Discarded Regions in each GRanges and compare each other, How can I get done this? Thanks your kind help !

Best regards:

Jurat

ADD REPLY
1
Entering edit mode

Below is an attempt to do what I understood you want. For some reason I can't reassign a `mcols` object in `GRanges` with `<-`, making the code more complicated than I was hoping. I use `cut` to classify scores based on whether they are between 0 and 6 or larger. You can tune those values to your liking. This is highly likely not the best way to do it. Hopefully someone else will point out a better option:

lapply(grs, function(x) {
  clas <- cut(score(x), c(0, 6, Inf), labels = c("Discard", "Confirm"))
  mc <- DataFrame(mcols(x), decision = clas)
  GRanges(
    ranges = ranges(x),
    strand = strand(x),
    seqnames = seqnames(x),
    mc
  )
})

$foo
GRanges object with 3 ranges and 3 metadata columns:
      seqnames    ranges strand |   rangeName     score decision
             |   
  [1]     chr1  [ 2,  5]      * |          a1         4  Discard
  [2]     chr1  [ 7, 14]      * |          a2         6  Discard
  [3]     chr1  [16, 20]      * |          a3         9  Confirm
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

$bar
GRanges object with 3 ranges and 3 metadata columns:
      seqnames    ranges strand |   rangeName     score decision
             |   
  [1]     chr1  [ 4, 11]      * |          b1        11  Confirm
  [2]     chr1  [13, 17]      * |          b2         7  Confirm
  [3]     chr1  [26, 28]      * |          b3         8  Confirm
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

$bleh
GRanges object with 4 ranges and 3 metadata columns:
      seqnames    ranges strand |   rangeName     score decision
             |   
  [1]     chr1  [ 1,  3]      * |          c1         4  Discard
  [2]     chr1  [ 4,  8]      * |          c2         6  Discard
  [3]     chr1  [10, 14]      * |          c3         3  Discard
  [4]     chr1  [23, 29]      * |          c4         8  Confirm
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

ADD REPLY
0
Entering edit mode

Daer Diego :

Thanks again for your effort on this question. So far, your approach at least give me hope to get over this question. Thank you so much !

Jurat

ADD REPLY

Login before adding your answer.

Traffic: 445 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6