Any method can coerce "list" object to GRanges [reproducible example is included]?
2
1
Entering edit mode
@jurat-shahidin-9488
Last seen 4.7 years ago
Chicago, IL, USA

Hi, everyone:

I have used extractROWS method for expanding intersected regions from hitlist, and it works fine. However, for 3 GRanges objects (third GRanges can be manually produced ), I have applied my R function for each of them, then wants to apply chisq.test, but I have trouble to coerce list objects (that contain intersected genomic interval) into GRanges objects, and I ran into this problem: 

 as(res_b, "GRanges")
Error in as(res_b, "GRanges") : 
  no method or default for coercing “list” to “GRanges”

For the feasibility of my problem, I have this reproducible example : 

# set up data

​a <- GRanges(
  seqnames=Rle(c("chr1", "chr2", "chr3", "chr4"), c(3, 2, 1, 2)), ranges=IRanges(seq(1, by=9, len=8), seq(7, by=9, len=8)),  rangeName=letters[seq(1:8)], score=sample(1:20, 8, replace = FALSE))
a$pvalue <- 10^(score(a)/-1) ; a$score <- NULL
b <- GRanges(
  seqnames=Rle(c("chr1", "chr2", "chr3","chr4"), c(4, 3, 1, 1)), ranges=IRanges(seq(2, by=5, len=9), seq(4, by=5, len=9)), rangeName=letters[seq(1:9)], score=sample(1:20, 9, replace = FALSE))
b$pvalue <- 10^(score(b)/-1) ; b$score <- NULL

# overlap

ov_ab <- as(findOverlaps(a, b), "List")

# purify

res_b <- lapply(ov_ab, function(x) {
  idx0 <- which.min(extractROWS(b$pvalue, x))
  idx1 <- extractROWS(seq_along(b), x)[idx0]
  b[idx1]
})

Objective: how can I coerce res_b to GRanges objects ? Is there any way to get out this issue ? Any recommendation, possible solution is highly appreciated. Thanks a lot.

All the best:

Jurat

 

 

granges error handling R • 3.2k views
ADD COMMENT
4
Entering edit mode
Mike Smith ★ 6.6k
@mike-smith
Last seen 3 hours ago
EMBL Heidelberg

A slightly round-about way of doing it, but how about converting your list to a GRangesList, then to a data.frame, and then to single GRanges object e.g.

singleGRange <- GRanges(as.data.frame(GRangesList(res_b)))
> singleGRange
GRanges object with 4 ranges and 4 metadata columns:
      seqnames    ranges strand |     group  group_name   rangeName    pvalue
         <Rle> <IRanges>  <Rle> | <integer> <character> <character> <numeric>
  [1]     chr1  [ 7,  9]      * |         1        <NA>           b     1e-15
  [2]     chr1  [12, 14]      * |         2        <NA>           c     1e-12
  [3]     chr1  [17, 19]      * |         3        <NA>           d     1e-20
  [4]     chr2  [32, 34]      * |         4        <NA>           g     1e-14
  -------
  seqinfo: 4 sequences from an unspecified genome; no seqlengths

 

ADD COMMENT
1
Entering edit mode

I have to admit I didn't know that was possible, but you could just unlist:

unlist(GRangesList(res_b))
ADD REPLY
1
Entering edit mode

Ah, that's much nicer.  I was sure there was a way to flatten the GRangeList, but looked under 'Coercion' rather than 'Combining' in the man page.

ADD REPLY
0
Entering edit mode

Hi Mike:

Thanks a lot for this quick respond. I have tried your solution, that could help me out. 

All the best:

Jurat

ADD REPLY
2
Entering edit mode
@michael-lawrence-3846
Last seen 3.0 years ago
United States

It would help to understand your high-level question. I think you want to find the subjects with the minimum p-value, grouped by query range. Since a subject might overlap multiple queries, we need to use extractList() to get p-values grouped by query, and then use which.min() and compute the global index:

min_b <- which.min(extractList(b$pvalue, ov_ab))
min_b_global <-  min_b + start(PartitioningByEnd(ov_ab)) - 1L
res_b <- b[min_b_global]

Perhaps we need a global() that converts the local index into unlisted space (the second line above).

 

ADD COMMENT
0
Entering edit mode

Dear Michael:

I am very thankful from you this quick respond. Yes, I had discussed this issue with Martin Morgan  C: Find overlap for more than 2 GRanges object simultaneously (reproducible example, and it is quite efficient as well. But, in our algorithm, we have very important step(just keep most significant regions if multiple intersection occurs) that may effect overall result, so I tried to use extractROWS, and it works very nice in my case. However, I ran into problem when I tried to apply chisq.test on them, and I realized that I have to coerce them in GRanges objects. I could try your solution. Thanks again for your help

Best regards:

Jurat

ADD REPLY

Login before adding your answer.

Traffic: 898 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6