Question

Any method can coerce "list" object to GRanges [reproducible example is included]?

1

Entering edit mode

Jurat Shahidin ▴ 80

@jurat-shahidin-9488

Last seen 5.7 years ago

Chicago, IL, USA

Hi, everyone:

I have used extractROWS method for expanding intersected regions from hitlist, and it works fine. However, for 3 GRanges objects (third GRanges can be manually produced ), I have applied my R function for each of them, then wants to apply chisq.test, but I have trouble to coerce list objects (that contain intersected genomic interval) into GRanges objects, and I ran into this problem:

as(res_b, "GRanges") Error in as(res_b, "GRanges") : no method or default for coercing “list” to “GRanges”

For the feasibility of my problem, I have this reproducible example :

# set up data

a <- GRanges(
  seqnames=Rle(c("chr1", "chr2", "chr3", "chr4"), c(3, 2, 1, 2)), ranges=IRanges(seq(1, by=9, len=8), seq(7, by=9, len=8)),  rangeName=letters[seq(1:8)], score=sample(1:20, 8, replace = FALSE))
a$pvalue <- 10^(score(a)/-1) ; a$score <- NULL
b <- GRanges(
  seqnames=Rle(c("chr1", "chr2", "chr3","chr4"), c(4, 3, 1, 1)), ranges=IRanges(seq(2, by=5, len=9), seq(4, by=5, len=9)), rangeName=letters[seq(1:9)], score=sample(1:20, 9, replace = FALSE))
b$pvalue <- 10^(score(b)/-1) ; b$score <- NULL

# overlap

ov_ab <- as(findOverlaps(a, b), "List")

# purify

res_b <- lapply(ov_ab, function(x) {
  idx0 <- which.min(extractROWS(b$pvalue, x))
  idx1 <- extractROWS(seq_along(b), x)[idx0]
  b[idx1]
})

Objective: how can I coerce res_b to GRanges objects ? Is there any way to get out this issue ? Any recommendation, possible solution is highly appreciated. Thanks a lot.

All the best:

Jurat

granges error handling R • 3.7k views

ADD COMMENT • link updated 9.7 years ago by Michael Lawrence ★ 11k • written 9.7 years ago by Jurat Shahidin ▴ 80

2

Entering edit mode

Michael Lawrence ★ 11k

@michael-lawrence-3846

Last seen 4.1 years ago

United States

It would help to understand your high-level question. I think you want to find the subjects with the minimum p-value, grouped by query range. Since a subject might overlap multiple queries, we need to use extractList() to get p-values grouped by query, and then use which.min() and compute the global index:

min_b <- which.min(extractList(b$pvalue, ov_ab))
min_b_global <-  min_b + start(PartitioningByEnd(ov_ab)) - 1L
res_b <- b[min_b_global]

Perhaps we need a global() that converts the local index into unlisted space (the second line above).

ADD COMMENT • link 9.7 years ago Michael Lawrence ★ 11k

0

Entering edit mode

Dear Michael:

I am very thankful from you this quick respond. Yes, I had discussed this issue with Martin Morgan C: Find overlap for more than 2 GRanges object simultaneously (reproducible example, and it is quite efficient as well. But, in our algorithm, we have very important step(just keep most significant regions if multiple intersection occurs) that may effect overall result, so I tried to use extractROWS, and it works very nice in my case. However, I ran into problem when I tried to apply chisq.test on them, and I realized that I have to coerce them in GRanges objects. I could try your solution. Thanks again for your help

Best regards:

Jurat

ADD REPLY • link 9.7 years ago Jurat Shahidin ▴ 80

score 4 · Accepted Answer · 2016-03-31

4

Entering edit mode

Mike Smith ★ 6.6k

@mike-smith

Last seen 16 days ago

EMBL Heidelberg

A slightly round-about way of doing it, but how about converting your list to a GRangesList, then to a data.frame, and then to single GRanges object e.g.

singleGRange <- GRanges(as.data.frame(GRangesList(res_b)))

> singleGRange
GRanges object with 4 ranges and 4 metadata columns:
      seqnames    ranges strand |     group  group_name   rangeName    pvalue
         <Rle> <IRanges>  <Rle> | <integer> <character> <character> <numeric>
  [1]     chr1  [ 7,  9]      * |         1        <NA>           b     1e-15
  [2]     chr1  [12, 14]      * |         2        <NA>           c     1e-12
  [3]     chr1  [17, 19]      * |         3        <NA>           d     1e-20
  [4]     chr2  [32, 34]      * |         4        <NA>           g     1e-14
  -------
  seqinfo: 4 sequences from an unspecified genome; no seqlengths

ADD COMMENT • link 9.7 years ago Mike Smith ★ 6.6k

1

Entering edit mode

I have to admit I didn't know that was possible, but you could just unlist:

unlist(GRangesList(res_b))

ADD REPLY • link 9.7 years ago Michael Lawrence ★ 11k

1

Entering edit mode

Ah, that's much nicer. I was sure there was a way to flatten the GRangeList, but looked under 'Coercion' rather than 'Combining' in the man page.

ADD REPLY • link 9.7 years ago Mike Smith ★ 6.6k

0

Entering edit mode

Hi Mike:

Thanks a lot for this quick respond. I have tried your solution, that could help me out.

All the best:

Jurat

ADD REPLY • link 9.7 years ago Jurat Shahidin ▴ 80