Question: GenomicRanges mapply to all combinations of GRangesList objects
0
gravatar for francesca casalino
3.6 years ago by
United States
francesca casalino50 wrote:

I have two GRange Lists and I am trying to apply countOverlaps function to each combination of the lists and return a list of results like this:

    gr1 <-  GRanges(seqnames = "chr2", ranges = IRanges(3, 6),strand = "+", score = 5L, GC = 0.45)
    gr2 <- GRanges(seqnames = c("chr1", "chr1"), ranges = IRanges(c(7,13), width = 3), strand = c("+", "-"), score = 3:4, GC = c(0.3, 0.5))
    grlA <- GRangesList("a" = gr1, "b" = gr2)

    gr1 <-  GRanges(seqnames = "chr2", ranges = IRanges(1, 10),strand = "+", score = 5L, GC = 0.45)
    gr2 <- GRanges(seqnames = c("chr1", "chr1"), ranges = IRanges(c(3,13), width = 3), strand = c("+", "-"), score = 3:4, GC = c(0.2, 0.3))
    grlB <- GRangesList("c" = gr1, "d" = gr2)

I would like to get a list of object "a" and object "b" in grlA containing the results of the function for each value of grlB:

(list $a, $b and dataframes for c,d)

$a

c d

$b

c d

I have tried this to get all the combinations of the list but I get an error...

    comb_apply <- function(f,..., MoreArgs=list()){
      exp <- unname(expand.grid(...,stringsAsFactors = FALSE))
      do.call(mapply, c(list(FUN=f, SIMPLIFY=FALSE, MoreArgs=MoreArgs), exp))
    }

    comb_apply(countOverlaps,grlA,grlB)

    Error in as.vector(x, mode = "character") : 
    no method for coercing this S4 class to a vector

What am I doing wrong?
Thank you for help/suggestions!

ADD COMMENTlink modified 3.6 years ago • written 3.6 years ago by francesca casalino50
Answer: GenomicRanges mapply to all combinations of GRangesList objects
4
gravatar for Michael Lawrence
3.6 years ago by
United States
Michael Lawrence11k wrote:

We would need to add a method for expand.grid() on Vector (now added in S4Vectors 0.9.4). Until then, you can use the old index trick instead, e.g.,

comb_apply(function(i, j) countOverlaps(grlA[[i]], grlB[[j]]), seq_along(grlA), seq_along(grlB))

Note that the new expand.grid,Vector returns a DataFrame, which rightfully won't let you unname it. Coerce to list first:

comb_apply <- function(f,..., MoreArgs=list()){
      exp <- unname(as.list(expand.grid(...,stringsAsFactors = FALSE)))
      do.call(mapply, c(list(FUN=f, SIMPLIFY=FALSE, MoreArgs=MoreArgs), exp))
}

 

ADD COMMENTlink modified 3.6 years ago • written 3.6 years ago by Michael Lawrence11k

Thank you. It is very slow but it may have done something wrong in my original function since the output is not quite what I was looking for with lists of a containing data frames of b, anyway I will try to fix that. Thanks!

ADD REPLYlink modified 3.6 years ago • written 3.6 years ago by francesca casalino50

One way to make it faster would be to use GenomicRangesList() instead of GRangesList(). The latter has a representation that is not amenable to iteration.

ADD REPLYlink written 3.6 years ago by Michael Lawrence11k

Great, thank you. Now just trying to figure out how I can get a rbind data frame for the object B, I think I will just resort to looping. 

ADD REPLYlink written 3.6 years ago by francesca casalino50

If you clarify your problem, we might be able to help.

ADD REPLYlink written 3.6 years ago by Michael Lawrence11k
Answer: GenomicRanges mapply to all combinations of GRangesList objects
0
gravatar for francesca casalino
3.6 years ago by
United States
francesca casalino50 wrote:
Thanks very much for your help! 
SORRY I realise I gave a bad example because the columns would not be all the same in the data frames! Here is a new example:

 

    gr1 <- GRanges(seqnames = c("chr1", "chr2"), ranges = IRanges(c(7,13), width = 3), strand = c("+", "-"))
    gr2 <- GRanges(seqnames = c("chr1", "chr3"), ranges = IRanges(c(5,13), width = 3), strand = c("+", "-"))
    grlA <- GRangesList("a" = gr1, "b" = gr2)

    gr1 <- GRanges(seqnames = c("chr1", "chr2"), ranges = IRanges(c(1,13), width = 3), strand = c("+", "-"))
    gr2 <- GRanges(seqnames = c("chr1", "chr3"), ranges = IRanges(c(3,13), width = 3), strand = c("+", "-"))
    grlB <- GRangesList("c" = gr1, "d" = gr2)

 

The problem that I was finding is to get the data frames within the grlB:

  comb_apply <- function(f,..., MoreArgs=list()){
      exp <- unname(as.list(expand.grid(...,stringsAsFactors = FALSE)))
      do.call(mapply, c(list(FUN=f, SIMPLIFY=FALSE, MoreArgs=MoreArgs), exp))
   }
   t= comb_apply(function(i, j) countOverlaps(grlA[[i]], grlB[[j]]), seq_along(grlA), seq_along(grlB))
    names(t)=apply(expand.grid(names(grlA), names(grlB)), 1, paste, collapse="_")

> t

$a_c

[1] 1

$b_c

[1] 0 0

$a_d

[1] 0

$b_d

[1] 0 1

 

While I was looking for a list of data frames... I can do this with a grep command to select out the data frames that are part of grlB and save them in a separate list, but it is again veeery slow...

    new=list()

    for (i in names(grlB)) {

    df = as.data.frame(t[grep(i,names(t))])

    new[[length(new)+1]] <- df

    }

 

I did it like this at the end:

         d <- do.call(rbind, t)

         rownames(d)=as.character(lapply(strsplit(as.character(rownames(d)), "_", fixed=TRUE), "[", 2))

         d=as.data.frame(d)

         new2=split(d, rownames(d))

         lapply(new2, t)          

ADD COMMENTlink modified 3.6 years ago • written 3.6 years ago by francesca casalino50
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 261 users visited in the last hour