GenomicRanges mapply to all combinations of GRangesList objects
2
0
Entering edit mode
@francesca-casalino-4984
Last seen 21 months ago
United States

I have two GRange Lists and I am trying to apply countOverlaps function to each combination of the lists and return a list of results like this:

    gr1 <-  GRanges(seqnames = "chr2", ranges = IRanges(3, 6),strand = "+", score = 5L, GC = 0.45)
    gr2 <- GRanges(seqnames = c("chr1", "chr1"), ranges = IRanges(c(7,13), width = 3), strand = c("+", "-"), score = 3:4, GC = c(0.3, 0.5))
    grlA <- GRangesList("a" = gr1, "b" = gr2)

    gr1 <-  GRanges(seqnames = "chr2", ranges = IRanges(1, 10),strand = "+", score = 5L, GC = 0.45)
    gr2 <- GRanges(seqnames = c("chr1", "chr1"), ranges = IRanges(c(3,13), width = 3), strand = c("+", "-"), score = 3:4, GC = c(0.2, 0.3))
    grlB <- GRangesList("c" = gr1, "d" = gr2)

I would like to get a list of object "a" and object "b" in grlA containing the results of the function for each value of grlB:

(list $a, $b and dataframes for c,d)

$a

c d

$b

c d

I have tried this to get all the combinations of the list but I get an error...

    comb_apply <- function(f,..., MoreArgs=list()){
      exp <- unname(expand.grid(...,stringsAsFactors = FALSE))
      do.call(mapply, c(list(FUN=f, SIMPLIFY=FALSE, MoreArgs=MoreArgs), exp))
    }

    comb_apply(countOverlaps,grlA,grlB)

    Error in as.vector(x, mode = "character") : 
    no method for coercing this S4 class to a vector

What am I doing wrong?
Thank you for help/suggestions!

genomicranges mapply grangeslist • 1.9k views
ADD COMMENT
4
Entering edit mode
@michael-lawrence-3846
Last seen 3.1 years ago
United States

We would need to add a method for expand.grid() on Vector (now added in S4Vectors 0.9.4). Until then, you can use the old index trick instead, e.g.,

comb_apply(function(i, j) countOverlaps(grlA[[i]], grlB[[j]]), seq_along(grlA), seq_along(grlB))

Note that the new expand.grid,Vector returns a DataFrame, which rightfully won't let you unname it. Coerce to list first:

comb_apply <- function(f,..., MoreArgs=list()){
      exp <- unname(as.list(expand.grid(...,stringsAsFactors = FALSE)))
      do.call(mapply, c(list(FUN=f, SIMPLIFY=FALSE, MoreArgs=MoreArgs), exp))
}

 

ADD COMMENT
0
Entering edit mode

Thank you. It is very slow but it may have done something wrong in my original function since the output is not quite what I was looking for with lists of a containing data frames of b, anyway I will try to fix that. Thanks!

ADD REPLY
0
Entering edit mode

One way to make it faster would be to use GenomicRangesList() instead of GRangesList(). The latter has a representation that is not amenable to iteration.

ADD REPLY
0
Entering edit mode

Great, thank you. Now just trying to figure out how I can get a rbind data frame for the object B, I think I will just resort to looping. 

ADD REPLY
0
Entering edit mode

If you clarify your problem, we might be able to help.

ADD REPLY
0
Entering edit mode
@francesca-casalino-4984
Last seen 21 months ago
United States
Thanks very much for your help! 
SORRY I realise I gave a bad example because the columns would not be all the same in the data frames! Here is a new example:

 

    gr1 <- GRanges(seqnames = c("chr1", "chr2"), ranges = IRanges(c(7,13), width = 3), strand = c("+", "-"))
    gr2 <- GRanges(seqnames = c("chr1", "chr3"), ranges = IRanges(c(5,13), width = 3), strand = c("+", "-"))
    grlA <- GRangesList("a" = gr1, "b" = gr2)

    gr1 <- GRanges(seqnames = c("chr1", "chr2"), ranges = IRanges(c(1,13), width = 3), strand = c("+", "-"))
    gr2 <- GRanges(seqnames = c("chr1", "chr3"), ranges = IRanges(c(3,13), width = 3), strand = c("+", "-"))
    grlB <- GRangesList("c" = gr1, "d" = gr2)

 

The problem that I was finding is to get the data frames within the grlB:

  comb_apply <- function(f,..., MoreArgs=list()){
      exp <- unname(as.list(expand.grid(...,stringsAsFactors = FALSE)))
      do.call(mapply, c(list(FUN=f, SIMPLIFY=FALSE, MoreArgs=MoreArgs), exp))
   }
   t= comb_apply(function(i, j) countOverlaps(grlA[[i]], grlB[[j]]), seq_along(grlA), seq_along(grlB))
    names(t)=apply(expand.grid(names(grlA), names(grlB)), 1, paste, collapse="_")

> t

$a_c

[1] 1

$b_c

[1] 0 0

$a_d

[1] 0

$b_d

[1] 0 1

 

While I was looking for a list of data frames... I can do this with a grep command to select out the data frames that are part of grlB and save them in a separate list, but it is again veeery slow...

    new=list()

    for (i in names(grlB)) {

    df = as.data.frame(t[grep(i,names(t))])

    new[[length(new)+1]] <- df

    }

 

I did it like this at the end:

         d <- do.call(rbind, t)

         rownames(d)=as.character(lapply(strsplit(as.character(rownames(d)), "_", fixed=TRUE), "[", 2))

         d=as.data.frame(d)

         new2=split(d, rownames(d))

         lapply(new2, t)          

ADD COMMENT

Login before adding your answer.

Traffic: 535 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6