I am currently working with GenomicRanges structures stored in a large list (~10,000 entries) produced as output from a `foreach` loop. The list elements all contain similar GenomicRanges structures, and I would like to generate a single GenomicRanges or data.frame structure with all of the unique entries across the many structures in the list (and any relevant information, e.g., seqnames, ranges, strand), as well as a count of how many times each appears across the many structures (perhaps in a metadata column). I assume that the best way to accomplish this would be via `ldply` from `plyr`, but I am unsure of a good way to go about this. Has anyone solved a similar problem before?
Let's say you have a list of GRanges objects, `grl`, each with the same columns in `mcols()`. The first step is to mark the list as a list of GRanges, i.e., a GRangesList, because the data are easier to compute on when their semantics are explicit. This can be done with `List(grl)`. Then `unlist()` that to concatenate the GRanges together. If you then want to count each unique range, first find the unique ranges, then count how many times each one matches the original set.
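
    library(GenomicRanges)
    gr <- unlist(List(grl))                         # concatenate into one GRanges
    gr_levels <- unique(gr)                         # the distinct ranges
    gr_levels$count <- countMatches(gr_levels, gr)  # occurrences of each in gr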
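
Since the question also mentions a data.frame, here is a minimal sketch of the same steps on toy data, followed by a conversion with `as.data.frame()`. The three small GRanges below are made up purely for illustration and stand in for the `foreach` output; the real `grl` would be the list of ~10,000 objects.

```r
library(GenomicRanges)

# Hypothetical stand-in for the foreach output: a plain list of GRanges
grl <- list(
  GRanges("chr1", IRanges(c(1, 10), width = 5), strand = "+"),
  GRanges("chr1", IRanges(c(1, 20), width = 5), strand = "+"),
  GRanges("chr1", IRanges(1, width = 5), strand = "+")
)

# Same steps as above: concatenate, find the unique ranges, count matches
gr <- unlist(List(grl))
gr_levels <- unique(gr)
gr_levels$count <- countMatches(gr_levels, gr)

# Optional: flatten to a data.frame; the count ends up as an ordinary column
# next to seqnames, start, end, width and strand
df <- as.data.frame(gr_levels)
```

Note that two ranges are treated as the same only if seqnames, start, end and strand all agree, so the strand is part of what makes an entry "unique" here.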