Question: Operations across Lists of GenomicRanges?
nhejazi wrote (20 months ago):

I am working with GenomicRanges objects stored in a large list (~10,000 elements) produced as the output of a `foreach` loop. The list elements are all similar GenomicRanges objects, and I would like to build a single GenomicRanges object or data.frame that contains every unique entry across the list (with the relevant information, e.g., seqnames, ranges, strand), plus a count of how many times each entry appears across the list (perhaps in a metadata column). I assume the best way to do this is with `ldply` from `plyr`, but I am unsure how to go about it. Has anyone solved a similar problem before?
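If the list elements are plain data.frames (as `limma::topTable()` returns), the stacking step does not need `plyr`. A minimal base-R sketch: tag each table with its position in the list, then row-bind everything. The toy `tables` list and the `iteration` column name are hypothetical; real topTable output has more columns.

```r
# Hypothetical toy input: a list of data.frames shaped like topTable() output,
# one per foreach iteration.
tables <- list(
  data.frame(ID = c("g1", "g2"), logFC = c(1.2, -0.4)),
  data.frame(ID = c("g2", "g3"), logFC = c(0.7, 2.1)),
  data.frame(ID = c("g1", "g3"), logFC = c(-1.1, 0.3))
)

# Prepend an 'iteration' column recording which list element each row came
# from, then row-bind all tables into one data.frame.
stacked <- do.call(rbind, Map(function(tt, i) cbind(iteration = i, tt),
                              tables, seq_along(tables)))
```

Counting rows of `stacked` by their identifying columns (for example `table(stacked$ID)`) then gives a per-entry frequency.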


Michael Lawrence commented:

Please be more specific. How similar are the list elements? What do you mean by a unique entry, i.e., what is an entry? Some code (including construction of the first few elements in the list) would help.

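nhejazi replied with example code:

library(foreach)
library(doParallel)   # one possible foreach backend
library(limma)
registerDoParallel()  # without a registered backend, %dopar% runs sequentially

# y (expression data), design (a model matrix) and nsim are assumed to exist
permute_limma <- foreach (i = seq_len(nsim), .packages = "limma") %dopar% {
    set.seed(6401^2 + i)  # one seed per iteration; a fixed seed repeats the same permutation
    design_2 <- design[sample(nrow(design)), , drop = FALSE]  # permute rows (samples)
    print(paste0("Limma: iteration ", i, " is underway."))
    fit <- lmFit(y, design_2)
    fit <- eBayes(fit)
    tt <- topTable(fit, coef = 2, adjust.method = "BH",
                   sort.by = "none", number = Inf)
    return(tt)
}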

nhejazi added:

The premise is simply that a data-analytic function (e.g., limma) is called on a data set in data.frame or GenomicRanges form. Each call returns a table of genomic sites or regions (with seqnames, positions, etc.). The list returned by foreach contains 1000+ such output tables, and the idea is to look at how often each genomic site/region appears across these tables.
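Because `topTable(..., number = Inf)` returns every feature, each ID appears in every table, so a frequency is only informative after some selection step. A minimal sketch, assuming a hypothetical adjusted-p-value cutoff of 0.05 and toy tables, counts in how many tables each ID passes that cutoff:

```r
# Hypothetical toy input: three topTable()-like tables with adjusted p-values.
tables <- list(
  data.frame(ID = c("g1", "g2", "g3"), adj.P.Val = c(0.01, 0.20, 0.03)),
  data.frame(ID = c("g1", "g2", "g3"), adj.P.Val = c(0.04, 0.01, 0.50)),
  data.frame(ID = c("g1", "g2", "g3"), adj.P.Val = c(0.30, 0.60, 0.02))
)
cutoff <- 0.05  # hypothetical significance threshold

# For each table, keep the IDs passing the cutoff (each ID at most once per table).
hits <- lapply(tables, function(tt) unique(tt$ID[tt$adj.P.Val < cutoff]))

# Use every ID seen in any table as a factor level, so IDs never selected get 0.
all_ids <- unique(unlist(lapply(tables, function(tt) tt$ID)))
freq <- table(factor(unlist(hits), levels = all_ids))

# Number of tables in which each ID was selected.
freq_df <- data.frame(ID = names(freq), count = as.vector(freq),
                      stringsAsFactors = FALSE)
```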
Answer: Michael Lawrence wrote (20 months ago):

Let's say you have a list of GRanges objects, grl, each with the same columns in mcols(). The first step is to mark the list as a list of GRanges, i.e., a GRangesList, because data are easier to compute on when their semantics are explicit. This can be done with List(). Then call unlist() on the result to concatenate the GRanges into one. To count each unique range, first find the unique ranges, then count how many times each of them matches a range in the concatenated set.

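library(GenomicRanges)                          # also attaches List() and countMatches()
gr <- unlist(List(grl))                         # concatenate all GRanges into one
gr_levels <- unique(gr)                         # one copy of each distinct range
gr_levels$count <- countMatches(gr_levels, gr)  # occurrences of each distinct range in gr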
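To make the unique-then-count logic concrete without Bioconductor, here is a base-R sketch on a hypothetical data.frame of already concatenated ranges. It keys each range on seqnames, start, end and strand (the fields GRanges uses to decide whether two ranges are equal; metadata columns are ignored), keeps the first occurrence of each key as `unique()` would, and counts occurrences with `match()` plus `tabulate()` in the spirit of `countMatches()`. It illustrates the idea only; it is not how GenomicRanges implements it.

```r
# Hypothetical toy input: ranges from all list elements, already concatenated.
gr <- data.frame(
  seqnames = c("chr1", "chr1", "chr2", "chr1", "chr2"),
  start    = c(100L, 100L, 500L, 200L, 500L),
  end      = c(150L, 150L, 550L, 260L, 550L),
  strand   = c("+", "+", "-", "+", "-"),
  stringsAsFactors = FALSE
)

# One key per range; identical ranges get identical keys.
key <- paste(gr$seqnames, gr$start, gr$end, gr$strand, sep = ":")

# Analogue of unique(gr): keep the first occurrence of each range.
gr_levels <- gr[!duplicated(key), ]

# Analogue of countMatches(gr_levels, gr): map every range to its unique key's
# index, then count how many ranges fall on each index.
gr_levels$count <- tabulate(match(key, key[!duplicated(key)]),
                            nbins = nrow(gr_levels))
```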
