Question: Genomic Ranges findOverlaps by sample
gravatar for gaiusjaugustus
13 months ago by
University of Arizona
gaiusjaugustus0 wrote:

I have 2 (very similar but not identical) genomic ranges objects, each with up to 33 samples, that I am trying to find overlaps and combine in a particular way.  I'm trying to do this separately for each sample.  I know I could do this with a for loop:

    subsets <- c(1:33)

    for (i in subsets){
         subset <- df[df$subset == i,] tasks...

However, I assume there must be a better way??  Perhaps with data.table, though this isn't a requirement. Some guidance on where to start would be helpful.


The tasks I'm doing include:

 - Create GRanges objects
 - find Overlaps between df1 & 2
 - Use overlaps to combine segments

The below is for context.  Everything below works fine if I use the forloop structure above, but I'm just trying to wrap my head around how to do this for each File, instead of for the entire df, without doing a for loop for each File.


    File   Chromosome      Min      Max    CN.State
    C_28        1            1       100        1
    C_28        1            150     200        1
    A_1         1            20       25        3
    A_1         1            150     200        3
    df1 <- data.frame(File=c("C_28","C_28","A_1","A_1"), 
    +                      Chromosome=rep(1, 4),
    +                      Min=c(1, 150, 20, 150),
    +                      Max=c(100, 200, 25, 200),
    +                      CN.State=c(1,1,3,3))


    File Chromosome Min Max CN.State
    C_28          1   1 210        1
    A_1           1  15 250        3
    df2 <- data.frame(File=c("C_28","A_1"), 
    +                      Chromosome=rep(1, 2),
    +                      Min=c(1, 15),
    +                      Max=c(210, 250),
    +                      CN.State=c(1,3))

##Simplified Tasks

**Make Genomic Ranges Objects**

    df1 <- makeGRangesFromDataFrame(df1, keep.extra.columns = TRUE, seqnames.field="Chromosome", start.field="Min", end.field = "Max")
    df2 <- makeGRangesFromDataFrame(df2, keep.extra.columns = TRUE, seqnames.field = "Chromosome", start.field = "Min", end.field = "Max")

**Find overlaps & combine**

    hits <- findOverlaps(df1, df2)
    ranges(df1)[queryHits(hits)] <- ranges(df2)[subjectHits(hits)]
ADD COMMENTlink modified 13 months ago by Michael Lawrence9.5k • written 13 months ago by gaiusjaugustus0
gravatar for Michael Lawrence
13 months ago by
United States
Michael Lawrence9.5k wrote:

You don't need to use a for() loop, but you will need to iterate over the samples. You can just split the GRanges and loop in parallel over the two lists, like:

ans <- mapply(function(a, b) {
    hits <- findOverlaps(a, b)
    ranges(a)[queryHits(hits)] <- ranges(b)[subjectHits(hits)]
}, split(df1, ~File), split(df2, ~File))


ADD COMMENTlink written 13 months ago by Michael Lawrence9.5k

As an aside, this would be easier if findOverlaps,GRangesList,GRangesList operated within elements. Could use GenomicRangesList, or maybe make a pfindOverlaps()?

ADD REPLYlink written 13 months ago by Michael Lawrence9.5k

I actually just realized I could use reduce() on the combined regions to do what I want to do, except that it won't keep my extra columns.  I tried translating your solution into 

mapply(reduce, split(CombinedRegions, ~File))

but this doesn't work.  When I try just split(CombinedRegions, ~File), that doesn't work either, and the error makes me think it's because it is a GRanges object.  If you could offer a solution with this, that'd be great.


The error: Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘splitAsList’ for signature ‘"GRanges", "formula"’

ADD REPLYlink written 12 months ago by gaiusjaugustus0

The splitting by formula probably only works in devel. I think you want something like:

reduce(split(CombinedRegions, CombinedRegions$File))


ADD REPLYlink written 12 months ago by Michael Lawrence9.5k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 269 users visited in the last hour