Question: Genomic Ranges findOverlaps by sample
gaiusjaugustus (University of Arizona) wrote, 2.3 years ago:

I have two (very similar but not identical) genomic ranges objects, each with up to 33 samples. I am trying to find the overlaps between them and combine the ranges in a particular way, separately for each sample. I know I could do this with a for loop:

    subsets <- 1:33

    for (i in subsets) {
        subset <- df[df$subset == i, ]
        # tasks...
    }

However, I assume there must be a better way. Perhaps with data.table, though this isn't a requirement. Some guidance on where to start would be helpful.


The tasks I'm doing include:

 - Create GRanges objects
 - Find overlaps between df1 and df2
 - Use the overlaps to combine segments

The code below is for context. Everything works fine if I use the for loop structure above, but I'm trying to wrap my head around how to do this for each File, instead of for the entire df, without writing a for loop over each File.


    File   Chromosome      Min      Max    CN.State
    C_28        1            1       100        1
    C_28        1            150     200        1
    A_1         1            20       25        3
    A_1         1            150     200        3

    df1 <- data.frame(File=c("C_28","C_28","A_1","A_1"),
                      Chromosome=rep(1, 4),
                      Min=c(1, 150, 20, 150),
                      Max=c(100, 200, 25, 200),
                      CN.State=c(1,1,3,3))


    File Chromosome Min Max CN.State
    C_28          1   1 210        1
    A_1           1  15 250        3

    df2 <- data.frame(File=c("C_28","A_1"),
                      Chromosome=rep(1, 2),
                      Min=c(1, 15),
                      Max=c(210, 250),
                      CN.State=c(1,3))

## Simplified Tasks

**Make Genomic Ranges Objects**

    library(GenomicRanges)  # provides makeGRangesFromDataFrame() and findOverlaps()

    df1 <- makeGRangesFromDataFrame(df1, keep.extra.columns = TRUE, seqnames.field = "Chromosome", start.field = "Min", end.field = "Max")
    df2 <- makeGRangesFromDataFrame(df2, keep.extra.columns = TRUE, seqnames.field = "Chromosome", start.field = "Min", end.field = "Max")

**Find overlaps & combine**

    hits <- findOverlaps(df1, df2)
    ranges(df1)[queryHits(hits)] <- ranges(df2)[subjectHits(hits)]

Michael Lawrence (United States) wrote, 2.3 years ago:

You don't need to use a for() loop, but you will need to iterate over the samples. You can just split the GRanges and loop in parallel over the two lists, like:

    ans <- mapply(function(a, b) {
        hits <- findOverlaps(a, b)
        ranges(a)[queryHits(hits)] <- ranges(b)[subjectHits(hits)]
        a  # return the updated ranges; otherwise the value of the assignment is returned
    }, split(df1, ~File), split(df2, ~File), SIMPLIFY = FALSE)
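
For reference, here is a minimal end-to-end sketch of that per-sample iteration on the example data from the question. The names `gr1`, `gr2` and `widen_to_overlap` are made up for illustration and are not from the thread. It splits on the `File` column itself rather than a formula, and it fixes the factor levels so that the two lists line up by sample name.

```r
library(GenomicRanges)

df1 <- data.frame(File = c("C_28", "C_28", "A_1", "A_1"),
                  Chromosome = rep(1, 4),
                  Min = c(1, 150, 20, 150),
                  Max = c(100, 200, 25, 200),
                  CN.State = c(1, 1, 3, 3))
df2 <- data.frame(File = c("C_28", "A_1"),
                  Chromosome = rep(1, 2),
                  Min = c(1, 15),
                  Max = c(210, 250),
                  CN.State = c(1, 3))

gr1 <- makeGRangesFromDataFrame(df1, keep.extra.columns = TRUE,
                                seqnames.field = "Chromosome",
                                start.field = "Min", end.field = "Max")
gr2 <- makeGRangesFromDataFrame(df2, keep.extra.columns = TRUE,
                                seqnames.field = "Chromosome",
                                start.field = "Min", end.field = "Max")

# Hypothetical helper: replace each range in `a` by the range in `b` it overlaps.
widen_to_overlap <- function(a, b) {
    hits <- findOverlaps(a, b)
    ranges(a)[queryHits(hits)] <- ranges(b)[subjectHits(hits)]
    a
}

# Use the same factor levels for both splits so element i of each list
# refers to the same sample.
samples <- sort(unique(as.character(gr1$File)))
by_file1 <- split(gr1, factor(as.character(gr1$File), levels = samples))
by_file2 <- split(gr2, factor(as.character(gr2$File), levels = samples))

# Iterate over the two lists in parallel; the result is a named list of GRanges.
ans <- Map(widen_to_overlap, as.list(by_file1), as.list(by_file2))
```

Each element of `ans` is the updated GRanges for one sample; `do.call(c, unname(ans))` should give a single GRanges again if that is more convenient.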



As an aside, this would be easier if findOverlaps,GRangesList,GRangesList operated within elements. Could use GenomicRangesList, or maybe make a pfindOverlaps()?
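
Until something like that exists, one possible workaround (a sketch, not something proposed elsewhere in this thread) is to call findOverlaps() once on the unsplit objects and then keep only the hits whose query and subject come from the same File. The objects `gr1` and `gr2` below are hypothetical stand-ins built from the example data in the question.

```r
library(GenomicRanges)

gr1 <- GRanges(seqnames = "1",
               ranges = IRanges(start = c(1, 150, 20, 150),
                                end = c(100, 200, 25, 200)),
               File = c("C_28", "C_28", "A_1", "A_1"))
gr2 <- GRanges(seqnames = "1",
               ranges = IRanges(start = c(1, 15), end = c(210, 250)),
               File = c("C_28", "A_1"))

# All overlaps, ignoring which sample each range belongs to.
hits <- findOverlaps(gr1, gr2)

# Keep only the hits where query and subject belong to the same sample.
same_file <- as.character(gr1$File)[queryHits(hits)] ==
    as.character(gr2$File)[subjectHits(hits)]
hits <- hits[same_file]

# Same combining step as in the question, now restricted to within-sample hits.
ranges(gr1)[queryHits(hits)] <- ranges(gr2)[subjectHits(hits)]
```

This avoids the split entirely, at the cost of computing cross-sample overlaps that are then thrown away.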

(reply written 2.3 years ago by Michael Lawrence)

I actually just realized I could use reduce() on the combined regions to do what I want to do, except that it won't keep my extra columns. I tried translating your solution into

    mapply(reduce, split(CombinedRegions, ~File))

but this doesn't work. When I try just split(CombinedRegions, ~File), that doesn't work either, and the error makes me think it's because it is a GRanges object. If you could offer a solution with this, that'd be great.

The error:

    Error in (function (classes, fdef, mtable)  :
      unable to find an inherited method for function ‘splitAsList’ for signature ‘"GRanges", "formula"’

(reply written 2.3 years ago by gaiusjaugustus)

The splitting by formula probably only works in devel. I think you want something like:

    reduce(split(CombinedRegions, CombinedRegions$File))
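
A small sketch of how that could look on the example data, assuming `CombinedRegions` is simply the ranges from both tables stacked together (the thread does not show how it was built). reduce() drops the metadata columns, but the sample name survives as the names of the resulting list, so the `File` column can be rebuilt from it. Other columns, such as `CN.State`, would need separate handling.

```r
library(GenomicRanges)

# Hypothetical stand-in for CombinedRegions: rows of df1 and df2 together.
CombinedRegions <- GRanges(
    seqnames = "1",
    ranges = IRanges(start = c(1, 150, 20, 150, 1, 15),
                     end = c(100, 200, 25, 200, 210, 250)),
    File = c("C_28", "C_28", "A_1", "A_1", "C_28", "A_1"))

# Merge overlapping ranges within each sample; the result is a GRangesList
# with one element per File.
red <- reduce(split(CombinedRegions, CombinedRegions$File))

# Flatten back to a single GRanges and restore the File column from the names.
flat <- unlist(red, use.names = FALSE)
flat$File <- rep(names(red), elementNROWS(red))
```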


(reply written 2.3 years ago by Michael Lawrence)