Question: GenomicRanges based on indices or more conditions, and add column from match
4 weeks ago, francesca casalino (United States) wrote:

I am trying to extract rows based on two conditions, using the indices from two sets of overlaps. This is an example:

 

    df1 = data.frame(chr=c("chr1", "chr1"), start=c(20,21), stop=c(28,29), value1=c(1,2))
    df2 = data.frame(chr=c("chr1", "chr1", "chr1"), start=c(20,22, 28), stop=c(22,24,34), value2=c(3,4, 60))

    df3 = data.frame(chr=c("chr1", "chr1"), start=c(3,1), stop=c(8,4))
    df4 = data.frame(chr=c("chr1", "chr1", "chr2"), start=c(10,1, 1), stop=c(12,2, 2))

    df1_all = cbind.data.frame(df1, df3)
    df2_all = cbind.data.frame(df2, df4)

 

Which looks like this:

 

    > df1_all

       chr start stop value1  chr start stop
    1 chr1    20   28      1 chr1     3    8
    2 chr1    21   29      2 chr1     1    4

 

    > df2_all

       chr start stop value2  chr start stop
    1 chr1    20   22      3 chr1    10   12
    2 chr1    22   24      4 chr1     1    2
    3 chr1    28   34     60 chr2     1    2

 

 

I would like to get the rows from data frame df1_all, together with the matching column from df2_all called "value2", but only for rows where both df1 overlaps df2 and df3 overlaps df4. In this case the result would be:

 

     chr start stop value1  chr start stop value1 value2
    chr1    21   29      2 chr1     1    4      2      4

 

I am almost there, but I am still getting something wrong with my real data and I cannot find the bug. I have been looking for a solution for a long time, so I am coming here for help and a fresh pair of eyes on this problem. Can you please help?

 

This is what I have:

 

    library(GenomicRanges)

    df1.gr <- makeGRangesFromDataFrame(df1)
    df2.gr <- makeGRangesFromDataFrame(df2)
    df3.gr <- makeGRangesFromDataFrame(df3)
    df4.gr <- makeGRangesFromDataFrame(df4)

    # First overlap
    hits1 <- findOverlaps(df1.gr, df2.gr, maxgap = 0)
    values1 <- rep(FALSE, nrow(df2_all))
    values1[unique(subjectHits(hits1))] <- TRUE

    OBJ = data.frame(df1_all[unique(queryHits(hits1)),],
                     matched.df2 = df2_all[unique(queryHits(hits1)),"value2"])

    # Second overlap
    hits2 <- findOverlaps(df3.gr, df4.gr, maxgap = 0)
    values2 <- rep(FALSE, nrow(df2_all))
    values2[unique(subjectHits(hits2))] <- TRUE

    ov = values1 & values2
    OBJ = OBJ[ov,]

written 4 weeks ago by francesca casalino

Not sure I understand your example. But I think you could get further using intersect(hits1, hits2), which would find the rows where df1 overlaps df2 and df3 overlaps df4.
 

written 4 weeks ago by Michael Lawrence
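
For anyone following along, the intersect() suggestion can be tried on the example data from the question. This is only a sketch, not the poster's final code: it leaves maxgap at its default rather than the maxgap = 0 used in the question (an assumption, since the meaning of maxgap depends on the installed version; see ?findOverlaps), and the name `both` is made up for illustration.

```r
library(GenomicRanges)

# Example data copied from the question.
df1 <- data.frame(chr = c("chr1", "chr1"), start = c(20, 21), stop = c(28, 29), value1 = c(1, 2))
df2 <- data.frame(chr = c("chr1", "chr1", "chr1"), start = c(20, 22, 28), stop = c(22, 24, 34), value2 = c(3, 4, 60))
df3 <- data.frame(chr = c("chr1", "chr1"), start = c(3, 1), stop = c(8, 4))
df4 <- data.frame(chr = c("chr1", "chr1", "chr2"), start = c(10, 1, 1), stop = c(12, 2, 2))

# Each Hits object stores (query index, subject index) pairs.
hits1 <- findOverlaps(makeGRangesFromDataFrame(df1), makeGRangesFromDataFrame(df2))
hits2 <- findOverlaps(makeGRangesFromDataFrame(df3), makeGRangesFromDataFrame(df4))

# Keep only the index pairs that occur in both Hits objects
# ("both" is a hypothetical name, not from the thread).
both <- intersect(hits1, hits2)

queryHits(both)    # row index into the first table of each pair (df1 / df3)
subjectHits(both)  # row index into the second table of each pair (df2 / df4)
```

On this example only one pair survives, query row 2 with subject row 2, which is the row the question expects. Note that intersect() compares pairs, so it needs the two Hits objects to describe tables with the same numbers of rows, as they do here.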

Hi, thank you Michael for your reply.

My problem is combining information from df1_all and df2_all for the intersecting indices only (that is, where both pairs of ranges overlap):

    OBJ = data.frame(df1_all[unique(subjectHits(intersect(hits1, hits2))),])

But then how to get the columns in df2_all that match? I have tried in so many ways...

Thanks again

modified 4 weeks ago • written 4 weeks ago by francesca casalino
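
One possible way to pull the matching df2_all column, sketched on the example data: in a Hits object the query index refers to the first table and the subject index to the second, so queryHits() would index df1_all and subjectHits() would index df2_all. The names `both` and `result` are made up here, and maxgap is left at its default, which is an assumption about the intended behaviour.

```r
library(GenomicRanges)

# Example data copied from the question.
df1 <- data.frame(chr = c("chr1", "chr1"), start = c(20, 21), stop = c(28, 29), value1 = c(1, 2))
df2 <- data.frame(chr = c("chr1", "chr1", "chr1"), start = c(20, 22, 28), stop = c(22, 24, 34), value2 = c(3, 4, 60))
df3 <- data.frame(chr = c("chr1", "chr1"), start = c(3, 1), stop = c(8, 4))
df4 <- data.frame(chr = c("chr1", "chr1", "chr2"), start = c(10, 1, 1), stop = c(12, 2, 2))
df1_all <- cbind.data.frame(df1, df3)
df2_all <- cbind.data.frame(df2, df4)

hits1 <- findOverlaps(makeGRangesFromDataFrame(df1), makeGRangesFromDataFrame(df2))
hits2 <- findOverlaps(makeGRangesFromDataFrame(df3), makeGRangesFromDataFrame(df4))
both <- intersect(hits1, hits2)  # pairs where both overlaps hold

# Rows of df1_all come from the query side, value2 from the subject side.
# One output row per matching pair, so a df1_all row can repeat if it has
# several matches. R may add suffixes to the duplicated chr/start/stop names.
result <- cbind(df1_all[queryHits(both), ],
                value2 = df2_all$value2[subjectHits(both)])
result
```

Because there is no unique() here, the query and subject indices stay paired up, which is what keeps value2 aligned with the right df1_all row.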

This is basically an inner join, but then reducing the data so that no rows in df1 become repeated. How do you want to reduce the data when one row in df1 overlaps more than one row in df2?

written 4 weeks ago by Michael Lawrence
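
To make the reduction question concrete, here is a sketch of two possible rules, shown on the df1 versus df2 overlap alone because every df1 row in the example matches several df2 rows. Both rules are assumptions chosen for illustration, not something the thread settles on, and the names `first_match` and `mean_match` are made up. The same steps would apply to the intersected hits.

```r
library(GenomicRanges)

# Example data copied from the question.
df1 <- data.frame(chr = c("chr1", "chr1"), start = c(20, 21), stop = c(28, 29), value1 = c(1, 2))
df2 <- data.frame(chr = c("chr1", "chr1", "chr1"), start = c(20, 22, 28), stop = c(22, 24, 34), value2 = c(3, 4, 60))

hits <- findOverlaps(makeGRangesFromDataFrame(df1), makeGRangesFromDataFrame(df2))
q <- queryHits(hits)    # row indices into df1
s <- subjectHits(hits)  # row indices into df2

# Rule 1 (assumed): keep the lowest-index df2 match for each df1 row.
o <- order(q, s)
keep <- o[!duplicated(q[o])]
first_match <- data.frame(df1[q[keep], ], value2 = df2$value2[s[keep]])

# Rule 2 (assumed): average value2 over all df2 matches of each df1 row.
mean_value2 <- tapply(df2$value2[s], q, mean)
mean_match <- data.frame(df1[as.integer(names(mean_value2)), ],
                         value2 = as.vector(mean_value2))

first_match
mean_match
```

Either way the result has at most one row per df1 row; which rule is appropriate depends on what value2 means in the real data.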