GenomicRanges based on indices or more conditions, and add column from match
@francesca-casalino-4984

I am trying to extract columns based on two conditions from the indices of two overlaps. This is an example:

 

    df1 = data.frame(chr=c("chr1", "chr1"), start=c(20,21), stop=c(28,29), value1=c(1,2))

    df2 = data.frame(chr=c("chr1", "chr1", "chr1"), start=c(20,22, 28), stop=c(22,24,34), value2=c(3,4, 60))

 

    df3 = data.frame(chr=c("chr1", "chr1"), start=c(3,1), stop=c(8,4))

    df4 = data.frame(chr=c("chr1", "chr1", "chr2"), start=c(10,1, 1), stop=c(12,2, 2))

 

    df1_all = cbind.data.frame(df1, df3)

    df2_all = cbind.data.frame(df2, df4)

 

Which looks like this:

 

    > df1_all

       chr start stop value1  chr start stop 

    1 chr1    20   28      1 chr1     3    8      

    2 chr1    21   29      2 chr1     1    4      

 

    > df2_all

       chr start stop value2  chr start stop 

    1 chr1    20   22      3 chr1    10   12      

    2 chr1    22   24      4 chr1     1    2      

    3 chr1    28   34     60 chr2     1    2    

 

 

I would like to get the rows from data frame df1_all, together with the matching column from df2_all called "value2", but only for rows where both df1 overlaps df2 and df3 overlaps df4. In this case it would be:

 

     chr start stop value1  chr start stop value1 value2

    chr1    21   29      2 chr1     1    4      2      4

 

I am almost there, but I am still getting something wrong in my real data and I cannot find the bug. I have been looking for a solution for a long time, so I am coming here for help and a fresh pair of eyes on this problem. Can you please help?

 

This is what I have:

 

    library(GenomicRanges)

    df1.gr <- makeGRangesFromDataFrame(df1)

    df2.gr <- makeGRangesFromDataFrame(df2)

    df3.gr <- makeGRangesFromDataFrame(df3)

    df4.gr <- makeGRangesFromDataFrame(df4)

    # First overlap

    hits1 <- findOverlaps(df1.gr, df2.gr, maxgap = 0)

    values1 <- rep(FALSE, nrow(df2_all))

    values1[unique(subjectHits(hits1))] <- TRUE

        

    OBJ= data.frame(df1_all[unique(queryHits(hits1)),],

    matched.df2 = df2_all[unique(queryHits(hits1)),"value2"])

    

    # Second overlap

    hits2 <- findOverlaps(df3.gr, df4.gr, maxgap = 0)

    values2 <- rep(FALSE, nrow(df2_all))

    values2[unique(subjectHits(hits2))] <- TRUE

    

    ov = values1 & values2

    OBJ = OBJ[ov,]


Not sure I understand your example. But I think you could get further using intersect(hits1, hits2), which would find the rows where df1 overlaps df2 and df3 overlaps df4.
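
As a rough sketch of that suggestion, using the example data from the question: `intersect()` on the two Hits objects keeps only the (query, subject) pairs present in both, and those pairs can then index the original data frames. The names `both` and `result` are made up for illustration, and this assumes the two Hits objects have the same query and subject lengths, as they do here.

```r
library(GenomicRanges)

# Example data copied from the question
df1 <- data.frame(chr = c("chr1", "chr1"), start = c(20, 21), stop = c(28, 29), value1 = c(1, 2))
df2 <- data.frame(chr = c("chr1", "chr1", "chr1"), start = c(20, 22, 28), stop = c(22, 24, 34), value2 = c(3, 4, 60))
df3 <- data.frame(chr = c("chr1", "chr1"), start = c(3, 1), stop = c(8, 4))
df4 <- data.frame(chr = c("chr1", "chr1", "chr2"), start = c(10, 1, 1), stop = c(12, 2, 2))

# One Hits object per pair of tables
hits1 <- findOverlaps(makeGRangesFromDataFrame(df1), makeGRangesFromDataFrame(df2))
hits2 <- findOverlaps(makeGRangesFromDataFrame(df3), makeGRangesFromDataFrame(df4))

# Keep only the (query, subject) index pairs found in both sets of hits
both <- intersect(hits1, hits2)

# Query indices point into df1/df3, subject indices into df2/df4.
# data.frame() renames the repeated column names (chr.1, start.1, stop.1).
result <- data.frame(df1[queryHits(both), ],
                     df3[queryHits(both), ],
                     value2 = df2$value2[subjectHits(both)])
print(result)
```

Note that the rows of df1_all are selected with `queryHits()` and the rows of df2_all with `subjectHits()`; mixing the two up selects the wrong rows without raising an error.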
 


Hi, thank you, Michael, for your reply.

My problem is adding information from df1_all and df2_all only for the intersecting IDs (those for which both pairs of ranges overlap):

     OBJ = data.frame(df1_all[unique(subjectHits(intersect(hits1, hits2))),])

But then how do I get the matching columns in df2_all? I have tried so many ways...

Thanks again


This is basically an inner join, but then reducing the data so that no rows in df1 become repeated. How do you want to reduce the data when one row in df1 overlaps more than one row in df2?
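
To make the question concrete, here is a small sketch using only df1 and df2 from the example. Every df1 row overlaps all three df2 rows, so the plain inner join has six rows. Keeping the first match per df1 row is just one assumed reduction policy, not necessarily the one wanted; `joined` and `first_match` are made-up names.

```r
library(GenomicRanges)

# df1 and df2 copied from the question
df1 <- data.frame(chr = c("chr1", "chr1"), start = c(20, 21), stop = c(28, 29), value1 = c(1, 2))
df2 <- data.frame(chr = c("chr1", "chr1", "chr1"), start = c(20, 22, 28), stop = c(22, 24, 34), value2 = c(3, 4, 60))

hits1 <- findOverlaps(makeGRangesFromDataFrame(df1), makeGRangesFromDataFrame(df2))

# Inner join: one row per overlapping (df1 row, df2 row) pair
joined <- data.frame(query = queryHits(hits1), subject = subjectHits(hits1))
joined <- joined[order(joined$query, joined$subject), ]
joined$value1 <- df1$value1[joined$query]
joined$value2 <- df2$value2[joined$subject]

# Assumed reduction: keep only the first df2 match for each df1 row
first_match <- joined[!duplicated(joined$query), ]
print(first_match)
```

Other reductions (last match, or summarising value2 per df1 row) would replace the `!duplicated()` step; which one is right depends on what the repeated matches mean in the real data.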
