Question

GenomicRanges based on indices or more conditions, and add column from match


francesca casalino ▴ 50

@francesca-casalino-4984


I am trying to extract columns based on two conditions from the indices of two overlaps. This is an example:

df1 = data.frame(chr=c("chr1", "chr1"), start=c(20,21), stop=c(28,29), value1=c(1,2))

df2 = data.frame(chr=c("chr1", "chr1", "chr1"), start=c(20,22, 28), stop=c(22,24,34), value2=c(3,4, 60))

df3 = data.frame(chr=c("chr1", "chr1"), start=c(3,1), stop=c(8,4))

df4 = data.frame(chr=c("chr1", "chr1", "chr2"), start=c(10,1, 1), stop=c(12,2, 2))

df1_all = cbind.data.frame(df1, df3)

df2_all = cbind.data.frame(df2, df4)

Which looks like this:

> df1_all

chr start stop value1 chr start stop

1 chr1 20 28 1 chr1 3 8

2 chr1 21 29 2 chr1 1 4

> df2_all

chr start stop value2 chr start stop

1 chr1 20 22 3 chr1 10 12

2 chr1 22 24 4 chr1 1 2

3 chr1 28 34 60 chr2 1 2

I would like to get the values from data frame df1_all, together with the matching column from df2_all called "value2", but only for rows where both df1 overlaps df2 and df3 overlaps df4, so in this case it would be:

chr start stop value1 chr start stop value1 value2

chr1 21 29 2 chr1 1 4 2 4

I am almost there, but I am still getting something wrong with my real data and I cannot find the bug. I have been looking for a solution for a long time, so I am coming here for help and a fresh pair of eyes on this problem. Can you please help?

This is what I have:

library(GenomicRanges)

df1.gr <- makeGRangesFromDataFrame(df1)

df2.gr <- makeGRangesFromDataFrame(df2)

df3.gr <- makeGRangesFromDataFrame(df3)

df4.gr <- makeGRangesFromDataFrame(df4)

# First overlap

hits1 <- findOverlaps(df1.gr, df2.gr, maxgap = 0)

values1 <- rep(FALSE, nrow(df2_all))

values1[unique(subjectHits(hits1))] <- TRUE

OBJ= data.frame(df1_all[unique(queryHits(hits1)),],

matched.df2 = df2_all[unique(queryHits(hits1)),"value2"])

# Second overlap

hits2 <- findOverlaps(df3.gr, df4.gr, maxgap = 0)

values2 <- rep(FALSE, nrow(df2_all))

values2[unique(subjectHits(hits2))] <- TRUE

ov = values1 & values2

OBJ = OBJ[ov,]

genomicranges iranges findoverlaps • 1.1k views

ADD COMMENT • link 5.5 years ago francesca casalino ▴ 50


Not sure I understand your example. But I think you could get further using intersect(hits1, hits2), which would find the rows where df1 overlaps df2 and df3 overlaps df4.
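
As a minimal sketch of this suggestion, the snippet below rebuilds the four example tables from the question and intersects the two Hits objects. It assumes a recent GenomicRanges installation and uses the default findOverlaps() settings instead of maxgap = 0, because in current IRanges releases maxgap = 0 is understood to also count adjacent ranges as hits. The `pairs` data frame is an illustrative name, not part of the original code.

```r
library(GenomicRanges)

# Example ranges copied from the question (value columns omitted here)
df1 <- data.frame(chr = c("chr1", "chr1"), start = c(20, 21), stop = c(28, 29))
df2 <- data.frame(chr = c("chr1", "chr1", "chr1"), start = c(20, 22, 28), stop = c(22, 24, 34))
df3 <- data.frame(chr = c("chr1", "chr1"), start = c(3, 1), stop = c(8, 4))
df4 <- data.frame(chr = c("chr1", "chr1", "chr2"), start = c(10, 1, 1), stop = c(12, 2, 2))

# One Hits object per comparison: df1 vs df2, and df3 vs df4
hits1 <- findOverlaps(makeGRangesFromDataFrame(df1), makeGRangesFromDataFrame(df2))
hits2 <- findOverlaps(makeGRangesFromDataFrame(df3), makeGRangesFromDataFrame(df4))

# Keep only the (query, subject) index pairs that occur in both Hits objects
both <- intersect(hits1, hits2)

# Plain data frame view of the surviving pairs (illustrative helper object)
pairs <- data.frame(query = queryHits(both), subject = subjectHits(both))
print(pairs)
```

On the example data only one pair should survive (row 2 of df1 with row 2 of df2), which corresponds to the row the question asks for.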

ADD REPLY • link 5.5 years ago Michael Lawrence ★ 11k


Hi, Thank you Michael for your reply.

My problem is trying to add information from df1_all and df2_all only from the intersecting IDs (with the condition that both ranges overlap):

OBJ = data.frame(df1_all[unique(subjectHits(intersect(hits1, hits2))),])

But then how to get the columns in df2_all that match? I have tried in so many ways...

Thanks again
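
One possible way to pull the matching df2_all column, sketched under the assumption that intersect() on the two Hits objects returns aligned (query, subject) pairs: index df1_all with queryHits() and df2_all with subjectHits(), without calling unique() on either side, so that each df1_all row stays paired with its own match. The default findOverlaps() settings are used here instead of maxgap = 0, which is an assumption about the intended "true overlap" behaviour.

```r
library(GenomicRanges)

# Example data copied from the question
df1 <- data.frame(chr = c("chr1", "chr1"), start = c(20, 21), stop = c(28, 29), value1 = c(1, 2))
df2 <- data.frame(chr = c("chr1", "chr1", "chr1"), start = c(20, 22, 28), stop = c(22, 24, 34), value2 = c(3, 4, 60))
df3 <- data.frame(chr = c("chr1", "chr1"), start = c(3, 1), stop = c(8, 4))
df4 <- data.frame(chr = c("chr1", "chr1", "chr2"), start = c(10, 1, 1), stop = c(12, 2, 2))
df1_all <- cbind.data.frame(df1, df3)
df2_all <- cbind.data.frame(df2, df4)

hits1 <- findOverlaps(makeGRangesFromDataFrame(df1), makeGRangesFromDataFrame(df2))
hits2 <- findOverlaps(makeGRangesFromDataFrame(df3), makeGRangesFromDataFrame(df4))
both <- intersect(hits1, hits2)

# queryHits() gives row numbers in df1_all, subjectHits() row numbers in df2_all.
# Using both vectors as they are (no unique()) keeps every pair aligned.
OBJ <- data.frame(df1_all[queryHits(both), ],
                  value2 = df2_all$value2[subjectHits(both)])
print(OBJ)
```

If one df1_all row matches several df2_all rows, this version repeats the df1_all row once per match; deduplicating would need an explicit rule for which match to keep.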

ADD REPLY • link 5.5 years ago francesca casalino ▴ 50


This is basically an inner join, but then reducing the data so that no rows in df1 become repeated. How do you want to reduce the data when one row in df1 overlaps more than one row in df2?
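
To make the inner-join view concrete, here is a small base R sketch. The hit pairs below are hypothetical (they are not the overlaps from the question) and stand in for what as.data.frame() on a Hits object would give; only the value2 vector is taken from the question. The two reduction rules shown, "keep the first match" and "average value2", are illustrative choices, not a recommendation from the thread.

```r
# Hypothetical (query, subject) hit pairs for the two comparisons
hits1 <- data.frame(query = c(1, 1, 2, 2), subject = c(1, 2, 2, 3))
hits2 <- data.frame(query = c(1, 2, 2), subject = c(2, 2, 3))

# value2 column of df2 from the question, indexed by subject row
value2 <- c(3, 4, 60)

# Inner join: keep only pairs that appear in both hit tables
both <- merge(hits1, hits2, by = c("query", "subject"))

# Rule A: keep a single row per query by taking its first match
first_match <- both[!duplicated(both$query), ]

# Rule B: keep a single row per query by averaging value2 over all matches
both$value2 <- value2[both$subject]
averaged <- aggregate(value2 ~ query, data = both, FUN = mean)

print(first_match)
print(averaged)
```

Here query 2 matches subjects 2 and 3, so rule A keeps only subject 2 while rule B reports the mean of 4 and 60; which behaviour is right depends on what the repeated matches mean in the real data.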

ADD REPLY • link 5.5 years ago Michael Lawrence ★ 11k