I have two (very similar but not identical) genomic ranges (GRanges) objects, each with up to 33 samples, and I am trying to find the overlaps between them and combine them in a particular way. I want to do this separately for each sample. I know I could do this with a for loop:
    subsets <- 1:33
    for (i in subsets) {
      subset <- df[df$subset == i, ]
      # ...do tasks...
    }
However, I assume there must be a better way? Perhaps with data.table, though this isn't a requirement. Some guidance on where to start would be helpful.
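One common loop-free pattern in base R is to `split()` the data frame by sample and `lapply()` a function over the pieces. The sketch below assumes a toy data frame shaped like the ones in this post, and `do_tasks()` is a hypothetical placeholder for the real per-sample work, not part of any package.

```r
# Toy data frame mirroring the layout used in this post (values are illustrative).
df <- data.frame(
  File = c("C_28", "C_28", "A_1", "A_1"),
  Chromosome = rep(1, 4),
  Min = c(1, 150, 20, 150),
  Max = c(100, 200, 25, 200),
  CN.State = c(1, 1, 3, 3)
)

# Hypothetical stand-in for "...do tasks...": report the overall span per sample.
do_tasks <- function(d) {
  data.frame(File = d$File[1], Min = min(d$Min), Max = max(d$Max))
}

# split() makes one data frame per File; lapply() runs the task on each piece.
per_file <- lapply(split(df, df$File), do_tasks)

# Bind the per-sample results back into a single data frame.
result <- do.call(rbind, per_file)
```

This is still a loop under the hood, but it keeps each sample's rows together and returns the results in one object instead of relying on side effects inside a `for` body.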
**Tasks**
The tasks I'm doing include:
- Create GRanges objects
- Find overlaps between df1 and df2
- Use overlaps to combine segments
# Example:
The below is for context. Everything below works fine if I use the for loop structure above, but I'm trying to wrap my head around how to do this for each File, instead of for the entire df, without writing a for loop over the Files.
df1
    File Chromosome Min Max CN.State
    C_28          1   1 100        1
    C_28          1 150 200        1
    A_1           1  20  25        3
    A_1           1 150 200        3

    df1 <- data.frame(File=c("C_28","C_28","A_1","A_1"),
                      Chromosome=rep(1, 4),
                      Min=c(1, 150, 20, 150),
                      Max=c(100, 200, 25, 200),
                      CN.State=c(1,1,3,3))
df2
    File Chromosome Min Max CN.State
    C_28          1   1 210        1
    A_1           1  15 250        3

    df2 <- data.frame(File=c("C_28","A_1"),
                      Chromosome=rep(1, 2),
                      Min=c(1, 15),
                      Max=c(210, 250),
                      CN.State=c(1,3))
## Simplified Tasks
**Make Genomic Ranges Objects**
    library(GenomicRanges)

    # Convert each data frame to a GRanges object, keeping File and CN.State as metadata
    df1 <- makeGRangesFromDataFrame(df1, keep.extra.columns = TRUE,
                                    seqnames.field = "Chromosome",
                                    start.field = "Min", end.field = "Max")
    df2 <- makeGRangesFromDataFrame(df2, keep.extra.columns = TRUE,
                                    seqnames.field = "Chromosome",
                                    start.field = "Min", end.field = "Max")
**Find overlaps & combine**
    # Find overlapping ranges, then replace each df1 range with the df2 range it overlaps
    hits <- findOverlaps(df1, df2)
    ranges(df1)[queryHits(hits)] <- ranges(df2)[subjectHits(hits)]
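One possible way to restrict this to within-sample overlaps without looping is to compute all hits once and then keep only the pairs whose `File` values match. The following is a minimal sketch under assumptions: it rebuilds the example data under the hypothetical names `gr1` and `gr2`, and it assumes each range in `gr1` overlaps at most one range from the same File in `gr2` (otherwise the final assignment would write to the same index more than once).

```r
library(GenomicRanges)

# Rebuild the example data as GRanges (gr1/gr2 are illustrative names).
gr1 <- makeGRangesFromDataFrame(
  data.frame(File = c("C_28", "C_28", "A_1", "A_1"), Chromosome = 1,
             Min = c(1, 150, 20, 150), Max = c(100, 200, 25, 200),
             CN.State = c(1, 1, 3, 3)),
  keep.extra.columns = TRUE, seqnames.field = "Chromosome",
  start.field = "Min", end.field = "Max")
gr2 <- makeGRangesFromDataFrame(
  data.frame(File = c("C_28", "A_1"), Chromosome = 1,
             Min = c(1, 15), Max = c(210, 250),
             CN.State = c(1, 3)),
  keep.extra.columns = TRUE, seqnames.field = "Chromosome",
  start.field = "Min", end.field = "Max")

# All overlaps, ignoring File for now.
hits <- findOverlaps(gr1, gr2)

# Keep only the hits where query and subject come from the same File.
same_file <- gr1$File[queryHits(hits)] == gr2$File[subjectHits(hits)]
hits <- hits[same_file]

# Replace each gr1 range with the matching same-File range from gr2.
ranges(gr1)[queryHits(hits)] <- ranges(gr2)[subjectHits(hits)]
```

Filtering the hits afterwards avoids splitting the objects at all, at the cost of computing cross-sample overlaps that are then discarded; for 33 samples that overhead is probably small.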
As an aside, this would be easier if `findOverlaps,GRangesList,GRangesList` operated within elements. Could use GenomicRangesList, or maybe make a `pfindOverlaps()`?
I actually just realized I could use `reduce()` on the combined regions to do what I want, except that it won't keep my extra columns. I tried translating your solution into
but this doesn't work. When I try just `split(CombinedRegions, ~File)`, that doesn't work either, and the error makes me think it's because it is a GRanges object. If you could offer a solution with this, that'd be great. The error:
    Error in (function (classes, fdef, mtable) :
      unable to find an inherited method for function ‘splitAsList’ for signature ‘"GRanges", "formula"’
The splitting by formula probably only works in devel. I think you want something like: