I have a series of genomic ranges accompanied by metadata.
For example, the simplest case is:
Chromosome Start End Score
chr1 1234567 1234987 15
chr1 1234577 1234999 25
chr1 1234560 1234940 50
chr2 1234567 1234987 15
chr2 1234577 1234999 25
chr2 1234560 1234940 50
I would like to keep only one range from each group of overlapping ranges, chosen by a condition on the metadata (e.g. among those that overlap, keep the one with the maximum score).
In the above example, the dataset should be reduced to:
chr1 1234560 1234940 50
chr2 1234560 1234940 50
This is because the first three ranges overlap and the third has the highest score, while the last three also overlap and the sixth has the highest score.
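To make the intended behaviour concrete, here is a rough sketch in plain Python of what I am after. The function name `best_per_cluster` and the tuple layout `(chrom, start, end, score)` are made up for illustration. It also assumes that two ranges overlap when one starts at or before the other ends (closed coordinates), and that overlap is transitive, so a chain of pairwise-overlapping ranges counts as one group.

```python
from itertools import groupby
from operator import itemgetter


def best_per_cluster(ranges, key=itemgetter(3)):
    """Keep one range per group of overlapping ranges.

    ranges: iterable of (chrom, start, end, score) tuples (hypothetical layout).
    key:    function used to rank ranges within a group; the max is kept.
    """
    kept = []
    # Sort by chromosome, then start, so each group is a contiguous run.
    ordered = sorted(ranges, key=itemgetter(0, 1, 2))
    for _chrom, group in groupby(ordered, key=itemgetter(0)):
        cluster, cluster_end = [], None
        for r in group:
            # Assumption: closed coordinates, so start == end still overlaps.
            if cluster and r[1] > cluster_end:
                # This range starts past the current group: close the group.
                kept.append(max(cluster, key=key))
                cluster, cluster_end = [], None
            cluster.append(r)
            cluster_end = r[2] if cluster_end is None else max(cluster_end, r[2])
        if cluster:
            kept.append(max(cluster, key=key))
    return kept


example = [
    ("chr1", 1234567, 1234987, 15),
    ("chr1", 1234577, 1234999, 25),
    ("chr1", 1234560, 1234940, 50),
    ("chr2", 1234567, 1234987, 15),
    ("chr2", 1234577, 1234999, 25),
    ("chr2", 1234560, 1234940, 50),
]
result = best_per_cluster(example)
```

On ties, `max` keeps the first range in sorted order, which may or may not be the right choice. I suspect existing genomics tooling can do this more efficiently than a hand-rolled loop.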
Any ideas?