I'm currently working on genomic data that contains chromosome start and end positions. I want to identify genomic regions that overlap other regions and collapse them into new genomic regions.
Although I can identify which regions overlap with the GenomicRanges package, it returns a table of hits that I then need to filter. What I want is to remove rows so that no value from column B (subjecthits) also appears in column A (queryhits).
data <- read.csv(text = "index,queryhits,subjecthits
1,1,530
2,2,545
3,2,799
4,2,93
5,3,415
6,4,745
7,545,799
8,545,93
9,545,415
10,545,745
")
A value in the subjecthits column should not also appear in the queryhits column. For example, in row 2, queryhits is 2 and subjecthits is 545, which means 545 is grouped with 2.
However, 545 also appears in the queryhits column (rows 7 to 10). I don't want to count it again, so I want to remove every row that has 545 in the queryhits column. The expected output is:
index queryhits subjecthits
1 1 530
2 2 545
3 2 799
4 2 93
5 3 415
6 4 745
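A minimal base-R sketch of the filter described above, assuming the hits are already in a plain data frame with the columns queryhits and subjecthits (as in the example) rather than in a Hits object. It keeps a row only if its queryhits value never appears anywhere in subjecthits, which reproduces the expected output for this example:

```r
# The example hits table from the question, without trailing commas so the
# columns line up with the header
data <- read.csv(text = "index,queryhits,subjecthits
1,1,530
2,2,545
3,2,799
4,2,93
5,3,415
6,4,745
7,545,799
8,545,93
9,545,415
10,545,745")

# Drop every row whose queryhits value already appears as a subjecthits
# value, i.e. an id that has already been grouped with another query
filtered <- data[!data$queryhits %in% data$subjecthits, ]
print(filtered, row.names = FALSE)
```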
My real data has about 20,000 rows, so I want each number to appear only once across the queryhits and subjecthits columns. Thank you for any help or suggestions.
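If the requirement is also that each subjecthits id belongs to only one group, one possible extension (an assumption about the intent, not something stated in the question) is to drop duplicated subjecthits values after the filter, keeping the first occurrence. The hits table and the helper name keep_unique_hits are hypothetical; both steps are vectorised, so they should handle a table of this size without looping over rows:

```r
# Hypothetical hits: id 799 overlaps both query 2 and query 3
hits <- data.frame(
  queryhits   = c(1, 2, 2, 3, 3, 545),
  subjecthits = c(530, 545, 799, 799, 415, 93)
)

keep_unique_hits <- function(h) {
  # Step 1: drop rows whose query id was already grouped as a subject
  h <- h[!h$queryhits %in% h$subjecthits, ]
  # Step 2: keep only the first row for each subject id
  h <- h[!duplicated(h$subjecthits), ]
  rownames(h) <- NULL
  h
}

res <- keep_unique_hits(hits)
print(res)
```

This does not follow chains of hits: in the hypothetical data, 93 is linked only through query 545, so it disappears together with 545's rows. If such chains matter, collapsing the regions themselves is likely more robust than filtering the hit pairs.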

You didn't generate that Hits object using read.csv, so showing that step is not useful. Your question doesn't really make sense if you are comparing two GRanges objects; having 545 in the subjectHits column is not the same as having 545 in the queryHits column unless you actually have a SelfHits object. If you want people to be able to help, you need to show the actual code you used to generate that Hits object, and clarify exactly what you are trying to do.
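For the goal stated at the start of the question, collapsing overlapping regions into new ones, GenomicRanges offers reduce(), which works on the ranges directly instead of on a table of hits. As a rough base-R illustration of the same idea on a plain data frame, here is a sketch in which the column names chrom, start and end, the helper collapse_regions and the example regions are all hypothetical. It ignores strand and treats a region as overlapping the current group when its start is at or before the running maximum end on that chromosome:

```r
# Hypothetical regions (not from the question's data)
regions <- data.frame(
  chrom = c("chr1", "chr1", "chr1", "chr2", "chr2"),
  start = c(100, 150, 400, 50, 80),
  end   = c(200, 300, 500, 90, 120),
  stringsAsFactors = FALSE
)

collapse_regions <- function(df) {
  # Sort by chromosome, then by start position
  df <- df[order(df$chrom, df$start), ]
  merged <- lapply(split(df, df$chrom), function(d) {
    # Largest end seen among all earlier regions on this chromosome
    prev_end <- c(-Inf, cummax(d$end)[-nrow(d)])
    # A new group starts whenever a region begins after that running end
    group <- cumsum(d$start > prev_end)
    data.frame(
      chrom = d$chrom[1],
      start = as.vector(tapply(d$start, group, min)),
      end   = as.vector(tapply(d$end, group, max)),
      stringsAsFactors = FALSE
    )
  })
  out <- do.call(rbind, merged)
  rownames(out) <- NULL
  out
}

res <- collapse_regions(regions)
print(res)
```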