I'm currently working with genomic data that contains chromosome start and end positions. I want to identify genomic regions that overlap one another and collapse them into new genomic regions.
Although I can identify which regions overlap using the GenomicRanges package, it returns data that I still need to filter. What I want is to remove rows whose value in column A (queryhits) also appears in column B (subjecthits).
# Toy version of the hits table: row index, query index, subject index
data <- read.csv(textConnection(
"index,queryhits,subjecthits
1,1,530
2,2,545
3,2,799
4,2,93
5,3,415
6,4,745
7,545,799
8,545,93
9,545,415
10,545,745"), strip.white = TRUE)
A value in the subjecthits column should not also appear in the queryhits column. For example, in row 2 the queryhits column equals 2 and the subjecthits column equals 545, which means that 545 is grouped with 2.
However, 545 can also appear as a value in queryhits, and I don't want to count it again. That is why I want to remove the rows that contain 545 in the queryhits column. The expected output is:
index queryhits subjecthits
1 1 530
2 2 545
3 2 799
4 2 93
5 3 415
6 4 745
My real data has about 20000 rows, so I want each number to be unique across both the queryhits and subjecthits columns. Thank you for any help or suggestions.
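
The filter described above can be sketched in base R. This is only a minimal sketch, assuming the hits are already in a plain data frame; the names hits_df and filtered are made up for illustration. It makes a single pass and does not follow longer chains (for example, a subject that is itself a subject of an already removed query).

```r
# Same toy table as above, built directly as a data frame.
# hits_df and filtered are hypothetical names used for illustration.
hits_df <- data.frame(
  queryhits   = c(1, 2, 2, 2, 3, 4, 545, 545, 545, 545),
  subjecthits = c(530, 545, 799, 93, 415, 745, 799, 93, 415, 745)
)

# Keep only rows whose queryhits value never appears in subjecthits,
# i.e. drop queries that were already grouped under another query.
filtered <- hits_df[!hits_df$queryhits %in% hits_df$subjecthits, ]
print(filtered)
```

On the toy table this keeps rows 1 to 6 and drops the four rows whose queryhits value is 545. Because %in% is vectorised, the same line should be fine for around 20000 rows.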
You didn't generate that Hits object using read.csv, so showing that step is not useful. Your question doesn't really make sense if you are comparing two GRanges objects; having 545 in the subjectHits column is not the same as having 545 in the queryHits column unless you actually have a SelfHits object. If you want people to be able to help, you need to show the actual code you used to generate that Hits object, and clarify exactly what you are trying to do.
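
To illustrate the distinction, here is a hedged sketch of how a SelfHits object arises, assuming the overlaps come from comparing a single GRanges object with itself. The ranges below are invented toy coordinates, not the poster's data, and reduce() is shown only as one possible way to collapse overlapping regions, not necessarily what the original workflow used.

```r
library(GenomicRanges)

# Hypothetical toy ranges; the coordinates are made up for illustration.
gr <- GRanges(
  seqnames = c("chr1", "chr1", "chr1", "chr2"),
  ranges   = IRanges(start = c(1, 5, 20, 1), end = c(10, 15, 30, 10))
)

# Calling findOverlaps() with no subject compares gr with itself, so
# queryHits and subjectHits index the same set of ranges (a SelfHits object).
# drop.self removes each range's hit to itself; drop.redundant keeps only
# one of each symmetric pair (a hits b, b hits a).
hits <- findOverlaps(gr, drop.self = TRUE, drop.redundant = TRUE)

# reduce() merges overlapping ranges into single regions, which may make
# manual filtering of the hits unnecessary.
collapsed <- reduce(gr)
```

If the hits instead come from two different GRanges objects, index 545 in queryHits and index 545 in subjectHits refer to different ranges, and filtering one column against the other would not be meaningful.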