Question

How to remove rows whose values were already found in an earlier group

naktang1 (Thailand)

I'm working with genomic data that contains the chromosome, start and end position of each region. I want to identify regions that overlap other regions and collapse each set of overlapping regions into a single new region.
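Collapsing overlapping regions is what GenomicRanges::reduce() does for a GRanges object. As a hedged illustration of the idea only, here is a base R sketch that merges overlapping intervals chromosome by chromosome. The helper collapse_regions() and the column names chrom, start and end are hypothetical, not taken from the question. Unlike reduce(), this sketch ignores strand and does not join ranges that are merely adjacent.

```r
# Hypothetical helper: merge overlapping intervals on each chromosome.
# Assumes columns chrom, start, end with closed coordinates.
collapse_regions <- function(df) {
  pieces <- lapply(split(df, df$chrom, drop = TRUE), function(d) {
    d <- d[order(d$start, d$end), ]
    # Running maximum of the end positions seen so far
    run_max <- cummax(d$end)
    # A new region begins when a start lies beyond every earlier end
    new_group <- c(TRUE, d$start[-1] > run_max[-nrow(d)])
    grp <- cumsum(new_group)
    data.frame(chrom = d$chrom[1],
               start = as.vector(tapply(d$start, grp, min)),
               end   = as.vector(tapply(d$end, grp, max)))
  })
  res <- do.call(rbind, unname(pieces))
  rownames(res) <- NULL
  res
}

# Made-up example intervals
regions <- data.frame(chrom = c("chr1", "chr1", "chr1", "chr2"),
                      start = c(100, 150, 400, 10),
                      end   = c(200, 300, 500, 20))
merged <- collapse_regions(regions)
print(merged)  # chr1 100-300, chr1 400-500, chr2 10-20
```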

I can identify which regions overlap using the GenomicRanges package, but it returns a table of hits that I still need to filter. What I want is to remove every row whose queryhits value already appears in the subjecthits column:

data <- read.csv(textConnection(
"index,queryhits,subjecthits
1,1,530
2,2,545
3,2,799
4,2,93
5,3,415
6,4,745
7,545,799
8,545,93
9,545,415
10,545,745"))

A value in the subjecthits column should not also appear in the queryhits column. For example, in row 2 queryhits is 2 and subjecthits is 545, which means that 545 is grouped with 2.

However, 545 also appears in the queryhits column (rows 7 to 10). I don't want to count it again, so I want to remove every row that has 545 in the queryhits column. The expected output is:

    index queryhits subjecthits
        1         1         530
        2         2         545
        3         2         799
        4         2          93
        5         3         415
        6         4         745
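A minimal base R sketch of the filtering rule described above, run on the example table (without the trailing commas, so that read.csv() sees three columns). It drops every row whose queryhits value occurs anywhere in subjecthits, which reproduces the expected output here. Note that it checks the whole subjecthits column rather than only earlier rows, and it only makes sense if both columns index the same set of ranges.

```r
# The example hit table from the question, without trailing commas
data <- read.csv(text = "index,queryhits,subjecthits
1,1,530
2,2,545
3,2,799
4,2,93
5,3,415
6,4,745
7,545,799
8,545,93
9,545,415
10,545,745")

# Drop rows whose queryhits value appears anywhere in subjecthits
kept <- data[!data$queryhits %in% data$subjecthits, ]
print(kept)  # rows with index 1 to 6 remain
```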

My real data has about 20,000 rows, so I need a general solution in which no number appears in both the queryhits and the subjecthits column. Thank you for any help or suggestions.
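If queryhits and subjecthits do index the same ranges (a self-hits table, as the reply below points out), one way to make every number belong to exactly one group is to treat each hit as an edge and find connected components. The sketch below uses a small union-find in base R; group_hits() is a hypothetical helper, not part of GenomicRanges. On the example table it puts 1 and 530 in one group and 2, 3, 4, 93, 415, 545, 745 and 799 in another, because 545 links 415 and 745 back to 2. That differs from the expected output above, which is why it matters what the hits were computed against.

```r
# Hypothetical helper: connected components of a hit table via union-find.
# Assumes q and s index the same set of ranges (self-hits).
group_hits <- function(q, s) {
  ids <- sort(unique(c(q, s)))
  parent <- seq_along(ids)  # each id starts as its own root
  find <- function(i) {
    root <- i
    while (parent[root] != root) root <- parent[root]
    # Path compression: point every node on the path at the root
    while (parent[i] != root) {
      nxt <- parent[i]
      parent[i] <<- root
      i <- nxt
    }
    root
  }
  qi <- match(q, ids)
  si <- match(s, ids)
  for (k in seq_along(qi)) {
    a <- find(qi[k])
    b <- find(si[k])
    # Union: attach the larger root index to the smaller one
    if (a != b) parent[max(a, b)] <- min(a, b)
  }
  roots <- vapply(seq_along(ids), find, integer(1))
  # Renumber groups 1, 2, ... in order of first appearance
  data.frame(id = ids, group = match(roots, unique(roots)))
}

q <- c(1, 2, 2, 2, 3, 4, 545, 545, 545, 545)
s <- c(530, 545, 799, 93, 415, 745, 799, 93, 415, 745)
groups <- group_hits(q, s)
print(groups)
```

With path compression this should stay fast for about 20,000 hits. For the original goal of collapsing overlapping regions, calling GenomicRanges::reduce() on the ranges themselves may be simpler than post-processing the hit table.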

Tags: GenomicRanges, r

Asked 4.9 years ago by naktang1

You didn't generate that Hits object using read.csv, so showing that step is not useful. Your question doesn't really make sense if you are comparing two GRanges objects; having 545 in the subjectHits column is not the same as having 545 in the queryHits column unless you actually have a SelfHits object.

If you want people to be able to help, you need to show the actual code you used to generate that Hits object, and clarify exactly what you are trying to do.

Reply by James W. MacDonald, 4.9 years ago