I have a list of GRanges objects that need a very specific duplicate removal. I have reasons for using conditional duplicate removal on my data, but the condition is different for each GRanges in the list. For the first list element, I want complete duplicate removal. For the second list element, I need to find the rows that appear more than twice (freq > 2) and keep only one copy of each. For the third list element, I need to find the rows that appear more than three times (freq > 3) and keep two or three copies of each. I am looking for a more programmatic, dynamic solution to this data manipulation task. How can I do this easily, and is there a more efficient way to get my specific output? Any ideas?
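For the first rule on its own (complete duplicate removal), the duplicated() method that GenomicRanges provides for GRanges may already be enough. A minimal sketch on the bar element of the example data, assuming duplicates are defined by seqnames, start, end and strand, which is what duplicated() compares for GRanges; the score column is carried along but not compared. The names bar and bar_dedup are only illustrative.

```r
library(GenomicRanges)

# The `bar` element from the question's example data
bar <- GRanges(seqnames = Rle("chr1", 14),
               IRanges(c(9,19,34,54,70,82,136,9,34,70,136,9,82,136),
                       c(14,21,39,61,73,87,153,14,39,73,153,14,87,153)),
               score = c(48,6,9,8,4,15,38,48,9,4,38,48,15,38))

# duplicated() flags every repeat of a range (same seqnames, start, end,
# strand); dropping those rows keeps the first copy of each range in input
# order. The score metadata column is kept but plays no part in the match.
bar_dedup <- bar[!duplicated(bar)]
bar_dedup
```

The frequency-dependent rules for the second and third elements cannot be expressed with duplicated() alone, because it does not know how many copies of a range exist.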
(Thanks to @Martin for editing my reproducible data.)
Mini example:
library(GenomicRanges)
grl <- GRangesList(
  bar = GRanges(seqnames = Rle("chr1", 14),
                IRanges(c(9,19,34,54,70,82,136,9,34,70,136,9,82,136),
                        c(14,21,39,61,73,87,153,14,39,73,153,14,87,153)),
                score = c(48,6,9,8,4,15,38,48,9,4,38,48,15,38)),
  cat = GRanges(seqnames = Rle("chr10", 16),
                IRanges(c(7,21,21,72,142,7,16,21,45,72,100,114,142,16,72,114),
                        c(10,34,34,78,147,10,17,34,51,78,103,124,147,17,78,124)),
                score = c(53,14,14,20,4,53,20,14,11,20,7,32,4,20,20,32)),
  foo = GRanges(seqnames = Rle("chr11", 16),
                IRanges(c(12,12,12,58,58,58,118,12,12,44,58,102,118,12,58,118),
                        c(36,36,36,92,92,92,139,36,36,49,92,109,139,36,92,139)),
                score = c(48,48,48,12,12,12,5,48,48,12,12,11,5,48,12,5))
)
Note that in cat, I look for the rows that appear more than twice (here, three times) and keep only one copy of each; if a row appears only twice, I do not remove any duplicates for it. In foo, I look for the rows that appear more than three times and keep two or three copies of each instead. That is what I mean by a very specific duplicate removal for each GRanges. How can I get my output?
This is my desired output:
grl_expected <- GRangesList(
  bar = GRanges(seqnames = Rle("chr1", 7),
                IRanges(c(9,19,34,54,70,82,136),
                        c(14,21,39,61,73,87,153)),
                score = c(48,6,9,8,4,15,38)),
  cat = GRanges(seqnames = Rle("chr10", 12),
                IRanges(c(7,21,72,142,7,16,45,100,114,142,16,114),
                        c(10,34,78,147,10,17,51,103,124,147,17,124)),
                score = c(53,14,20,4,53,20,11,7,32,4,20,32)),
  foo = GRanges(seqnames = Rle("chr11", 11),
                IRanges(c(12,12,12,44,58,58,58,118,102,118,118),
                        c(36,36,36,49,92,92,92,139,109,139,139)),
                score = c(48,48,48,12,12,12,12,5,11,5,5))
)
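A minimal sketch of one way to apply a different rule to each list element, assuming two rows count as duplicates when their seqnames, start, end and strand match (score is not compared). Each rule is described by two numbers: max_freq (a range is only trimmed when it occurs more than this many times) and keep (how many copies of a trimmed range survive). The names cap_duplicates, mk, max_freq and keep are hypothetical and introduced only for illustration; GRanges, IRanges, Rle, GRangesList, seqnames, start, end, strand and base R's ave and Map are existing functions. This is a sketch for the example data, not a tested general solution.

```r
library(GenomicRanges)

# Hypothetical helper (not a GenomicRanges function): ranges occurring more
# than `max_freq` times are cut down to their first `keep` copies; ranges
# occurring `max_freq` times or fewer are left as they are.
cap_duplicates <- function(gr, max_freq, keep) {
  # One key per row; identical ranges share a key (metadata is ignored)
  key  <- paste(as.character(seqnames(gr)), start(gr), end(gr),
                as.character(strand(gr)))
  idx  <- seq_along(key)
  freq <- ave(idx, key, FUN = length)     # how often each range occurs
  occ  <- ave(idx, key, FUN = seq_along)  # 1st, 2nd, 3rd, ... copy
  gr[freq <= max_freq | occ <= keep]
}

# Hypothetical constructor, used only to rebuild the question's data compactly
mk <- function(chr, s, e, score)
  GRanges(seqnames = Rle(chr, length(s)), IRanges(s, e), score = score)

grl <- GRangesList(
  bar = mk("chr1",
           c(9,19,34,54,70,82,136,9,34,70,136,9,82,136),
           c(14,21,39,61,73,87,153,14,39,73,153,14,87,153),
           c(48,6,9,8,4,15,38,48,9,4,38,48,15,38)),
  cat = mk("chr10",
           c(7,21,21,72,142,7,16,21,45,72,100,114,142,16,72,114),
           c(10,34,34,78,147,10,17,34,51,78,103,124,147,17,78,124),
           c(53,14,14,20,4,53,20,14,11,20,7,32,4,20,20,32)),
  foo = mk("chr11",
           c(12,12,12,58,58,58,118,12,12,44,58,102,118,12,58,118),
           c(36,36,36,92,92,92,139,36,36,49,92,109,139,36,92,139),
           c(48,48,48,12,12,12,5,48,48,12,12,11,5,48,12,5))
)

# One rule per list element:
#   bar: any repeat           -> keep 1 copy (complete duplicate removal)
#   cat: more than 2 copies   -> keep 1 copy
#   foo: more than 3 copies   -> keep 3 copies
max_freq <- c(bar = 1, cat = 2, foo = 3)
keep     <- c(bar = 1, cat = 1, foo = 3)

# Apply each element's own rule, then reassemble a GRangesList
res <- GRangesList(Map(cap_duplicates, as.list(grl),
                       max_freq[names(grl)], keep[names(grl)]))
lengths(res)
```

With these settings the result has 7, 12 and 11 ranges for bar, cat and foo, the same counts as in grl_expected. Rows come back in their input order, so in foo the 44-49 range ends up after the first 118-139 copy rather than in fourth position. Supporting another list element only needs one more entry in max_freq and keep.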
Can anyone point out how to make this happen? Any ideas?
Best regards