Question: How to efficiently remove duplicated overlap hit indices from an IntegerList? [solved]
22 months ago by
Italy
Jurat Shahidin60 wrote:

Hi everyone:

I have a list of overlap hit indices stored as IntegerList objects, and some of the indices are duplicated. I tried the unique and duplicated methods from the IRanges package, but the duplicates were not removed. Removing duplicates from a GRanges object works differently than for an IntegerList. I also tried coercing the IntegerList to an integer vector and then calling unique or duplicated, but that approach introduces NA values, which should not appear in an IntegerList. What I want at the end is a new IntegerList without duplicated indices. There may be another approach that does this easily and efficiently. Can anyone suggest one?

updated mini example:

library(IRanges)  # provides IntegerList (DataFrame comes from S4Vectors, which IRanges attaches)

hitTB_1 <- list(
foo = IntegerList(1,3,7,10),
bar = IntegerList(1,3,integer(0),8),
cat = IntegerList(1,3,integer(0),10)
)

hitTB_2 <- list(
bar = IntegerList(1,4,8,9,10),
foo = IntegerList(1,4,10,11,integer(0)),
cat = IntegerList(1,4,10,13,14)
)

hitTB_3 <- list(
cat = IntegerList(2,5,7,9,10),
foo = IntegerList(2,5,8,integer(0),10),
bar = IntegerList(2,5,7,integer(0),8)
)

In each hitTB the IntegerLists appear in a different order, so I intend to reorder them as follows:

idx <- names(hitTB_1)               # reference order: foo, bar, cat
hitTB_1
hitTB_2 <- DataFrame(hitTB_2[idx])  # reorder to match hitTB_1
hitTB_3 <- DataFrame(hitTB_3[idx])  # reorder to match hitTB_1

This gives hitTB_2 and hitTB_3 the same column order as hitTB_1, which should make it easier to combine them into one single list without duplicates, if that is feasible. Any recommendations?
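
One possible way to combine the three tables by name is sketched here. It assumes that only the set of distinct hit indices per name matters, so it does not reproduce the exact row layout of the desired output; `tables`, `unique_hits`, and `result` are illustrative names, not part of the original post.

```r
library(IRanges)

hitTB_1 <- list(foo = IntegerList(1, 3, 7, 10),
                bar = IntegerList(1, 3, integer(0), 8),
                cat = IntegerList(1, 3, integer(0), 10))
hitTB_2 <- list(bar = IntegerList(1, 4, 8, 9, 10),
                foo = IntegerList(1, 4, 10, 11, integer(0)),
                cat = IntegerList(1, 4, 10, 13, 14))
hitTB_3 <- list(cat = IntegerList(2, 5, 7, 9, 10),
                foo = IntegerList(2, 5, 8, integer(0), 10),
                bar = IntegerList(2, 5, 7, integer(0), 8))

idx <- names(hitTB_1)
tables <- list(hitTB_1, hitTB_2, hitTB_3)

# Look each column up by name, so the differing order in each table does not matter.
unique_hits <- lapply(setNames(idx, idx), function(nm) {
  # unlist() flattens an IntegerList to an integer vector; empty elements vanish.
  hits <- lapply(tables, function(tb) unlist(tb[[nm]], use.names = FALSE))
  sort(unique(unlist(hits)))
})

# One list element per name, each holding its distinct hit indices.
result <- IntegerList(unique_hits)
```

Indexing by name rather than by position avoids having to reorder the tables first.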

If I could manipulate them as a matrix, I could get the following as my desired output.

Desired output:

output <-
DataFrame(
foo = IntegerList(integer(0),integer(0),1,2,3,4,5,7,8,10,11),
bar = IntegerList(integer(0),10,1,2,3,4,5,integer(0),7,8,9),
cat = IntegerList(9,14,1,2,3,4,5,integer(0),7,10,13)
)


I am stuck on this problem. How can I achieve my expected output easily? Any ideas or possible approaches are highly appreciated. Thanks a lot.

Best regards,

Jurat

22 months ago by
Valerie Obenchain ♦♦ 6.5k
United States
Valerie Obenchain ♦♦ 6.5k wrote:

Hi Jurat,

Are you getting the list of IntegerLists as the output from a function or are you creating it? If you are constructing it yourself it would be much easier to manipulate in a single IntegerList with NAs:

> hitTB <- IntegerList(
+   hit.1= c(1,2,3,NA,4,NA,NA,6),
+   hit.2 = c(1,1,1,2,NA,3,4,NA),
+   hit.3 = c(1,2,4,4,5,NA,6,7)
+ )
> hitTB
IntegerList of length 3
[["hit.1"]] 1 2 3 <NA> 4 <NA> <NA> 6
[["hit.2"]] 1 1 1 2 <NA> 3 4 <NA>
[["hit.3"]] 1 2 4 4 5 <NA> 6 7

unique() gives the desired result (if NAs are ok):
> unique(hitTB)
IntegerList of length 3
[["hit.1"]] 1 2 3 <NA> 4 6
[["hit.2"]] 1 2 <NA> 3 4
[["hit.3"]] 1 2 4 5 <NA> 6 7
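
If the NAs are not wanted in the final list, one option is to drop them before calling unique(). This is a minimal sketch reusing the hitTB example above; `hits_no_na` is an illustrative name.

```r
library(IRanges)

hitTB <- IntegerList(
  hit.1 = c(1, 2, 3, NA, 4, NA, NA, 6),
  hit.2 = c(1, 1, 1, 2, NA, 3, 4, NA),
  hit.3 = c(1, 2, 4, 4, 5, NA, 6, 7)
)

# is.na() returns a LogicalList parallel to hitTB; subsetting with its
# negation drops the NAs inside each element, then unique() dedupes each element.
hits_no_na <- unique(hitTB[!is.na(hitTB)])
```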

I didn't quite follow your rationale for the NA values. Do you not want them in the final list?

Valerie


My understanding is that he's trying to find the unique elements of an IntegerList, not the unique elements in each integer vector. The simplest approach would be to paste the elements into keys (a character vector for each IntegerList). But be careful about order. If order doesn't matter, sort each element first so that elements holding the same values map to the same key.
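
The key-pasting idea might look like the sketch below, assuming that order within each element does not matter. The example data and the names `x`, `keys`, and `x_unique` are made up for illustration.

```r
library(IRanges)

x <- IntegerList(c(3, 1), c(1, 3), integer(0), 7, integer(0))

# One character key per list element; sorting first makes c(3, 1) and
# c(1, 3) share a key. Empty elements all get the key "".
keys <- vapply(as.list(x),
               function(v) paste(sort(v), collapse = ","),
               character(1))

# Keep only the first list element seen for each key.
x_unique <- x[!duplicated(keys)]
```

If order does matter, dropping the sort() call makes the keys order-sensitive.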

I think the real answer is at the workflow level. It doesn't seem natural to be in this nested list space.