Question

How to efficiently remove duplicated overlap hit indices in an IntegerList? [solved]

Jurat Shahidin

@jurat-shahidin-9488

Chicago, IL, USA

Hi everyone,

I have a list of overlap hit indices stored as IntegerLists, and some of the indices are duplicated. I tried the unique() and duplicated() methods from the IRanges package, but they do not remove the duplication. Removing duplicates from a GRanges object works differently from removing them from an IntegerList. I also tried coercing the IntegerList to an integer vector and then calling unique() or duplicated(), but that way I end up with NA values, which should not be in the IntegerList, because at the end I expect a new IntegerList without duplicated indices. There might be another approach that does this easily and efficiently. Can anyone suggest one?

Updated minimal example:

library(IRanges)  # IntegerList(); also attaches S4Vectors, which provides DataFrame()

hitTB_1 <- list(
  foo = IntegerList(1,3,7,10),
  bar = IntegerList(1,3,integer(0),8),
  cat = IntegerList(1,3,integer(0),10)
)

hitTB_2 <- list(
  bar = IntegerList(1,4,8,9,10),
  foo = IntegerList(1,4,10,11,integer(0)),
  cat = IntegerList(1,4,10,13,14)
)

hitTB_3 <- list(
  cat = IntegerList(2,5,7,9,10),
  foo = IntegerList(2,5,8,integer(0),10),
  bar = IntegerList(2,5,7,integer(0),8)
)

The order of the IntegerLists differs in each hitTB, so I intend to reorder them to match hitTB_1 as follows:

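idx <- names(hitTB_1)               # reference column order: foo, bar, cat
hitTB_1 <- DataFrame(hitTB_1[idx])
hitTB_2 <- DataFrame(hitTB_2[idx])  # reorder columns to match hitTB_1
hitTB_3 <- DataFrame(hitTB_3[idx])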

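The reordering step above can be applied to all tables at once. A minimal sketch in base R, assuming each table is a named list of columns as in the example; `align_tables` is a hypothetical helper, not an IRanges function, and plain lists stand in for IntegerLists. It stops with an error if a table lacks one of the reference names, because indexing a list with an absent name silently yields a NULL element named NA instead of failing.

```r
# Hypothetical helper (not part of IRanges): reorder every table's columns to
# follow `idx`, stopping if a table lacks one of the expected names.
align_tables <- function(tables, idx) {
  lapply(tables, function(tbl) {
    absent <- setdiff(idx, names(tbl))
    if (length(absent) > 0L) {
      stop("table is missing column(s): ", paste(absent, collapse = ", "))
    }
    tbl[idx]
  })
}

# Toy tables mirroring the example; plain lists stand in for IntegerLists.
hitTB_1 <- list(foo = list(1, 3), bar = list(1, 3), cat = list(1, 3))
hitTB_2 <- list(bar = list(1, 4), foo = list(1, 4), cat = list(1, 4))
hitTB_3 <- list(cat = list(2, 5), foo = list(2, 5), bar = list(2, 5))

aligned <- align_tables(list(hitTB_1, hitTB_2, hitTB_3), idx = names(hitTB_1))
vapply(aligned, function(tbl) paste(names(tbl), collapse = " "), character(1))
# "foo bar cat" for each of the three tables
```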
This way every table has the same pattern as hitTB_1, which should make it easier to combine them into a single list without duplicates, if this approach is feasible. Any recommendations?

If I could manipulate them as a matrix, I could get the output I want.

Desired output:

output <-
  DataFrame(
    foo = IntegerList(integer(0),integer(0),1,2,3,4,5,7,8,10,11),
    bar = IntegerList(integer(0),10,1,2,3,4,5,integer(0),7,8,9),
    cat = IntegerList(9,14,1,2,3,4,5,integer(0),7,10,13)
  )
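The matrix route described above can be sketched in base R. It assumes, as in the example, that every element of every column holds at most one index, and it uses plain lists in place of IntegerLists so it runs without Bioconductor; `table_to_matrix` is a hypothetical helper. Each table becomes an integer matrix (one row per hit, NA for an empty element), the matrices are stacked with rbind(), duplicated hit rows are dropped with unique(), and the rows are sorted with NAs first, which reproduces the desired output above. Because the helper only calls as.list() on each column, IntegerList columns should work the same way.

```r
# Example tables from the question, with plain lists standing in for
# IntegerList; integer(0) marks "no hit" at that position.
hitTB_1 <- list(
  foo = list(1, 3, 7, 10),
  bar = list(1, 3, integer(0), 8),
  cat = list(1, 3, integer(0), 10)
)
hitTB_2 <- list(
  bar = list(1, 4, 8, 9, 10),
  foo = list(1, 4, 10, 11, integer(0)),
  cat = list(1, 4, 10, 13, 14)
)
hitTB_3 <- list(
  cat = list(2, 5, 7, 9, 10),
  foo = list(2, 5, 8, integer(0), 10),
  bar = list(2, 5, 7, integer(0), 8)
)

# Hypothetical helper: one column per name in `cols`, one row per hit,
# NA where the element is empty. Assumes at most one index per element.
table_to_matrix <- function(tbl, cols) {
  columns <- lapply(setNames(cols, cols), function(nm) {
    vapply(as.list(tbl[[nm]]), function(v) {
      stopifnot(length(v) <= 1L)
      if (length(v) == 0L) NA_integer_ else as.integer(v)
    }, integer(1))
  })
  do.call(cbind, columns)
}

idx <- names(hitTB_1)
m <- do.call(rbind, lapply(list(hitTB_1, hitTB_2, hitTB_3),
                           table_to_matrix, cols = idx))
m <- unique(m)  # drop hit rows that appear in more than one table

# Sort rows by foo, then bar, then cat, placing NAs first.
ord <- do.call(order, c(lapply(idx, function(nm) m[, nm]), na.last = FALSE))
m <- m[ord, , drop = FALSE]
# m[, "foo"]: NA NA 1 2 3 4 5 7 8 10 11

# Back to one list per column, turning NA into integer(0) again.
output <- lapply(setNames(idx, idx), function(nm) {
  lapply(m[, nm], function(v) if (is.na(v)) integer(0) else v)
})
```

With IRanges loaded, `lapply(output, IntegerList)` should turn each column back into an IntegerList. Sorting with `na.last = FALSE` is what puts the rows that lack a foo hit at the top, matching the layout of the desired output.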

I am stuck on this problem. How can I get my expected output easily? Any ideas or possible approaches are highly appreciated. Thanks a lot.

Best regards,

Jurat

r iranges integerlist

score 1 · Answer 1 · 2016-10-07

Valerie Obenchain

@valerie-obenchain-4275

United States

Hi Jurat,

Are you getting the list of IntegerLists as output from a function, or are you creating it yourself? If you are constructing it yourself, it would be much easier to manipulate as a single IntegerList with NAs:

> hitTB <- IntegerList(
+ hit.1= c(1,2,3,NA,4,NA,NA,6),
+ hit.2 = c(1,1,1,2,NA,3,4,NA),
+ hit.3 = c(1,2,4,4,5,NA,6,7)
+ )
> hitTB
IntegerList of length 3
[["hit.1"]] 1 2 3 <NA> 4 <NA> <NA> 6
[["hit.2"]] 1 1 1 2 <NA> 3 4 <NA>
[["hit.3"]] 1 2 4 4 5 <NA> 6 7

unique() gives the desired result (if NAs are ok):
> unique(hitTB)
IntegerList of length 3
[["hit.1"]] 1 2 3 <NA> 4 6
[["hit.2"]] 1 2 <NA> 3 4
[["hit.3"]] 1 2 4 5 <NA> 6 7

I didn't quite follow your rationale regarding NA values. Do you not want them in the final list?

Valerie

My understanding is that he's trying to find the unique elements of an IntegerList, not the unique values within each integer vector. The simplest approach would be to paste the elements into keys (a character vector for each IntegerList), but be careful about order: if it doesn't matter, sort first.
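The key-based suggestion can be sketched in base R. `unique_elements` is a hypothetical helper, not an IRanges function: it pastes each element (an integer vector) into one string key and keeps the first element with each key, so whole elements are compared, unlike unique() on an IntegerList, which works within each element as Valerie's output shows. The optional `ignore_order` argument sorts each element before pasting, for the case where order does not matter. The sketch uses a plain list; for an IntegerList, as.list() yields the per-element vectors and logical subsetting with `[` should work the same way.

```r
# Hypothetical helper: keep the first occurrence of each distinct element,
# comparing whole integer vectors through a pasted string key.
unique_elements <- function(x, ignore_order = FALSE) {
  keys <- vapply(as.list(x), function(v) {
    if (ignore_order) v <- sort(v, na.last = TRUE)  # keep NAs when sorting
    paste(v, collapse = ",")                        # e.g. c(1L, 2L) -> "1,2"
  }, character(1))
  x[!duplicated(keys)]
}

x <- list(a = c(1L, 2L), b = c(2L, 1L), c = c(1L, 2L),
          d = integer(0), e = NA_integer_)

names(unique_elements(x))                       # "a" "b" "d" "e"
names(unique_elements(x, ignore_order = TRUE))  # "a" "d" "e"
```

Empty elements get the key `""` and NA gets `"NA"`, so both survive as distinct elements rather than being dropped.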

I think the real answer is at the workflow level. It doesn't seem natural to be in this nested list space.

Michael Lawrence