Question: How to efficiently remove duplicated overlap hit indices from an IntegerList? [solved]
2.1 years ago by Jurat Shahidin wrote:

Hi everyone:

I have a list of overlap hit indices stored in an IntegerList, and some of the indices are duplicated. I tried the unique and duplicated methods from the IRanges package, but the duplicates were not removed. Removing duplicates from a GRanges object works differently than it does for an IntegerList. I also tried coercing the IntegerList to an integer vector and then calling unique or duplicated, but that leaves me with NA values (NA shouldn't be in the IntegerList). What I expect at the end is a new IntegerList without duplicated indices. There may be another approach that accomplishes this easily and efficiently. Can anyone suggest one?

Updated mini example:

library(IRanges)  # provides IntegerList

hitTB_1 <- list(
  foo = IntegerList(1,3,7,10),
  bar = IntegerList(1,3,integer(0),8),
  cat = IntegerList(1,3,integer(0),10)
)

hitTB_2 <- list(
  bar = IntegerList(1,4,8,9,10),
  foo = IntegerList(1,4,10,11,integer(0)),
  cat = IntegerList(1,4,10,13,14)
)

hitTB_3 <- list(
  cat = IntegerList(2,5,7,9,10),
  foo = IntegerList(2,5,8,integer(0),10),
  bar = IntegerList(2,5,7,integer(0),8)
)

The order of the IntegerLists differs in each hitTB, so I intend to reorder them as follows:

idx <-  names(hitTB_1)
hitTB_2 <- DataFrame(hitTB_2[idx])
hitTB_3 <- DataFrame(hitTB_3[idx])

This way hitTB_2 and hitTB_3 follow the same pattern as hitTB_1, which should make it easier to combine them into one single list without duplicates, if that is feasible. Any recommendations?
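One possible reading of "combine them into one single list without duplication" is a per-name union of the hit indices. The sketch below assumes that reading and uses plain base R lists of integer vectors rather than IntegerLists; the names t1, t2, t3 and union_sorted are hypothetical, and the integer(0) placeholders from the original example are simply dropped, so this does not reproduce the desired output exactly.

```r
# Hypothetical flattened versions of hitTB_1..3: one integer vector per name,
# with the empty integer(0) entries left out.
t1 <- list(foo = c(1L, 3L, 7L, 10L), bar = c(1L, 3L, 8L), cat = c(1L, 3L, 10L))
t2 <- list(bar = c(1L, 4L, 8L, 9L, 10L), foo = c(1L, 4L, 10L, 11L),
           cat = c(1L, 4L, 10L, 13L, 14L))
t3 <- list(cat = c(2L, 5L, 7L, 9L, 10L), foo = c(2L, 5L, 8L, 10L),
           bar = c(2L, 5L, 7L, 8L))

# Reorder every list by the names of the first one, as in the post.
idx <- names(t1)

# Per name: concatenate the three vectors, drop duplicates, and sort.
union_sorted <- function(x, y, z) sort(unique(c(x, y, z)))
combined <- Map(union_sorted, t1[idx], t2[idx], t3[idx])
```

Map() walks the three reordered lists in parallel, so matching by name is handled entirely by the t[idx] subsetting.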

If I could manipulate them as a matrix, I could get the following as my desired output.

Desired output:

output <- list(
    foo = IntegerList(integer(0),integer(0),1,2,3,4,5,7,8,10,11),
    bar = IntegerList(integer(0),10,1,2,3,4,5,integer(0),7,8,9),
    cat = IntegerList(9,14,1,2,3,4,5,integer(0),7,10,13)
)

I am stuck on this problem. How can I achieve my expected output easily? Any ideas or possible approaches are highly appreciated. Thanks a lot.

Best regards:


2.1 years ago by Valerie Obenchain (United States) wrote:

Hi Jurat,

Are you getting the list of IntegerLists as the output from a function or are you creating it? If you are constructing it yourself it would be much easier to manipulate in a single IntegerList with NAs:

> hitTB <- IntegerList(
+   hit.1= c(1,2,3,NA,4,NA,NA,6),
+   hit.2 = c(1,1,1,2,NA,3,4,NA),
+   hit.3 = c(1,2,4,4,5,NA,6,7)
+ )
> hitTB
IntegerList of length 3
[["hit.1"]] 1 2 3 <NA> 4 <NA> <NA> 6
[["hit.2"]] 1 1 1 2 <NA> 3 4 <NA>
[["hit.3"]] 1 2 4 4 5 <NA> 6 7

unique() gives the desired result (if NAs are ok):
> unique(hitTB)
IntegerList of length 3
[["hit.1"]] 1 2 3 <NA> 4 6
[["hit.2"]] 1 2 <NA> 3 4
[["hit.3"]] 1 2 4 5 <NA> 6 7


I didn't quite follow your rationale for the NA values. Do you not want them in the final list?
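If the NAs are not wanted in the final result, they can be filtered out before de-duplicating. The following is only a minimal sketch in base R on a plain list that mirrors the hitTB example above; hit_list and drop_na_unique are hypothetical names. With IRanges loaded, the same filter can presumably be written directly on the IntegerList as unique(hitTB[!is.na(hitTB)]), but that form is an assumption and was not run here.

```r
# Plain-list stand-in for the hitTB IntegerList shown above.
hit_list <- list(
  hit.1 = c(1L, 2L, 3L, NA, 4L, NA, NA, 6L),
  hit.2 = c(1L, 1L, 1L, 2L, NA, 3L, 4L, NA),
  hit.3 = c(1L, 2L, 4L, 4L, 5L, NA, 6L, 7L)
)

# Drop NA first, then keep the first occurrence of each remaining value.
drop_na_unique <- function(v) unique(v[!is.na(v)])
clean <- lapply(hit_list, drop_na_unique)
```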





My understanding is that he's trying to find the unique elements of an IntegerList, not the unique elements in each integer vector. The simplest approach would be to paste the elements into keys (character vector for each IntegerList). But be careful about order. If it doesn't matter, then sort first.

I think the real answer is at the workflow level. It doesn't seem natural to be in this nested list space.

written 2.1 years ago by Michael Lawrence
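The key-pasting idea from the comment above could look roughly like the sketch below. It assumes a plain list of integer vectors (the shape as.list() would give for an IntegerList); hits and make_key are hypothetical names, and sorting before pasting reflects the "if order doesn't matter" case.

```r
# Hypothetical toy input: each list element is one integer vector.
hits <- list(c(1L, 3L), c(3L, 1L), integer(0), c(1L, 3L), 8L, integer(0))

# Build one character key per element. Sorting first makes c(1, 3) and
# c(3, 1) map to the same key; remove sort() if element order matters.
make_key <- function(v) paste(sort(v), collapse = ",")
keys <- vapply(hits, make_key, character(1))

# Keep only the first element seen for each key.
unique_hits <- hits[!duplicated(keys)]
```

Empty vectors all map to the empty string as their key, so repeated integer(0) entries collapse to a single one.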