Question: How to efficiently remove duplicated overlap hit indices in an IntegerList? [solved]
14 months ago, Jurat Shahidin (Italy) wrote:

Hi everyone:

I have a list of overlap hit indices stored as an IntegerList, and some of the indices are duplicated. I tried the unique and duplicated methods from the IRanges package, but the duplicates were not removed. Removing duplicates from a GRanges object works differently than for an IntegerList. I also tried coercing the IntegerList to an integer vector and then calling unique or duplicated, but that leaves me with NA values, which shouldn't be in an IntegerList. What I want in the end is a new IntegerList without duplicated indices. I suspect there is an easier and more efficient approach. Can anyone suggest one?

updated mini example:

library(IRanges)  # provides IntegerList; attaches S4Vectors for DataFrame

hitTB_1 <- list(
  foo = IntegerList(1,3,7,10),
  bar = IntegerList(1,3,integer(0),8),
  cat = IntegerList(1,3,integer(0),10)
)

hitTB_2 <- list(
  bar = IntegerList(1,4,8,9,10),
  foo = IntegerList(1,4,10,11,integer(0)),
  cat = IntegerList(1,4,10,13,14)
)

hitTB_3 <- list(
  cat = IntegerList(2,5,7,9,10),
  foo = IntegerList(2,5,8,integer(0),10),
  bar = IntegerList(2,5,7,integer(0),8)
)

In each hitTB the IntegerLists appear in a different order, so I intend to reorder them as follows:

idx <-  names(hitTB_1)
hitTB_1
hitTB_2 <- DataFrame(hitTB_2[idx])
hitTB_3 <- DataFrame(hitTB_3[idx])

That way hitTB_2 and hitTB_3 follow the same pattern as hitTB_1, which should make it easier to combine them into one single list without duplication, if this approach is feasible. Any recommendations?

If I could manipulate them as a matrix, I could get the following as my desired output.

Desired output:

output <-
  DataFrame(
    foo = IntegerList(integer(0),integer(0),1,2,3,4,5,7,8,10,11),
    bar = IntegerList(integer(0),10,1,2,3,4,5,integer(0),7,8,9),
    cat = IntegerList(9,14,1,2,3,4,5,integer(0),7,10,13)
  )

I am stuck on this problem. How can I achieve my expected output easily? Any ideas or possible approaches are highly appreciated. Thanks a lot.

Best regards:

Jurat

14 months ago, Valerie Obenchain (United States) wrote:

Hi Jurat,

Are you getting the list of IntegerLists as the output from a function, or are you creating it? If you are constructing it yourself, it would be much easier to manipulate as a single IntegerList with NAs:

> hitTB <- IntegerList(
+   hit.1= c(1,2,3,NA,4,NA,NA,6),
+   hit.2 = c(1,1,1,2,NA,3,4,NA),
+   hit.3 = c(1,2,4,4,5,NA,6,7)
+ )
> hitTB
IntegerList of length 3
[["hit.1"]] 1 2 3 <NA> 4 <NA> <NA> 6
[["hit.2"]] 1 1 1 2 <NA> 3 4 <NA>
[["hit.3"]] 1 2 4 4 5 <NA> 6 7

unique() gives the desired result (if NAs are ok):
> unique(hitTB)
IntegerList of length 3
[["hit.1"]] 1 2 3 <NA> 4 6
[["hit.2"]] 1 2 <NA> 3 4
[["hit.3"]] 1 2 4 5 <NA> 6 7

 

I didn't quite follow your rationale for the NA values. Do you not want them in the final list?
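If the NAs are not wanted, a minimal sketch of dropping them before de-duplicating each element is below. It uses a plain base R list as a stand-in for the IntegerList so that it runs without Bioconductor; `drop_na_unique` is a hypothetical helper name, not an IRanges function. On a real IntegerList, `unique(hitTB[!is.na(hitTB)])` should be equivalent, assuming `is.na()` and list-style logical subsetting behave on IntegerList as they do on other list-like objects.

```r
# Plain-list stand-in for the hitTB IntegerList above (same values).
hits <- list(
  hit.1 = c(1L, 2L, 3L, NA, 4L, NA, NA, 6L),
  hit.2 = c(1L, 1L, 1L, 2L, NA, 3L, 4L, NA),
  hit.3 = c(1L, 2L, 4L, 4L, 5L, NA, 6L, 7L)
)

# Hypothetical helper: within each element, drop NAs, then keep the
# first occurrence of every remaining value.
drop_na_unique <- function(x) {
  lapply(x, function(v) unique(v[!is.na(v)]))
}

res <- drop_na_unique(hits)
res$hit.2
```

Note that `lapply()` returns an ordinary list, so the result would need to be wrapped in `IntegerList()` again if an IntegerList is required downstream.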

Valerie

 

 


My understanding is that he's trying to find the unique elements of an IntegerList, not the unique elements in each integer vector. The simplest approach would be to paste the elements into keys (character vector for each IntegerList). But be careful about order. If it doesn't matter, then sort first.
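The key-pasting idea can be sketched roughly as follows. This is only an illustration: `unique_elements` and its `ignore_order` argument are hypothetical names, and the example runs on a plain base R list. It relies only on `as.list()` and logical subsetting, so it is assumed (not verified here) to work the same way on an IntegerList.

```r
# Hypothetical helper: treat each list element as one unit, build a
# character key per element, and keep the first occurrence of each key.
unique_elements <- function(x, ignore_order = TRUE) {
  keys <- vapply(as.list(x), function(v) {
    # Sort first when element order does not matter; na.last = TRUE
    # keeps NAs in the key instead of silently dropping them.
    if (ignore_order) v <- sort(v, na.last = TRUE)
    paste(v, collapse = ",")
  }, character(1))
  x[!duplicated(keys)]
}

hits <- list(1L, 3L, integer(0), 3L, c(2L, 1L), c(1L, 2L), integer(0))
res <- unique_elements(hits)
length(res)  # 4: the elements 1, 3, integer(0) and c(2, 1) remain
```

With `ignore_order = FALSE`, `c(2L, 1L)` and `c(1L, 2L)` get different keys and are both kept. Empty elements all map to the empty-string key, so only the first one survives.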

I think the real answer is at the workflow level. It doesn't seem natural to be in this nested list space.

Comment written 14 months ago by Michael Lawrence