**70**wrote:

Hi,

I did parallel searching overlapped regions from two `GRanges `

objects, and I did `Vectorization `

over them, so I got `CompressedIntegerList`

objects (a.k.a, non-overlapped regions indicates `integer(0)`

in the list).

Now, I need to sum up two vectors (a.k.a, sum of vector indicates overall overlapped regions number for each). However, I did keep only one (the one with `lowest pvalue`

if multiple intersection found) and I have new `CompressedIntegerList`

objects, but it has `NAs `

index now ( `NAs `

values indicates non-overlapped regions).

I did sum over two vector, but R consider length of regions with `NAs `

index as `1`

, while R also consider length of region with `interger(0)`

as `0`

. So, I got wrong results if I used regions with `NAs `

index for vector sum(though I kept only one regiosn with `lowest pvalue`

, but it has `NAs`

index, this gives wrong results for vector sum) .

I tried every possible solution from statckoverflow and I could not find solution yet. How can I fix this issue? Thank you

For the sake of clarity of problem, I have this reproducible example:

# set up

a <- GRanges( seqnames=Rle(c("chr1", "chr2", "chr3", "chr4"), c(3, 2, 1, 2)), ranges=IRanges(seq(1, by=9, len=8), seq(7, by=9, len=8)), rangeName=letters[seq(1:8)], score=sample(1:20, 8, replace = FALSE)) b <- GRanges( seqnames=Rle(c("chr1", "chr2", "chr3","chr4"), c(4, 1, 2, 1)), ranges=IRanges(seq(2, by=6, len=8), seq(4, by=6, len=8)), rangeName=letters[seq(1:8)], score=sample(1:20, 8, replace = FALSE)) c <- GRanges( seqnames=Rle(c("chr1", "chr2", "chr3","chr4"), c(2, 3, 1,2)), ranges=IRanges(seq(6, by=7, len=8), seq(8, by=7, len=8)), rangeName=letters[seq(1:8)], score=sample(1:20, 8, replace = FALSE))

# pre-step : pvalue conversion for a, b,c

a$pvalue <- 10^(score(a)/-1) a$score <- NULL b$pvalue <- 10^(score(b)/-1) b$score <- NULL c$pvalue <- 10^(score(c)/-1) c$score <- NULL

#find overlap

ov_ab <- as(findOverlaps(a, b), "List") ov_ac <- as(findOverlaps(a, c), "List")

#keep only one regions with lowest pvalue from multiple intersection

b_idx <- as(which.min(extractList(b$pvalue, ov_ab)), "List") c_idx <- as(which.min(extractList(c$pvalue, ov_ac)), "List")

# count total overlapped number : (all NA must be replaced with integer(0) to proceed vector sum)

b_idx <- c(1,2,1,1,NA, NA, NA, NA) c_idx <- c(1,1, NA, 2, NA, NA, 1, NA)

# my desired output (I did manually, because I tried to replace NA with integer(0) but I always get error) :

a.len <- as(c(1,1,1,1,1,1,1,1), "List") b_one.len <- as(c(1,1,1,1, 0, 0, 0, 0), "List") c_one.len <- as(c(1,1,0, 1, 0, 0,1, 0), "List") totNumOvlap <- a.len + b_one.len + c_one.len

# this is my desire output (vector sum is very crucial for accuracy of overall result) :

totNumOvlp <- c(3,3,2,3,1,1,2,1)

**Objective**: I must replace `NAs `

values into `integer(0)`

in compressedIntegerList objects, then I can able to do vector sum over them.

How can I address this problem ? Is there any alternative way to solve the issue ? Thanks a lot

Best regards:

Jurat

Please correct your code (lengths() rather than lenghts() and format code-blocks with 'Code' rather than 'Inline Code' (continue to use 'Inline Code' for things like

`GRanges`

in your first paragraph, where the code is part of the paragraph and not stand-alone.Please provide the output that you are expecting -- a vector of length 4, I guess, but with what numbers? Maybe

`(lengths(a.S_ab.hitlist) >= 1) + (lengths(a.S_ac.hitlist) >= 1)`

?23kDear Martin Morgan:

Thanks you for reminding me that format issue of my posting, I have updated my original post.

However, I have followed your strategy from our discussion, I think your suggestion is very efficient to solve my problem. In our algorithm , parallel searching overlapped regions for every regions from

`a`

over`b ,c`

, and count overall number of overlapped regions including regions from sample`a,`

then compare with threshold value and make a decision whether all keep it or discard it. So, counting total overlapped regions for each regions from sample`a`

in parallel way will effect overall accuracy of my results. Thanks a lotBest regards:

Jurat

70I took the

`lengths(a)`

and added the`lengths(ov_ab) >= 1`

and the`lengths(ov_ac) >= 1`

23kFrom your solution, I am wondering even I got this result that match with my desired output , but How do I know which regions from

`b,c`

were kept (the one with lowest pvalue) ? (because in`ov_ab, ov_ac`

hitlist, we got multiple intersection, so we need to only keep one and I must know which one it was).My strategy is firstly I have to keep only one regions out of multiple intersection from

`ov_ab, ov_ac`

, then I need to expand`b_one, c_one`

respectively, finally I will do vector sum by using`lenghts`

method from S4Vector packages.How do I expand

`b_one , c_one`

that only kept one regions out of multiple intersection before doing vector sum? Because this part is one of the most important features of our algorithm. Thank you for your effort.70Hi Jurat -- I can't really spend more time helping here, so this will be my last reply; maybe others will help further. You know the answer to your current question from

`lengths(a) + (lengths(ov_ab) >= 1) + (lengths(ov_ac) >= 1)`

. You know the regions that overlap from your earlier work, which you have edited out. Best of luck. Martin23kI am sorry, I think you did your best to solve my problem, I am grateful for your help. I can figure that out.

70