breakTies for a Hits-object based on value in mcols?
maltethodberg ▴ 140
@maltethodberg-9690
Last seen 5 weeks ago
Denmark

Say I have a Hits object along these lines:

from <- c(5, 2, 3, 3, 3, 2)
to <- c(11, 15, 5, 4, 6, 11)

hits <- Hits(from, to, 7, 15, sort.by.query=TRUE)

For every hit, I can assign some value:

mcols(hits)$val <- c(10, 11, 15, 12, 10, 10)

I then want to break all ties, similar to breakTies:

breakTies(hits, "first")

However, I don't want to resolve ties by index alone, but rather by the val column in mcols, i.e. keep the hit with the maximum value, so the output would look like this:

hits[c(2,3,6)]

Is there a smart way of doing this, without first coercing the Hits object into a data.frame and then back?

@michael-lawrence-3846
Last seen 6 weeks ago
United States

You could aggregate like this:

idx <- which.max(splitAsList(mcols(hits)$val, queryHits(hits)), global=TRUE)
hits[idx]
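
To make the aggregation above concrete, here is a minimal base-R sketch of the same idea on plain vectors. It does not use S4Vectors or IRanges: `query` and `val` are hypothetical stand-ins for `queryHits(hits)` and `mcols(hits)$val` from the question, and `rows_by_query` is an illustrative name only.

```r
# Plain-vector stand-ins for queryHits(hits) and mcols(hits)$val,
# using the query-sorted example from the question (assumed values).
query <- c(2, 2, 3, 3, 3, 5)
val   <- c(10, 11, 15, 12, 10, 10)

# Split the row indices by query, then pick the row with the largest val
# within each group. Returning the row index itself, rather than the
# position inside the group, is what makes the result a "global" index.
rows_by_query <- split(seq_along(val), query)
idx <- vapply(rows_by_query, function(i) i[which.max(val[i])], integer(1))
idx <- unname(idx)

print(idx)
```

For these values the selected rows are 2, 3 and 6, matching the hits[c(2,3,6)] requested in the question. Note that which.max() returns the first maximum, so if two hits for the same query share the top val, the earlier row is kept.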

Maybe we should support something like this?

breakTies(hits, select="last", rank=~val)

It would be easy to support.


It looks like Hits need to stay sorted by query:

hits <- sort(hits, by = ~ queryHits + val)
breakTies(hits, "last")
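
The intent of this sort-then-take-last approach can be sketched on plain vectors in base R. This is only an illustration of the logic under stated assumptions: it does not call breakTies(), and `query` and `val` are hypothetical stand-ins for `queryHits(hits)` and `mcols(hits)$val`.

```r
# Stand-ins for queryHits(hits) and mcols(hits)$val (assumed values
# from the query-sorted example in the question).
query <- c(2, 2, 3, 3, 3, 5)
val   <- c(10, 11, 15, 12, 10, 10)

# Order rows by query, then by val within each query.
ord <- order(query, val)

# After that ordering, the last row of each query group has the largest val.
is_last <- !duplicated(query[ord], fromLast = TRUE)

# Map back to original row numbers and restore ascending row order.
keep <- sort(ord[is_last])

print(keep)
```

Because order() is stable, a tie in val within one query is resolved in favour of the later row here, which is the opposite of the which.max() behaviour.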

I get the following error when running this code:

> sort(hits, by = ~val, decreasing=TRUE)
Error in as.vector(x) : no method for coercing this S4 class to a vector

In any case, doesn't breakTies always choose based on the index value, rather than row order?

hits2 <- as(hits, "Hits")
breakTies(hits2[sample(1:6)]) # Always gives same output

I edited my answer; hopefully it is improved.


Thanks! So I guess that means there is no solution using breakTies, although as you write in the edited post, that seems like an obvious functionality for that function to have (the current documentation for breakTies is also a bit unclear).

I was unaware of the global=TRUE argument to which.max/which.min; that's a neat little trick!


Well, there is now a solution in devel, as of S4Vectors 0.17.13.

I also wanted to mention that the sort() failed, I think, because IRanges wasn't loaded. Some stuff still needs to be moved over.

Jeff's comment on needing to include the queryHits is no longer true. Calling sort() on a "SortedByQueryHits" will coerce to an ordinary "Hits", unless the sort is actually by the query hits, in which case it does nothing.