breakTies for a Hits-object based on value in mcols?
maltethodberg ▴ 180
@maltethodberg-9690
Denmark

Say I have a Hits object along these lines:

library(S4Vectors)

from <- c(5, 2, 3, 3, 3, 2)
to <- c(11, 15, 5, 4, 6, 11)

hits <- Hits(from, to, 7, 15, sort.by.query=TRUE)

For every hit, I can assign some value:

mcols(hits)$val <- c(10, 11, 15, 12, 10, 10)

I then want to break all ties, similar to breakTies:

breakTies(hits, "first")

However, I don't want to resolve ties by index, but rather by the val column in mcols: for each query, keep the hit with the maximum value. The output would then look like this:

hits[c(2,3,6)]

Is there a smart way of doing this without first coercing the Hits object into a data.frame and then back?

s4vectors hits • 1.7k views
@michael-lawrence-3846
United States

You could aggregate like this:

idx <- which.max(splitAsList(mcols(hits)$val, queryHits(hits)), global=TRUE)
hits[idx]

Maybe we should support something like this?

breakTies(hits, select="last", rank=~val)

It would be easy to support.
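
To make the aggregation above concrete, here is a minimal base R sketch of the same per-query arg-max. It assumes plain vectors `query` and `val` standing in for queryHits(hits) and mcols(hits)$val of the example, already sorted by query; it is an illustration of the idea, not what which.max(..., global=TRUE) does internally.

```r
# Hypothetical stand-ins for queryHits(hits) and mcols(hits)$val
# of the example, after the hits have been sorted by query.
query <- c(2, 2, 3, 3, 3, 5)
val   <- c(10, 11, 15, 12, 10, 10)

# Split the global positions by query, then pick, within each group,
# the position whose val is largest. which.max() returns the first
# maximum, so exact ties in val go to the earliest hit.
pos <- seq_along(val)
idx <- unname(vapply(split(pos, query),
                     function(i) i[which.max(val[i])],
                     integer(1)))
idx
```

The resulting positions can be used to subset the hits, in the same way as hits[idx] above.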


It looks like Hits need to stay sorted by query:

hits <- sort(hits, by = ~ queryHits + val)
breakTies(hits, "last")
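
The sort-then-take-last idea can be sketched in base R as well. This assumes plain vectors `query` and `val` standing in for queryHits(hits) and mcols(hits)$val; it only illustrates the logic and makes no claim about how breakTies itself behaves.

```r
# Hypothetical stand-ins for queryHits(hits) and mcols(hits)$val.
query <- c(2, 2, 3, 3, 3, 5)
val   <- c(10, 11, 15, 12, 10, 10)

# Order by query first, then by val (ascending) within each query.
o <- order(query, val)

# Keep the last entry of each query in that ordering,
# i.e. the hit with the largest val for that query.
keep <- o[!duplicated(query[o], fromLast = TRUE)]
keep
```

Here `keep` holds positions in the original (unsorted) vectors, so it can index the original object directly.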

I get the following error when running this code:

> sort(hits, by = ~val, decreasing=TRUE)
Error in as.vector(x) : no method for coercing this S4 class to a vector 

In any case, doesn't breakTies always choose based on the index value, rather than row order?

hits2 <- as(hits, "Hits")
breakTies(hits2[sample(1:6)]) # Always gives same output

I edited my answer; hopefully it is improved.


Thanks! So I guess that means there is no solution using breakTies, although, as you write in the edited post, that seems like obvious functionality for that function to have (the current documentation for breakTies is also a bit unclear).

I was unaware of the global=TRUE argument to which.max/which.min. That's a neat little trick!


Well, there is now a solution using breakTies in devel (S4Vectors 0.17.13).

I also wanted to mention that the sort() failed, I think, because IRanges wasn't loaded. Some stuff still needs to be moved over.

Jeff's comment on needing to include the queryHits is no longer true. Calling sort() on a "SortedByQueryHits" will coerce to an ordinary "Hits", unless the sort is actually by the query hits, in which case it does nothing.
