Hi,
I added findMatches() and countMatches() to the latest IRanges /
GenomicRanges packages (in BioC devel only).
findMatches(x, table): An enhanced version of 'match' that
returns all the matches in a Hits object.
countMatches(x, table): Returns an integer vector of the length
of 'x', containing the number of matches in 'table' for
each element in 'x'.
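To make the two definitions concrete, here is a minimal base-R sketch of the same ideas on an ordinary character vector. The helper names find_matches_sketch() and count_matches_sketch() are hypothetical and only emulate the behavior described above; they are not the package implementations.

```r
# Base-R emulation on an atomic vector; helper names are hypothetical.
x <- c("a", "b", "c")
tbl <- c("b", "a", "b", "b")

# match() keeps only the first match for each element of x.
first_match <- match(x, tbl)

# All (x index, tbl index) pairs whose elements are equal.
find_matches_sketch <- function(x, tbl) {
  hits <- which(outer(x, tbl, "=="), arr.ind = TRUE)
  hits <- hits[order(hits[, 1], hits[, 2]), , drop = FALSE]
  data.frame(x_index = unname(hits[, 1]), table_index = unname(hits[, 2]))
}

# Number of matches in tbl for each element of x.
count_matches_sketch <- function(x, tbl) {
  vapply(x, function(e) sum(tbl == e), integer(1), USE.NAMES = FALSE)
}

all_hits <- find_matches_sketch(x, tbl)
counts <- count_matches_sketch(x, tbl)
print(first_match)
print(all_hits)
print(counts)
```

The point of the contrast is that match() stops at the first hit, while the "find" variant reports every pair and the "count" variant summarizes them per element of x.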
countMatches() is what you can use to tally/count/tabulate (choose your
preferred term) the unique elements in a GRanges object:
library(GenomicRanges)
set.seed(33)
gr <- GRanges("chr1", IRanges(sample(15,20,replace=TRUE), width=5))
Then:
> gr_levels <- sort(unique(gr))
> countMatches(gr_levels, gr)
[1] 1 1 1 2 4 2 2 1 2 2 2
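As a small follow-up sketch, the counts can be stored next to the unique ranges they describe. This assumes a recent GenomicRanges installation and uses a hand-picked (not random) set of ranges so the tallies are easy to check by eye; the metadata column name "count" is an arbitrary choice.

```r
library(GenomicRanges)

# Six ranges on chr1, all of width 5, with repeated start positions.
gr <- GRanges("chr1", IRanges(c(1, 5, 1, 9, 5, 1), width = 5))

# Unique ranges in sorted order, then the tally of each one in gr.
gr_levels <- sort(unique(gr))
counts <- countMatches(gr_levels, gr)

# Attach the tally as a metadata column (column name is arbitrary).
mcols(gr_levels)$count <- counts
print(gr_levels)
```

Because every element of gr matches exactly one unique range, the counts always add up to length(gr).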
Note that findMatches() and countMatches() also work on IRanges and
DNAStringSet objects, as well as on ordinary atomic vectors:
library(hgu95av2probe)
library(Biostrings)
probes <- DNAStringSet(hgu95av2probe)
unique_probes <- unique(probes)
count <- countMatches(unique_probes, probes)
max(count) # 7
I made other changes in IRanges/GenomicRanges so that the notion
of "match" between elements of a vector-like object now consistently
means "equality" instead of "overlap", even for range-based objects
like IRanges or GRanges objects. This notion of "equality" is the
same that is used by ==. The most visible consequence of those
changes is that using %in% between 2 IRanges or GRanges objects
'query' and 'subject' in order to do overlaps was replaced by
overlapsAny(query, subject).
overlapsAny(query, subject): Finds the ranges in 'query' that
overlap any of the ranges in 'subject'.
There are warnings and deprecation messages in place to help smooth
the transition.
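The difference between the two notions can be seen on a toy example. This is only a sketch, assuming a GenomicRanges version in which match() on GRanges objects uses the equality semantics described above; the object names are made up for illustration.

```r
library(GenomicRanges)

# query:   1-5, 10-14, 50-54     subject: 3-7, 10-14
query <- GRanges("chr1", IRanges(c(1, 10, 50), width = 5))
subject <- GRanges("chr1", IRanges(c(3, 10), width = 5))

# Overlap semantics: does each query range touch any subject range?
by_overlap <- overlapsAny(query, subject)

# Equality semantics: is each query range identical to some subject range?
by_equality <- !is.na(match(query, subject))

print(by_overlap)
print(by_equality)
```

The first query range overlaps 3-7 without being equal to it, so it is the one element on which the two answers disagree.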
Cheers,
H.
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
The change to the behavior of %in% is a pretty big one. Are you thinking
that all set-based operations should behave this way? For example, setdiff
and intersect? I really liked the syntax of "peaks %in% genes". In my
experience, it's way more common to ask questions about overlap than about
equality, so I'd rather optimize the API for that use case. But again,
that's just my personal bias.
Michael
On Fri, Jan 4, 2013 at 4:32 PM, Michael Lawrence
<lawrence.michael at gene.com> wrote:
> The change to the behavior of %in% is a pretty big one. Are you thinking
> that all set-based operations should behave this way? For example, setdiff
> and intersect? I really liked the syntax of "peaks %in% genes". In my
> experience, it's way more common to ask questions about overlap than about
> equality, so I'd rather optimize the API for that use case. But again,
> that's just my personal bias.
For what it is worth, I share Michael's personal bias here.
Sean
To address Sean and Michael's points, I wonder if
queryGR %in% subjectGR
could just mean, quite literally, the comparison
findOverlaps(queryGR, subjectGR, type='within')
and then to make things explicit, perhaps the operators
queryGR %within% subjectGR
queryGR %overlaps% subjectGR
queryGR %equals% subjectGR
could be introduced for readability? This would be good programming
hygiene anyways, as it removes some ambiguity for new users.
I routinely use %d%, %i%, %u% as shorthand for the binary operations
setdiff(x, y), intersect(x, y), and union(x, y), at least when doing such
operations in base R. Wouldn't break my heart to add operators for
explicitly doing comparisons of GRs either.
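For readers who have not seen such shorthands, they could be defined along these lines in base R. This is a guess at what the definitions look like; the thread does not show them.

```r
# User-defined infix shorthands for the base-R set operations.
# These definitions are an assumption; the thread only names the operators.
`%d%` <- function(x, y) setdiff(x, y)
`%i%` <- function(x, y) intersect(x, y)
`%u%` <- function(x, y) union(x, y)

a <- c(1, 2, 3)
b <- c(2, 4)

print(a %d% b)  # elements of a that are not in b
print(a %i% b)  # elements common to a and b
print(a %u% b)  # elements in either a or b
```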
--
*A model is a lie that helps you see the truth.*
Howard Skipper <http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf>
More explicitly, I note that:
R> selectMethod('%in%', c('GenomicRanges','GenomicRanges'))
Method Definition:
function (x, table)
{
warning(IRanges:::`%in%.warning.msg`("GenomicRanges"))
!is.na(match(x, table, match.if.overlap = FALSE))
}
<environment: namespace:GenomicRanges>
is certainly explicit... that said, what I am thinking of, in MM parlance, is
identical( x %within% table, countOverlaps(x, table, type='within') > 0 ) == TRUE
identical( x %overlaps% table, countOverlaps(x, table, type='any') > 0 ) == TRUE
identical( x %equals% table, countOverlaps(x, table, type='equal') > 0 ) == TRUE
Perhaps the latter would be better written as x %identical% table or x
%isElementOf% table or some such?
Anyways. Just some thoughts. It can be a bit nebulous what, precisely, is
being tabulated when one first starts using Ranges for comparisons IMO
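A minimal sketch of the three proposed operators, written directly from the identities above. These are local definitions made up for illustration, not functions that the packages in this thread are known to export under these semantics; defining them in the global environment masks any same-named operator from an attached package.

```r
library(GenomicRanges)

# Local, illustrative definitions built on countOverlaps().
`%within%`   <- function(x, table) countOverlaps(x, table, type = "within") > 0
`%overlaps%` <- function(x, table) countOverlaps(x, table, type = "any") > 0
`%equals%`   <- function(x, table) countOverlaps(x, table, type = "equal") > 0

# x:     2-4, 10-14, 50-54     tbl: 1-10, 10-14
x <- GRanges("chr1", IRanges(c(2, 10, 50), width = c(3, 5, 5)))
tbl <- GRanges("chr1", IRanges(c(1, 10), width = c(10, 5)))

is_within   <- x %within% tbl
is_overlap  <- x %overlaps% tbl
is_equal    <- x %equals% tbl

print(data.frame(is_within, is_overlap, is_equal))
```

Spelling the comparison type in the operator name is what removes the ambiguity: each column of the printed table answers a different question about the same pair of objects.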
Hiya,
For what it is worth...
I think the change to %in% is warranted.
If I understand correctly, this change restores the relationship
between the semantics of `%in%` and the semantics of `match`.
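For ordinary vectors the relationship is easy to check: base R documents %in% as a thin wrapper around match(). A short sketch with made-up values:

```r
x <- c(3, 7, 11)
tbl <- c(7, 3, 3, 20)

# %in% as shipped in base R ...
via_in <- x %in% tbl

# ... and the match()-based definition given in the base-R documentation.
via_match <- match(x, tbl, nomatch = 0) > 0

print(via_in)
print(identical(via_in, via_match))
```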
On Fri, Jan 4, 2013 at 1:56 PM, Cook, Malcolm <mec@stowers.org> wrote:
> Hiya,
>
> For what it is worth...
>
> I think the change to %in% is warranted.
>
> If I understand correctly, this change restores the relationship between
> the semantics of `%in%` and the semantics of `match`.
>
> From the docs:
>
> '"%in%" <- function(x, table) match(x, table, nomatch = 0) > 0'
>
> Herve's change restores this relationship.
>
match and %in% were initially consistent (both considering any overlap);
Herve has changed both of them together. The whole idea behind IRanges is
that ranges are special data types with special semantics. We have
reimplemented much of the existing R vector API using those semantics; this
extends beyond match/%in%. I am hesitant about making such sweeping changes
to the API so late in the life-cycle of the package. There was a feature
request for a way to count identical ranges in a set of ranges. Let's
please not get carried away and start redesigning the API for this one,
albeit useful, request. There are all sorts of inconsistencies in the API,
and many of them were conscious decisions that considered practical use
cases.
Michael
> Herve, I suspect you were as a result able to completely drop all the
> `%in%,BiocClass1,BiocClass2` definitions and depend upon base::%in%
>
> Am I right?
>
> If so, may I suggest that Herve stay the course, with the addition of
> '"%ol%" <- function(a, b) findOverlaps(a, b, maxgap=0L, minoverlap=1L,
> type='any', select='all') > 0'
>
> This would provide a perspicacious idiom, thereby optimizing the API for
> Michael's observed common use case.
>
> Just sayin'
>
> ~Malcolm
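One practical note on the quoted one-liner: findOverlaps() with select='all' returns a Hits object rather than one value per element of a, so an extra step is needed to get a logical vector. The sketch below is one possible variant, not the definition proposed in the thread; it relies on the default gap and overlap settings and reuses the %ol% name only for illustration.

```r
library(GenomicRanges)

# One logical per element of a: TRUE if that element appears as a query hit.
`%ol%` <- function(a, b) {
  hits <- findOverlaps(a, b, type = "any", select = "all")
  seq_along(a) %in% queryHits(hits)
}

# a: 1-5, 20-24, 40-44     b: 4-8, 41-45
a <- GRanges("chr1", IRanges(c(1, 20, 40), width = 5))
b <- GRanges("chr1", IRanges(c(4, 41), width = 5))

ol <- a %ol% b
print(ol)

# overlapsAny() should give the same answer directly.
print(identical(ol, overlapsAny(a, b)))
```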
So why not leave %in% the same as it was, revert to
setMethod('%in%', c('GenomicRanges','GenomicRanges'), function (x, table)
{
    warning(IRanges:::`%in%.warning.msg`("GenomicRanges"))
    !is.na(match(x, table, match.if.overlap = TRUE))
})
and introduce the explicit %within%, %overlaps%, %equals% generic operators
for clarity?
Should avoid the massive churn to the API while still allowing people to
tabulate things cleanly, no?
Hiya again,
I am definitely a latecomer to BioC, so I readily defer to
the tide of history.
But I do think you miss my point, Michael, about the proposed change
making the relationship between %in% and match for {G,I}Ranges{List}
mimic that between other vectors, and I do think that changing the API
would help other latecomers take to BioC more easily and quickly.
That said, I NEVER use %in% so I really have no stake in the matter,
and I DEFINITELY appreciate the argument against changing the API just
for semantic sweetness.
That that said, Herve is _so good_ about deprecations and warnings
that such changes are fairly easily digestible.
That that that.... enough.... I bow out of this one....!!!!
Always learning and Happy New Year to all lurkers,
~Malcolm
From: Michael Lawrence [mailto:lawrence.michael@gene.com]
Sent: Friday, January 04, 2013 5:11 PM
To: Cook, Malcolm
Cc: Sean Davis; Michael Lawrence; Hervé Pagès (hpages@fhcrc.org); Tim
Triche, Jr.; Vedran Franke; bioconductor@r-project.org
Subject: Re: [BioC] countMatches() (was: table for GenomicRanges)
On Fri, Jan 4, 2013 at 1:56 PM, Cook, Malcolm
<mec@stowers.org<mailto:mec@stowers.org>> wrote:
Hiya,
For what it is worth...
I think the change to %in% is warranted.
If I understand correctly, this change restores the relationship
between the semantics of `%in` and the semantics of `match`.
>From the docs:
'"%in%" <- function(x, table) match(x, table, nomatch = 0) > 0'
Herve's change restores this relationship.
match and %in% were initially consistent (both considering any
overlap); Herve has changed both of them together. The whole idea
behind IRanges is that ranges are special data types with special
semantics. We have reimplemented much of the existing R vector API
using those semantics; this extends beyond match/%in%. I am hesitant
about making such sweeping changes to the API so late in the life-
cycle of the package. There was a feature request for a way to count
identical ranges in a set of ranges. Let's please not get carried away
and start redesigning the API for this one, albeit useful, request.
There are all sorts of inconsistencies in the API, and many of them
were conscious decisions that considered practical use cases.
Michael
Herve, I suspect you were you as a result able to completely drop all
the `%in%,BiocClass1,BiocClass2` definitions and depend upon
base::%in%
Am I right?
If so, may I suggest that Herve stay the course, with the addition of
'"%ol%" <- function(a, b) findOverlaps(a, b, maxgap=0L,
minoverlap=1L, type='any', select='all') > 0'
This would provide a perspicuous idiom, thereby optimizing the API
for Michael's observed common use case.
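For illustration only, one way such an operator could be written is sketched below. It assumes the IRanges package is installed and delegates to overlapsAny() with its default arguments instead of comparing the result of findOverlaps() with 0. The name %ol% is the one suggested above; the toy ranges 'peaks' and 'genes' are made up.

```r
library(IRanges)

# Hypothetical operator (the name %ol% comes from the suggestion above):
# TRUE for each range in 'a' that overlaps at least one range in 'b'.
`%ol%` <- function(a, b) overlapsAny(a, b)

# Made-up ranges, for illustration only.
peaks <- IRanges(start = c(4, 8, 20), end = c(6, 12, 25))
genes <- IRanges(start = c(3, 30), end = c(10, 40))

hits <- peaks %ol% genes
print(hits)
```

Keeping the operator a thin wrapper means any change in overlap semantics stays in one documented function.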
Just sayin'
~Malcolm
.-----Original Message-----
.From: bioconductor-bounces@r-project.org [mailto:bioconductor-bounces@r-project.org] On Behalf Of Sean Davis
.Sent: Friday, January 04, 2013 3:37 PM
.To: Michael Lawrence
.Cc: Tim Triche, Jr.; Vedran Franke; bioconductor@r-project.org
.Subject: Re: [BioC] countMatches() (was: table for GenomicRanges)
.
.On Fri, Jan 4, 2013 at 4:32 PM, Michael Lawrence
.<lawrence.michael@gene.com> wrote:
.> The change to the behavior of %in% is a pretty big one. Are you thinking
.> that all set-based operations should behave this way? For example, setdiff
.> and intersect? I really liked the syntax of "peaks %in% genes". In my
.> experience, it's way more common to ask questions about overlap than about
.> equality, so I'd rather optimize the API for that use case. But again,
.> that's just my personal bias.
.
.For what it is worth, I share Michael's personal bias here.
.
.Sean
.
.
.> Michael
Yes 'peaks %in% genes' is cute and was probably doing the right thing
for most users (although not all). But 'exons %in% genes' is cute too
and was probably doing the wrong thing for all users. Advanced users
like you guys would have no problem switching to
!is.na(findOverlaps(peaks, genes, type="within", select="any"))
or
!is.na(findOverlaps(peaks, genes, type="equal", select="any"))
in case 'peaks %in% genes' was not doing exactly what you wanted,
but most users would not find this particularly friendly. Even
worse, some users probably didn't realize that 'peaks %in% genes'
was not doing exactly what they thought it did because "peaks in
genes" in English suggests that the peaks are within the genes,
but it's not what 'peaks %in% genes' does.
Having overlapsAny(), with exactly the same extra arguments as
countOverlaps() and subsetByOverlaps() (i.e. 'maxgap', 'minoverlap',
'type', 'ignore.strand'), all of them documented (and with most
users more or less familiar with them already) has the virtue of
exposing the user to all the options from the very start, and of
helping him/her make the right choice. Of course there will be users
who don't want or don't have the time to read/think about all the
options. Not a big deal: they'll just do 'overlapsAny(query, subject)',
which is not a lot more typing than 'query %in% subject', especially
if they use tab completion.
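A minimal sketch of the difference the 'type' argument makes, assuming a recent IRanges is installed. The ranges and object names are made up for illustration and do not come from the thread.

```r
library(IRanges)

# Made-up ranges, for illustration only.
query   <- IRanges(start = c(4, 8, 20), end = c(6, 12, 25))
subject <- IRanges(start = c(3, 30), end = c(10, 40))

# Default behaviour: any overlap with any subject range counts.
any_ov <- overlapsAny(query, subject)

# Stricter: the query range must lie entirely within a subject range.
within_ov <- overlapsAny(query, subject, type = "within")

print(any_ov)
print(within_ov)
```

The second query range, 8-12, overlaps 3-10 but is not contained in it, so the two calls disagree on that element.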
It's true that it's more common to ask questions about overlap than
about equality but there are some use cases for the latter (as the
original thread shows). Until now, when you had such a use case, you
could not use match() or %in%, which would have been the natural things
to use, because they got hijacked to do something else, and you were
left with nothing. Not a satisfying situation. So at a minimum, we
needed to restore the true/real/original semantics of match() to do
"equality" instead of "overlap". But it's hard to do this for match()
and not do it for %in% too. For more than 99% of R users, %in% is
just a simple wrapper for 'match(x, table, nomatch = 0) > 0' (this
is how it has been documented and implemented in base R for many
years). Not maintaining this relationship between %in% and match()
would only cause grief and frustration to newcomers to Bioconductor.
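The relationship between %in% and match() is easy to check on an ordinary atomic vector in base R. This is a small sketch with made-up values, and the variable names are placeholders.

```r
# Made-up character vectors, for illustration only.
x   <- c("a", "b", "z")
tbl <- c("b", "a", "c", "a")

# match() returns the position of the first exact match, or NA.
pos <- match(x, tbl)

# %in% agrees with the documented definition in terms of match().
via_in    <- x %in% tbl
via_match <- match(x, tbl, nomatch = 0) > 0

print(pos)
print(identical(via_in, via_match))
```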
H.
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
I think having overlapsAny is a nice addition and helps make the API more
complete and explicit. Are you sure we need to change the behavior of the
match method for this relatively uncommon use case? I don't think "match"
always has to mean "equality". It is a more general concept in my mind. The
most common use case for matching ranges is overlap.
Michael
Hi Michael,
I don't think "match" (the word) always has to mean "equality" either.
However having match() (the function) do "whole exact matching" (aka
"equality") for any kind of vector-like object has the advantage of:
(a) making it consistent with base::match() (?base::match is pretty
    explicit about what the contract of match() is)
(b) preserving its relationship with ==, duplicated(), unique(), etc...
(c) not frustrating the user who needs something to do exact
    matching on ranges (as I mentioned previously, if you take
    match() away from him/her, s/he'll be left with nothing).
IMO those advantages counterbalance *by far* the very little
convenience you get from having 'match(query, subject)' do
'findOverlaps(query, subject, select="first")' on
IRanges/GRanges objects. If you need to do that, just use the
latter, or, if you think that's still too much typing, define
a wrapper e.g. 'ovmatch(query, subject)'.
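Such a wrapper could look like the sketch below. It assumes IRanges is installed, in a version where match() on ranges means equality. 'ovmatch' is the hypothetical name used above, and the ranges are made up.

```r
library(IRanges)

# Hypothetical wrapper: index of the first overlapping subject range, or NA.
ovmatch <- function(query, subject) {
  findOverlaps(query, subject, select = "first")
}

# Made-up ranges, for illustration only.
query   <- IRanges(start = c(3, 8, 20), end = c(10, 12, 25))
subject <- IRanges(start = c(1, 3), end = c(9, 10))

ov <- ovmatch(query, subject)   # overlap-based lookup
eq <- match(query, subject)     # exact (same start and width) lookup

print(ov)
print(eq)
```

The first query range overlaps both subject ranges but is equal only to the second, so the two lookups return different indices for it.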
There are plenty of specialized tools around for doing
inexact/fuzzy/partial/overlap matching for many particular types
of vector-like objects: grep() and family, pmatch(), charmatch(),
agrep(), grepRaw(), matchPattern() and family, findOverlaps() and
family, findInterval(), etc... For the reasons I mentioned
above, none of them should hijack match() to make it do some
particular type of inexact matching on some particular type of
objects. Even if, for that particular type of objects, doing that
particular type of inexact matching is more common than doing
exact matching.
H.
On 01/06/2013 05:39 PM, Michael Lawrence wrote:
> I think having overlapsAny is a nice addition and helps make the API
> more complete and explicit. Are you sure we need to change the behavior
> of the match method for this relatively uncommon use case?
Yes because otherwise users with a use case of doing match()
even if it's uncommon,
> I don't think
> "match" always has to mean "equality". It is a more general concept in
> my mind. The most common use case for matching ranges is overlap.
Of course "match" doesn't always have to mean equality. But of base
>
> Michael
>
>
> On Fri, Jan 4, 2013 at 8:34 PM, Hervé Pagès <hpages at="" fhcrc.org=""> <mailto:hpages at="" fhcrc.org="">> wrote:
>
> Yes 'peaks %in% genes' is cute and was probably doing the right
thing
> for most users (although not all). But 'exons %in% genes' is
cute too
> and was probably doing the wrong thing for all users. Advanced
users
> like you guys would have no problem switching to
>
> !is.na <http: is.na="">(findOverlaps(peaks, genes,
type="within",
> select="any"))
>
> or
>
> !is.na <http: is.na="">(findOverlaps(peaks, genes,
type="equal",
> select="any"))
>
> in case 'peaks %in% genes' was not doing exactly what you
wanted,
> but most users would not find this particularly friendly. Even
> worse, some users probably didn't realize that 'peaks %in%
genes'
> was not doing exactly what they thought it did because "peaks in
> genes" in English suggests that the peaks are within the genes,
> but it's not what 'peaks %in% genes' does.
>
> Having overlapsAny(), with exactly the same extra arguments as
> countOverlaps() and subsetByOverlaps() (i.e. 'maxgap',
'minoverlap',
> 'type', 'ignore.strand'), all of them documented (and with most
> users more or less familiar with them already) has the virtue to
> expose the user to all the options from the very start, and to
> help him/her make the right choice. Of course there will be
users
> that don't want or don't have the time to read/think about all
the
> options. Not a big deal: they'll just do 'overlapsAny(query,
subject)',
> which is not a lot more typing than 'query %in% subject',
especially
> if they use tab completion.
>
> It's true that it's more common to ask questions about overlap
than
> about equality but there are some use cases for the latter (as
the
> original thread shows). Until now, when you had such a use case,
you
> could not use match() or %in%, which would have been the natural
things
> to use, because they got hijacked to do something else, and you
were
> left with nothing. Not a satisfying situation. So at a minimum,
we
> needed to restore the true/real/original semantic of match() to
do
> "equality" instead of "overlap". But it's hard to do this for
match()
> and not do it for %in% too. For more than 99% of R users, %in%
is
> just a simple wrapper for 'match(x, table, nomatch = 0) > 0'
(this
> is how it has been documented and implemented in base R for many
> years). Not maintaining this relationship between %in% and
match()
> would only cause grief and frustration to newcomers to
Bioconductor.
>
> H.
>
>
>
> On 01/04/2013 03:32 PM, Cook, Malcolm wrote:
>
> Hiya again,
>
> I am definitely a late comer to BioC, so I definitely easily
> defer to
> the tide of history.
>
> But I do think you miss my point Michael about the proposed
change
> making the relationship between %in% and match for
{G,I}Ranges{List}
> mimic that between other vectors, and I do think that
changing
> the API
> would make other late-comers take to BioC easier/faster.
>
> That said, I NEVER use %in% so I really have no stake in the
> matter, and
> I DEFINITELY appreciate the argument to not changing the API
> just for
> sematic sweetness.
>
> That that said, Herve is _/so good/_ about deprecations and
warnings
>
> that make such changes fairly easily digestible.
>
> That that that.... enough.... I bow out of this one....!!!!
>
> Always learning and Happy New Year to all lurkers,
>
> ~Malcolm
>
> *From:*Michael Lawrence [mailto:lawrence.michael at
gene.__com
> <mailto:lawrence.michael at="" gene.com="">]
> *Sent:* Friday, January 04, 2013 5:11 PM
> *To:* Cook, Malcolm
> *Cc:* Sean Davis; Michael Lawrence; Hervé Pagès
> (hpages at fhcrc.org <mailto:hpages at="" fhcrc.org="">); Tim
>
> Triche, Jr.; Vedran Franke; bioconductor at r-project.org
> <mailto:bioconductor at="" r-project.org="">
> *Subject:* Re: [BioC] countMatches() (was: table for
GenomicRanges)
>
>
> On Fri, Jan 4, 2013 at 1:56 PM, Cook, Malcolm <mec at="" stowers.org=""> <mailto:mec at="" stowers.org="">
> <mailto:mec at="" stowers.org="" <mailto:mec="" at="" stowers.org="">>>
wrote:
>
> Hiya,
>
> For what it is worth...
>
> I think the change to %in% is warranted.
>
> If I understand correctly, this change restores the relationship between
> the semantics of `%in%` and the semantics of `match`.
>
> From the docs:
>
> '"%in%" <- function(x, table) match(x, table, nomatch = 0) > 0'
>
> Herve's change restores this relationship.
>
>
> match and %in% were initially consistent (both considering any overlap);
> Herve has changed both of them together. The whole idea behind IRanges
> is that ranges are special data types with special semantics. We have
> reimplemented much of the existing R vector API using those semantics;
> this extends beyond match/%in%. I am hesitant about making such sweeping
> changes to the API so late in the life-cycle of the package. There was a
> feature request for a way to count identical ranges in a set of ranges.
> Let's please not get carried away and start redesigning the API for this
> one, albeit useful, request. There are all sorts of inconsistencies in
> the API, and many of them were conscious decisions that considered
> practical use cases.
>
> Michael
>
>
> Herve, I suspect you were, as a result, able to completely drop
> all the `%in%,BiocClass1,BiocClass2` definitions and depend upon
> base::%in%
>
> Am I right?
>
> If so, may I suggest that Herve stay the course, with the addition of
> '"%ol%" <- function(a, b) findOverlaps(a, b, maxgap=0L,
> minoverlap=1L, type='any', select='all') > 0'
>
> This would provide a perspicacious idiom, thereby optimizing the API
> for Michael's observed common use case.
>
> Just sayin'
>
> ~Malcolm
>
>
On Mon, Jan 7, 2013 at 11:00 AM, Hervé Pagès <hpages@fhcrc.org> wrote:
> Hi Michael,
>
> I don't think "match" (the word) always has to mean "equality" either.
> However having match() (the function) do "whole exact matching" (aka
> "equality") for any kind of vector-like object has the advantage of:
>
>   (a) making it consistent with base::match() (?base::match is pretty
>       explicit about what the contract of match() is)
>
>
(a) alone is obviously not enough. We have many methods, like the set
operations, that treat ranges specially. Are we going to start moving
everything toward the base behavior? And have rangeIntersect, rangeSetdiff,
etc?

>   (b) preserving its relationship with ==, duplicated(), unique(),
>       etc...
>

So it becomes consistent with duplicated/unique, but we lose consistency
with the set operations.

> (c) not frustrating the user who needs something to do exact
> matching on ranges (as I mentioned previously, if you take
> match() away from him/her, s/he'll be left with nothing).
>
>
No one has ever asked for match() to behave this way. There was a request
for a way to tabulate identical ranges. It was a nice idea to extract the
general "outer equal" findMatches function. But the changes seem to be
snowballing. These types of changes mean a lot of maintenance work for
the users. A deprecation cycle does not circumvent that.
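
For readers unfamiliar with the "outer equal" idea mentioned above, here is a
rough base-R illustration on an ordinary character vector. It only mimics the
semantics described in this thread (all equal pairs, and per-element counts);
it is not how findMatches() or countMatches() are implemented, and the
variable names are made up for the example.

```r
# Illustration only: "outer equal" semantics on plain atomic vectors.
x     <- c("a", "b", "c")
table <- c("b", "a", "b", "d", "b")

# Logical matrix with one row per element of x and one column per element
# of table; TRUE where the two elements are equal.
eq <- outer(x, table, "==")

# All (x index, table index) pairs that are equal, analogous in spirit to
# the hits that findMatches() is described as returning.
pairs <- which(eq, arr.ind = TRUE)

# Number of equal elements in table for each element of x, analogous in
# spirit to countMatches(x, table).
counts <- rowSums(eq)

# base::match() keeps only the first hit per element of x.
first <- match(x, table)
```

With these definitions, `counts > 0` agrees with `x %in% table`, which is the
match()/%in% relationship being debated here.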
> IMO those advantages counterbalance *by far* the very little
> convenience you get from having 'match(query, subject)' do
> 'findOverlaps(query, subject, select="first")' on
> IRanges/GRanges objects. If you need to do that, just use the
> latter, or, if you think that's still too much typing, define
> a wrapper e.g. 'ovmatch(query, subject)'.
>
> There are plenty of specialized tools around for doing
> inexact/fuzzy/partial/overlap matching for many particular types
> of vector-like objects: grep() and family, pmatch(), charmatch(),
> agrep(), grepRaw(), matchPattern() and family, findOverlaps() and
> family, findInterval(), etc... For the reasons I mentioned
> above, none of them should hijack match() to make it do some
> particular type of inexact matching on some particular type of
> objects. Even if, for that particular type of objects, doing that
> particular type of inexact matching is more common than doing
> exact matching.
>
> H.
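
A minimal sketch of the wrapper suggested above, assuming the IRanges package
is installed. The name ovmatch is just the one floated in the message, not an
existing function, and the example ranges are invented.

```r
# Hypothetical wrapper: 'ovmatch' is not part of IRanges.
library(IRanges)

ovmatch <- function(query, subject, ...) {
    # For each range in 'query', the index of the first overlapping range
    # in 'subject', or NA when nothing overlaps (same shape as match()).
    findOverlaps(query, subject, select = "first", ...)
}

query   <- IRanges(start = c(1, 20, 40), width = 5)   # [1,5] [20,24] [40,44]
subject <- IRanges(start = c(3, 22, 23), width = 5)   # [3,7] [22,26] [23,27]

hits <- ovmatch(query, subject)
```

Here the second query range overlaps two subject ranges and only the first
index is kept, while the third query range overlaps nothing and yields NA.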
>
>
>
> On 01/06/2013 05:39 PM, Michael Lawrence wrote:
>
>> I think having overlapsAny is a nice addition and helps make the API
>> more complete and explicit. Are you sure we need to change the behavior
>> of the match method for this relatively uncommon use case?
>>
>
> Yes because otherwise users with a use case of doing match()
>
> even if it's uncommon,
>
>
>> I don't think "match" always has to mean "equality". It is a more
>> general concept in my mind. The most common use case for matching
>> ranges is overlap.
>>
>
> Of course "match" doesn't always have to mean equality. But of base
>
>
>> Michael
>>
>>
>> On Fri, Jan 4, 2013 at 8:34 PM, Hervé Pagès <hpages@fhcrc.org> wrote:
>>
>> Yes 'peaks %in% genes' is cute and was probably doing the right thing
>> for most users (although not all). But 'exons %in% genes' is cute too
>> and was probably doing the wrong thing for all users. Advanced users
>> like you guys would have no problem switching to
>>
>>   !is.na(findOverlaps(peaks, genes, type="within", select="any"))
>>
>> or
>>
>>   !is.na(findOverlaps(peaks, genes, type="equal", select="any"))
>>
>> in case 'peaks %in% genes' was not doing exactly what you wanted,
>> but most users would not find this particularly friendly. Even
>> worse, some users probably didn't realize that 'peaks %in% genes'
>> was not doing exactly what they thought it did because "peaks in
>> genes" in English suggests that the peaks are within the genes,
>> but it's not what 'peaks %in% genes' does.
>>
>> Having overlapsAny(), with exactly the same extra arguments as
>> countOverlaps() and subsetByOverlaps() (i.e. 'maxgap', 'minoverlap',
>> 'type', 'ignore.strand'), all of them documented (and with most
>> users more or less familiar with them already) has the virtue to
>> expose the user to all the options from the very start, and to
>> help him/her make the right choice. Of course there will be users
>> that don't want or don't have the time to read/think about all the
>> options. Not a big deal: they'll just do 'overlapsAny(query, subject)',
>> which is not a lot more typing than 'query %in% subject', especially
>> if they use tab completion.
>>
>> It's true that it's more common to ask questions about overlap than
>> about equality but there are some use cases for the latter (as the
>> original thread shows). Until now, when you had such a use case, you
>> could not use match() or %in%, which would have been the natural things
>> to use, because they got hijacked to do something else, and you were
>> left with nothing. Not a satisfying situation. So at a minimum, we
>> needed to restore the true/real/original semantic of match() to do
>> "equality" instead of "overlap". But it's hard to do this for match()
>> and not do it for %in% too. For more than 99% of R users, %in% is
>> just a simple wrapper for 'match(x, table, nomatch = 0) > 0' (this
>> is how it has been documented and implemented in base R for many
>> years). Not maintaining this relationship between %in% and match()
>> would only cause grief and frustration to newcomers to Bioconductor.
>>
>> H.
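
To make the "peaks in genes" ambiguity discussed above concrete, here is a
small sketch, assuming the IRanges package is installed. The peaks and genes
objects are toy ranges invented for the example; only overlapsAny() and its
'type' argument come from the thread.

```r
# Toy data: plain IRanges stand in for peaks and genes.
library(IRanges)

peaks <- IRanges(start = c(5, 95, 300), width = 10)     # [5,14] [95,104] [300,309]
genes <- IRanges(start = c(1, 200), end = c(100, 400))  # [1,100] [200,400]

# Any overlap at all: what the old 'peaks %in% genes' is described as doing.
any_hit <- overlapsAny(peaks, genes)

# Peak entirely contained in a gene: what "peaks in genes" suggests in English.
within_hit <- overlapsAny(peaks, genes, type = "within")
```

The second peak straddles the end of the first gene, so it counts as an
overlap but not as being within a gene; the two results differ for it alone.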
>>
>>
>>
>> On 01/04/2013 03:32 PM, Cook, Malcolm wrote:
>>
>> Hiya again,
>>
>> I am definitely a late comer to BioC, so I definitely easily defer to
>> the tide of history.
>>
>> But I do think you miss my point, Michael, about the proposed change
>> making the relationship between %in% and match for {G,I}Ranges{List}
>> mimic that between other vectors, and I do think that changing the API
>> would make other late-comers take to BioC easier/faster.
>>
>> That said, I NEVER use %in% so I really have no stake in the matter, and
>> I DEFINITELY appreciate the argument for not changing the API just for
>> semantic sweetness.
>>
>> That that said, Herve is _so good_ about deprecations and warnings
>> that make such changes fairly easily digestible.
>>
>> That that that.... enough.... I bow out of this one....!!!!
>>
>> Always learning and Happy New Year to all lurkers,
>>
>> ~Malcolm
>>
>> From: Michael Lawrence [mailto:lawrence.michael@gene.com]
>> Sent: Friday, January 04, 2013 5:11 PM
>> To: Cook, Malcolm
>> Cc: Sean Davis; Michael Lawrence; Hervé Pagès (hpages@fhcrc.org);
>>     Tim Triche, Jr.; Vedran Franke; bioconductor@r-project.org
>> Subject: Re: [BioC] countMatches() (was: table for GenomicRanges)
>>
>> On Fri, Jan 4, 2013 at 1:56 PM, Cook, Malcolm <mec@stowers.org> wrote:
>>
>> Hiya,
>>
>> For what it is worth...
>>
>> I think the change to %in% is warranted.
>>
>> If I understand correctly, this change restores the relationship between
>> the semantics of `%in%` and the semantics of `match`.
>>
>> From the docs:
>>
>> '"%in%" <- function(x, table) match(x, table, nomatch = 0) > 0'
>>
>> Herve's change restores this relationship.
>>
>>
>> match and %in% were initially consistent (both considering any overlap);
>> Herve has changed both of them together. The whole idea behind IRanges
>> is that ranges are special data types with special semantics. We have
>> reimplemented much of the existing R vector API using those semantics;
>> this extends beyond match/%in%. I am hesitant about making such sweeping
>> changes to the API so late in the life-cycle of the package. There was a
>> feature request for a way to count identical ranges in a set of ranges.
>> Let's please not get carried away and start redesigning the API for this
>> one, albeit useful, request. There are all sorts of inconsistencies in
>> the API, and many of them were conscious decisions that considered
>> practical use cases.
>>
>> Michael
>>
>>
>> Herve, I suspect you were, as a result, able to completely drop
>> all the `%in%,BiocClass1,BiocClass2` definitions and depend upon
>> base::%in%
>>
>> Am I right?
>>
>> If so, may I suggest that Herve stay the course, with the addition of
>> '"%ol%" <- function(a, b) findOverlaps(a, b, maxgap=0L,
>> minoverlap=1L, type='any', select='all') > 0'
>>
>> This would provide a perspicacious idiom, thereby optimizing the API
>> for Michael's observed common use case.
>>
>> Just sayin'
>>
>> ~Malcolm
>>
>>
So why not leave %in% as it was and transition everything forward to
explicitly using { `%within%`, `%overlaps%`|`%overlapping%`, `%equals%` }
such that

  identical( x %within% table,   countOverlaps(x, table, type='within') > 0 ) == TRUE
  identical( x %overlaps% table, countOverlaps(x, table, type='any') > 0 ) == TRUE
  identical( x %equals% table,   countOverlaps(x, table, type='equal') > 0 ) == TRUE

and for the time being,

  identical( x %in% table, countOverlaps(x, table, type='any') > 0 ) == TRUE
  ## but with a noisy nastygram that will halt if options("warn"=2)

No breakage for %in% methods until such time as a full deprecation cycle
has passed, and if the maintainers can't be arsed to do anything at all
about the warnings by the second full release, then perhaps they don't
really care that much after all. Just a thought?

From someone (me) who has their own issues with keeping everything up to
date and should know better. If you want to use %in% for
peaks %in% genes (why on earth would you do this rather than peaks %in%
promoters(genes), anyways?)
then a nastygram could be emitted "WARNING: YOUR SHORTHAND NOTATION IS
DOOMED AFTER BIOC 2.13, YOU WILL BE ASSIMILATED" and everyone is (more or
less) happy.
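
A rough sketch of what the operators proposed above could look like, assuming
the IRanges package is installed. The operator names are the ones from this
message; legacy_in is a hypothetical stand-in for a transitional overlap-style
%in% that warns, and the ranges are toy data. Defining `%within%` here masks
any operator of the same name that a loaded package may already export.

```r
# Sketch only: operators defined in terms of countOverlaps(), as proposed.
library(IRanges)

`%within%`   <- function(x, table) countOverlaps(x, table, type = "within") > 0
`%overlaps%` <- function(x, table) countOverlaps(x, table, type = "any") > 0
`%equals%`   <- function(x, table) countOverlaps(x, table, type = "equal") > 0

# Hypothetical transitional helper: keeps the overlap behavior but warns,
# so that options(warn = 2) turns the warning into an error.
legacy_in <- function(x, table) {
    warning("overlap-style %in% is deprecated; use %overlaps% instead")
    x %overlaps% table
}

x     <- IRanges(start = c(2, 8, 30), end = c(4, 12, 35))  # [2,4] [8,12] [30,35]
table <- IRanges(start = c(1, 30), end = c(10, 35))        # [1,10] [30,35]

w <- x %within% table
o <- x %overlaps% table
e <- x %equals% table
```

In this toy data the second range overlaps [1,10] without being inside it,
and only the third range has an exact counterpart in table.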
On Mon, Jan 7, 2013 at 11:33 AM, Michael Lawrence
<lawrence.michael@gene.com> wrote:
>
>
>
> On Mon, Jan 7, 2013 at 11:00 AM, Hervé Pagès <hpages@fhcrc.org> wrote:
>
>> Hi Michael,
>>
>> I don't think "match" (the word) always has to mean "equality" either.
>> However having match() (the function) do "whole exact matching" (aka
>> "equality") for any kind of vector-like object has the advantage of:
>>
>>   (a) making it consistent with base::match() (?base::match is pretty
>>       explicit about what the contract of match() is)
>>
> (a) alone is obviously not enough. We have many methods, like the set
> operations, that treat ranges specially. Are we going to start moving
> everything toward the base behavior? And have rangeIntersect, rangeSetdiff,
> etc?
>
>>   (b) preserving its relationship with ==, duplicated(), unique(),
>>       etc...
>>
> So it becomes consistent with duplicated/unique, but we lose consistency
> with the set operations.
>
>>   (c) not frustrating the user who needs something to do exact
>>       matching on ranges (as I mentioned previously, if you take
>>       match() away from him/her, s/he'll be left with nothing).
>>
> No one has ever asked for match() to behave this way. There was a request
> for a way to tabulate identical ranges. It was a nice idea to extract the
> general "outer equal" findMatches function. But the changes seem to be
> snowballing. These types of changes mean a lot of maintenance work for
> the users. A deprecation cycle does not circumvent that.
>
>
>> IMO those advantages counterbalance *by far* the very little
>> convenience you get from having 'match(query, subject)' do
>> 'findOverlaps(query, subject, select="first")' on
>> IRanges/GRanges objects. If you need to do that, just use the
>> latter, or, if you think that's still too much typing, define
>> a wrapper e.g. 'ovmatch(query, subject)'.
>>
>> There are plenty of specialized tools around for doing
>> inexact/fuzzy/partial/overlap matching for many particular types
>> of vector-like objects: grep() and family, pmatch(), charmatch(),
>> agrep(), grepRaw(), matchPattern() and family, findOverlaps() and
>> family, findInterval(), etc... For the reasons I mentioned
>> above, none of them should hijack match() to make it do some
>> particular type of inexact matching on some particular type of
>> objects. Even if, for that particular type of objects, doing that
>> particular type of inexact matching is more common than doing
>> exact matching.
>>
>> H.
>>
>>
>>
>> On 01/06/2013 05:39 PM, Michael Lawrence wrote:
>>
>>> I think having overlapsAny is a nice addition and helps make the API
>>> more complete and explicit. Are you sure we need to change the behavior
>>> of the match method for this relatively uncommon use case?
>>>
>>
>> Yes because otherwise users with a use case of doing match()
>>
>> even if it's uncommon,
>>
>>
>>> I don't think "match" always has to mean "equality". It is a more
>>> general concept in my mind. The most common use case for matching
>>> ranges is overlap.
>>>
>>
>> Of course "match" doesn't always have to mean equality. But of base
>>
>>
>>> Michael
>>>
>>>
>>> On Fri, Jan 4, 2013 at 8:34 PM, Hervé Pagès <hpages@fhcrc.org> wrote:
>>>
>>> Yes 'peaks %in% genes' is cute and was probably doing the right thing
>>> for most users (although not all). But 'exons %in% genes' is cute too
>>> and was probably doing the wrong thing for all users. Advanced users
>>> like you guys would have no problem switching to
>>>
>>>   !is.na(findOverlaps(peaks, genes, type="within", select="any"))
>>>
>>> or
>>>
>>>   !is.na(findOverlaps(peaks, genes, type="equal", select="any"))
>>>
>>> in case 'peaks %in% genes' was not doing exactly what you wanted,
>>> but most users would not find this particularly friendly. Even
>>> worse, some users probably didn't realize that 'peaks %in% genes'
>>> was not doing exactly what they thought it did because "peaks in
>>> genes" in English suggests that the peaks are within the genes,
>>> but it's not what 'peaks %in% genes' does.
>>>
>>> Having overlapsAny(), with exactly the same extra arguments as
>>> countOverlaps() and subsetByOverlaps() (i.e. 'maxgap', 'minoverlap',
>>> 'type', 'ignore.strand'), all of them documented (and with most
>>> users more or less familiar with them already) has the virtue to
>>> expose the user to all the options from the very start, and to
>>> help him/her make the right choice. Of course there will be users
>>> that don't want or don't have the time to read/think about all the
>>> options. Not a big deal: they'll just do 'overlapsAny(query, subject)',
>>> which is not a lot more typing than 'query %in% subject', especially
>>> if they use tab completion.
>>>
>>> It's true that it's more common to ask questions about overlap than
>>> about equality but there are some use cases for the latter (as the
>>> original thread shows). Until now, when you had such a use case, you
>>> could not use match() or %in%, which would have been the natural things
>>> to use, because they got hijacked to do something else, and you were
>>> left with nothing. Not a satisfying situation. So at a minimum, we
>>> needed to restore the true/real/original semantic of match() to do
>>> "equality" instead of "overlap". But it's hard to do this for match()
>>> and not do it for %in% too. For more than 99% of R users, %in% is
>>> just a simple wrapper for 'match(x, table, nomatch = 0) > 0' (this
>>> is how it has been documented and implemented in base R for many
>>> years). Not maintaining this relationship between %in% and match()
>>> would only cause grief and frustration to newcomers to Bioconductor.
>>>
>>> H.
>>>
>>>
>>>
>>> On 01/04/2013 03:32 PM, Cook, Malcolm wrote:
>>>
>>> Hiya again,
>>>
>>> I am definitely a late comer to BioC, so I definitely easily defer to
>>> the tide of history.
>>>
>>> But I do think you miss my point, Michael, about the proposed change
>>> making the relationship between %in% and match for {G,I}Ranges{List}
>>> mimic that between other vectors, and I do think that changing the API
>>> would make other late-comers take to BioC easier/faster.
>>>
>>> That said, I NEVER use %in% so I really have no stake in the matter, and
>>> I DEFINITELY appreciate the argument for not changing the API just for
>>> semantic sweetness.
>>>
>>> That that said, Herve is _so good_ about deprecations and warnings
>>> that make such changes fairly easily digestible.
>>>
>>> That that that.... enough.... I bow out of this one....!!!!
>>>
>>> Always learning and Happy New Year to all lurkers,
>>>
>>> ~Malcolm
>>>
>>> *From:* Michael Lawrence [mailto:lawrence.michael@gene.com]
>>> *Sent:* Friday, January 04, 2013 5:11 PM
>>> *To:* Cook, Malcolm
>>> *Cc:* Sean Davis; Michael Lawrence; Hervé Pagès (hpages@fhcrc.org); Tim
>>> Triche, Jr.; Vedran Franke; bioconductor@r-project.org
>>> *Subject:* Re: [BioC] countMatches() (was: table for GenomicRanges)
>>>
>>>
>>> On Fri, Jan 4, 2013 at 1:56 PM, Cook, Malcolm <mec@stowers.org> wrote:
>>>
>>> Hiya,
>>>
>>> For what it is worth...
>>>
>>> I think the change to %in% is warranted.
>>>
>>> If I understand correctly, this change restores the relationship between
>>> the semantics of `%in%` and the semantics of `match`.
>>>
>>> From the docs:
>>>
>>> '"%in%" <- function(x, table) match(x, table, nomatch = 0) > 0'
>>>
>>> Herve's change restores this relationship.
>>>
>>>
>>> match and %in% were initially consistent (both considering any overlap);
>>> Herve has changed both of them together. The whole idea behind IRanges
>>> is that ranges are special data types with special semantics. We have
>>> reimplemented much of the existing R vector API using those semantics;
>>> this extends beyond match/%in%. I am hesitant about making such sweeping
>>> changes to the API so late in the life-cycle of the package. There was a
>>> feature request for a way to count identical ranges in a set of ranges.
>>> Let's please not get carried away and start redesigning the API for this
>>> one, albeit useful, request. There are all sorts of inconsistencies in
>>> the API, and many of them were conscious decisions that considered
>>> practical use cases.
>>>
>>> Michael
>>>
>>>
>>> Herve, I suspect you were, as a result, able to completely drop
>>> all the `%in%,BiocClass1,BiocClass2` definitions and depend upon
>>> base::%in%
>>>
>>> Am I right?
>>>
>>> If so, may I suggest that Herve stay the course, with the addition of
>>>
>>> '"%ol%" <- function(a, b) findOverlaps(a, b, maxgap=0L, minoverlap=1L, type='any', select='all') > 0'
>>>
>>> This would provide a perspicuous idiom, thereby optimizing the API
>>> for Michael's observed common use case.
>>>
>>> Just sayin'
>>>
>>> ~Malcolm
>>>
>>>
>>> .-----Original Message-----
>>> .From: bioconductor-bounces@r-project.org [mailto:bioconductor-bounces@r-project.org] On Behalf Of Sean Davis
>>> .Sent: Friday, January 04, 2013 3:37 PM
>>> .To: Michael Lawrence
>>> .Cc: Tim Triche, Jr.; Vedran Franke; bioconductor@r-project.org
>>> .Subject: Re: [BioC] countMatches() (was: table for GenomicRanges)
>>> .
>>> .On Fri, Jan 4, 2013 at 4:32 PM, Michael Lawrence
>>> .<lawrence.michael@gene.com> wrote:
>>> .> The change to the behavior of %in% is a pretty big one. Are you thinking
>>> .> that all set-based operations should behave this way? For example, setdiff
>>> .> and intersect? I really liked the syntax of "peaks %in% genes". In my
>>> .> experience, it's way more common to ask questions about overlap than about
>>> .> equality, so I'd rather optimize the API for that use case. But again,
>>> .> that's just my personal bias.
>>> .
>>> .For what it is worth, I share Michael's personal bias here.
>>> .
>>> .Sean
>>> .
>>> .
>>> .> Michael
>>> .>
>>> .>
>>> .> On Fri, Jan 4, 2013 at 1:11 PM, Hervé Pagès <hpages@fhcrc.org> wrote:
>>> .>
>>> .>> Hi,
>>> .>>
>>> .>> I added findMatches() and countMatches() to the latest IRanges /
>>> .>> GenomicRanges packages (in BioC devel only).
>>> .>>
>>> .>> findMatches(x, table): An enhanced version of match that
>>> .>> returns all the matches in a Hits object.
>>> .>>
>>> .>> countMatches(x, table): Returns an integer vector of the length
>>> .>> of x, containing the number of matches in table for
>>> .>> each element in x.
>>> .>>
>>> .>> countMatches() is what you can use to tally/count/tabulate (choose your
>>> .>> preferred term) the unique elements in a GRanges object:
>>> .>>
>>> .>> library(GenomicRanges)
>>> .>> set.seed(33)
>>> .>> gr <- GRanges("chr1", IRanges(sample(15,20,replace=TRUE), width=5))
>>> .>>
>>> .>> Then:
>>> .>>
>>> .>> > gr_levels <- sort(unique(gr))
>>> .>> > countMatches(gr_levels, gr)
>>> .>> [1] 1 1 1 2 4 2 2 1 2 2 2
>>> .>>
>>> .>> Note that findMatches() and countMatches() also work on IRanges and
>>> .>> DNAStringSet objects, as well as on ordinary atomic vectors:
>>> .>>
>>> .>> library(hgu95av2probe)
>>> .>> library(Biostrings)
>>> .>> probes <- DNAStringSet(hgu95av2probe)
>>> .>> unique_probes <- unique(probes)
>>> .>> count <- countMatches(unique_probes, probes)
>>> .>> max(count) # 7
>>> .>>
>>> .>> I made other changes in IRanges/GenomicRanges so that the notion
>>> .>> of "match" between elements of a vector-like object now consistently
>>> .>> means "equality" instead of "overlap", even for range-based objects
>>> .>> like IRanges or GRanges objects. This notion of "equality" is the
>>> .>> same that is used by ==. The most visible consequence of those
>>> .>> changes is that using %in% between 2 IRanges or GRanges objects
>>> .>> 'query' and 'subject' in order to do overlaps was replaced by
>>> .>> overlapsAny(query, subject).
>>> .>>
>>> .>> overlapsAny(query, subject): Finds the ranges in query that
>>> .>> overlap any of the ranges in subject.
>>> .>>
>>> .>> There are warnings and deprecation messages in place to help smooth
>>> .>> the transition.
>>> .>>
>>> .>> Cheers,
>>> .>> H.
>>> .>>
>>> .>> --
>>> .>> Hervé Pagès
>>> .>>
>>> .>> Program in Computational Biology
>>> .>> Division of Public Health Sciences
>>> .>> Fred Hutchinson Cancer Research Center
>>> .>> 1100 Fairview Ave. N, M1-B514
>>> .>> P.O. Box 19024
>>> .>> Seattle, WA 98109-1024
>>> .>>
>>> .>> E-mail: hpages@fhcrc.org
>>> .>> Phone: (206) 667-5791
>>> .>> Fax: (206) 667-1319
>>>
>>> .>>
>>> .>
>>>
>>>
>>
>
>
--
*A model is a lie that helps you see the truth.*
Howard Skipper <http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf>
*expletives*!

I meant

identical( x %overlaps% table, x %in% table ) == TRUE ## but with a noisy nastygram that will halt if options("warn"=2)

rather than

identical( x %overlaps% table, countOverlaps(x, table, type='any') > 0 ) == TRUE ## which should not have a nastygram at all!

Many eyes something something.
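
A minimal base-R sketch of the transition described above, under loud assumptions: ranges are held in plain data frames with `start` and `end` columns (a stand-in for IRanges/GRanges objects, not the packages' implementation), and the names `overlaps_any`, `%overlaps%` and `%old_in%` are hypothetical. `%old_in%` plays the role of the overlap-based %in% during a deprecation cycle: it returns the same answer as `%overlaps%` but warns first, so it halts once warnings are promoted to errors.

```r
# Hypothetical stand-in for ranges: data frames with integer 'start' and 'end'.
overlaps_any <- function(query, subject) {
  # TRUE for each query range sharing at least one position with any subject range
  vapply(seq_len(nrow(query)), function(i) {
    any(query$start[i] <= subject$end & query$end[i] >= subject$start)
  }, logical(1))
}

# The explicit, warning-free operator.
`%overlaps%` <- function(x, table) overlaps_any(x, table)

# Plays the role of the old overlap-based %in% while it is deprecated.
`%old_in%` <- function(x, table) {
  warning("overlap-based %in% is deprecated; use %overlaps% instead",
          call. = FALSE)
  x %overlaps% table
}

peaks <- data.frame(start = c(1L, 20L, 50L), end = c(5L, 25L, 55L))
genes <- data.frame(start = c(4L, 100L),     end = c(22L, 120L))

quiet <- peaks %overlaps% genes                  # no warning
noisy <- suppressWarnings(peaks %old_in% genes)  # same answer, but warns
same  <- identical(quiet, noisy)

# With options(warn = 2) the warning becomes an error, so the old idiom halts.
old_opts <- options(warn = 2)
halted <- inherits(try(peaks %old_in% genes, silent = TRUE), "try-error")
options(old_opts)
```

Here `same` and `halted` are both expected to be TRUE: the two spellings agree on the result, and only the deprecated one stops a script that runs with warn = 2.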
On Mon, Jan 7, 2013 at 11:45 AM, Tim Triche, Jr. <tim.triche@gmail.com> wrote:
> So why not leave %in% as it was and transition everything forward to
> explicitly using { `%within%`, `%overlaps%`|`%overlapping%`, `%equals%` }
> such that
>
> identical( x %within% table, countOverlaps(x, table, type='within') > 0 ) == TRUE
> identical( x %overlaps% table, countOverlaps(x, table, type='any') > 0 ) == TRUE
> identical( x %equals% table, countOverlaps(x, table, type='equal') > 0 ) == TRUE
>
> and for the time being,
>
> identical( x %overlaps% table, countOverlaps(x, table, type='any') > 0 ) == TRUE ## but with a noisy nastygram that will halt if options("warn"=2)
>
> No breakage for %in% methods until such time as a full deprecation cycle
> has passed, and if the maintainers can't be arsed to do anything at all
> about the warnings by the second full release, then perhaps they don't
> really care that much after all. Just a thought?
>
> From someone (me) who has their own issues with keeping everything up to
> date and should know better. If you want to use %in% for
>
> peaks %in% genes (why on earth would you do this rather than peaks %in%
> promoters(genes), anyways?)
>
> then a nastygram could be emitted "WARNING: YOUR SHORTHAND NOTATION IS
> DOOMED AFTER BIOC 2.13, YOU WILL BE ASSIMILATED" and everyone is (more or
> less) happy.
>
>
>
> On Mon, Jan 7, 2013 at 11:33 AM, Michael Lawrence <lawrence.michael@gene.com> wrote:
>
>> On Mon, Jan 7, 2013 at 11:00 AM, Hervé Pagès <hpages@fhcrc.org> wrote:
>>
>>> Hi Michael,
>>>
>>> I don't think "match" (the word) always has to mean "equality" either.
>>> However having match() (the function) do "whole exact matching" (aka
>>> "equality") for any kind of vector-like object has the advantage of:
>>>
>>> (a) making it consistent with base::match() (?base::match is pretty
>>> explicit about what the contract of match() is)
>>>
>>
>> (a) alone is obviously not enough. We have many methods, like the set
>> operations, that treat ranges specially. Are we going to start moving
>> everything toward the base behavior? And have rangeIntersect, rangeSetdiff,
>> etc?
>>
>>> (b) preserving its relationship with ==, duplicated(), unique(),
>>> etc...
>>>
>>
>> So it becomes consistent with duplicated/unique, but we lose consistency
>> with the set operations.
>>
>>> (c) not frustrating the user who needs something to do exact
>>> matching on ranges (as I mentioned previously, if you take
>>> match() away from him/her, s/he'll be left with nothing).
>>>
>>
>> No one has ever asked for match() to behave this way. There was a request
>> for a way to tabulate identical ranges. It was a nice idea to extract the
>> general "outer equal" findMatches function. But the changes seem to be
>> snowballing. These types of changes mean a lot of maintenance work for
>> the users. A deprecation cycle does not circumvent that.
>>
>>> IMO those advantages counterbalance *by far* the very little
>>> convenience you get from having 'match(query, subject)' do
>>> 'findOverlaps(query, subject, select="first")' on
>>> IRanges/GRanges objects. If you need to do that, just use the
>>> latter, or, if you think that's still too much typing, define
>>> a wrapper e.g. 'ovmatch(query, subject)'.
>>>
>>> There are plenty of specialized tools around for doing
>>> inexact/fuzzy/partial/overlap matching for many particular types
>>> of vector-like objects: grep() and family, pmatch(), charmatch(),
>>> agrep(), grepRaw(), matchPattern() and family, findOverlaps() and
>>> family, findInterval(), etc... For the reasons I mentioned
>>> above, none of them should hijack match() to make it do some
>>> particular type of inexact matching on some particular type of
>>> objects. Even if, for that particular type of objects, doing that
>>> particular type of inexact matching is more common than doing
>>> exact matching.
>>>
>>> H.
>>>
>>>
>>>
>>> On 01/06/2013 05:39 PM, Michael Lawrence wrote:
>>>
>>>> I think having overlapsAny is a nice addition and helps make the API
>>>> more complete and explicit. Are you sure we need to change the behavior
>>>> of the match method for this relatively uncommon use case?
>>>>
>>>
>>> Yes because otherwise users with a use case of doing match()
>>>
>>> even if it's uncommon,
>>>
>>>
>>>> I don't think
>>>> "match" always has to mean "equality". It is a more general concept in
>>>> my mind. The most common use case for matching ranges is overlap.
>>>>
>>>
>>> Of course "match" doesn't always have to mean equality. But of
base
>>>
>>>
>>>> Michael
>>>>
>>>>
>>>> On Fri, Jan 4, 2013 at 8:34 PM, Hervé Pagès <hpages@fhcrc.org> wrote:
>>>>
>>>> Yes 'peaks %in% genes' is cute and was probably doing the right thing
>>>> for most users (although not all). But 'exons %in% genes' is cute too
>>>> and was probably doing the wrong thing for all users. Advanced users
>>>> like you guys would have no problem switching to
>>>>
>>>> !is.na(findOverlaps(peaks, genes, type="within", select="any"))
>>>>
>>>> or
>>>>
>>>> !is.na(findOverlaps(peaks, genes, type="equal", select="any"))
>>>>
>>>> in case 'peaks %in% genes' was not doing exactly what you wanted,
>>>> but most users would not find this particularly friendly. Even
>>>> worse, some users probably didn't realize that 'peaks %in% genes'
>>>> was not doing exactly what they thought it did because "peaks in
>>>> genes" in English suggests that the peaks are within the genes,
>>>> but it's not what 'peaks %in% genes' does.
>>>>
>>>> Having overlapsAny(), with exactly the same extra arguments as
>>>> countOverlaps() and subsetByOverlaps() (i.e. 'maxgap', 'minoverlap',
>>>> 'type', 'ignore.strand'), all of them documented (and with most
>>>> users more or less familiar with them already) has the virtue to
>>>> expose the user to all the options from the very start, and to
>>>> help him/her make the right choice. Of course there will be users
>>>> that don't want or don't have the time to read/think about all the
>>>> options. Not a big deal: they'll just do 'overlapsAny(query, subject)',
>>>> which is not a lot more typing than 'query %in% subject', especially
>>>> if they use tab completion.
>>>>
>>>> It's true that it's more common to ask questions about overlap than
>>>> about equality but there are some use cases for the latter (as the
>>>> original thread shows). Until now, when you had such a use case, you
>>>> could not use match() or %in%, which would have been the natural things
>>>> to use, because they got hijacked to do something else, and you were
>>>> left with nothing. Not a satisfying situation. So at a minimum, we
>>>> needed to restore the true/real/original semantic of match() to do
>>>> "equality" instead of "overlap". But it's hard to do this for match()
>>>> and not do it for %in% too. For more than 99% of R users, %in% is
>>>> just a simple wrapper for 'match(x, table, nomatch = 0) > 0' (this
>>>> is how it has been documented and implemented in base R for many
>>>> years). Not maintaining this relationship between %in% and match()
>>>> would only cause grief and frustration to newcomers to Bioconductor.
>>>>
>>>> H.
>>>>
>>>>
>>>>
>>
>>
>
>
Hi Tim,

I could add the %ov% operator as a replacement for the old %in%. So you
would write 'peaks %ov% genes' instead of 'peaks %in% genes'. It would just
be a convenience wrapper for 'overlapsAny(peaks, genes)'.

Cheers,
H.
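
As a rough illustration of how such an operator would sit next to the equality-based %in%, here is a base-R sketch. It deliberately does not use IRanges: ranges are plain data frames with `start` and `end` columns, `overlaps_any()` is a hypothetical stand-in for overlapsAny() with default arguments, and `range_key()` and `equal_in()` are likewise made up for the example. In the packages themselves the operator would presumably be a one-line wrapper around overlapsAny(query, subject).

```r
# Hypothetical stand-in for overlapsAny() on data frames with 'start' and 'end'.
overlaps_any <- function(query, subject) {
  vapply(seq_len(nrow(query)), function(i) {
    any(query$start[i] <= subject$end & query$end[i] >= subject$start)
  }, logical(1))
}

# Convenience operator: 'peaks %ov% genes' reads like the old 'peaks %in% genes'.
`%ov%` <- function(query, subject) overlaps_any(query, subject)

# Equality-based matching, i.e. what match()/%in% mean after the change:
# a range matches only if both its start and its end are identical.
range_key <- function(r) paste(r$start, r$end, sep = "-")
equal_in  <- function(x, table) {
  match(range_key(x), range_key(table), nomatch = 0L) > 0L
}

peaks <- data.frame(start = c(10L, 30L, 90L), end = c(14L, 40L, 95L))
genes <- data.frame(start = c(10L, 35L),      end = c(14L, 60L))

by_overlap  <- peaks %ov% genes      # overlap question
by_equality <- equal_in(peaks, genes) # equality question
```

The second peak overlaps a gene without being identical to one, so the two vectors differ at that position, which is the distinction the thread is arguing about.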
On 01/07/2013 11:45 AM, Tim Triche, Jr. wrote:
> So why not leave %in% as it was and transition everything forward to
> explicitly using { `%within%`, `%overlaps%`|`%overlapping%`, `%equals%` }
> such that
>
> identical( x %within% table, countOverlaps(x, table, type='within') > 0 ) == TRUE
> identical( x %overlaps% table, countOverlaps(x, table, type='any') > 0 ) == TRUE
> identical( x %equals% table, countOverlaps(x, table, type='equal') > 0 ) == TRUE
>
> and for the time being,
>
> identical( x %overlaps% table, countOverlaps(x, table, type='any') > 0 ) == TRUE ## but with a noisy nastygram that will halt if options("warn"=2)
>
> No breakage for %in% methods until such time as a full deprecation cycle
> has passed, and if the maintainers can't be arsed to do anything at all
> about the warnings by the second full release, then perhaps they don't
> really care that much after all. Just a thought?
>
> From someone (me) who has their own issues with keeping everything up
> to date and should know better. If you want to use %in% for
>
> peaks %in% genes (why on earth would you do this rather than peaks
> %in% promoters(genes), anyways?)
>
> then a nastygram could be emitted "WARNING: YOUR SHORTHAND NOTATION IS
> DOOMED AFTER BIOC 2.13, YOU WILL BE ASSIMILATED" and everyone is (more
> or less) happy.
>
>
>
> On Mon, Jan 7, 2013 at 11:33 AM, Michael Lawrence
> <lawrence.michael at gene.com> wrote:
>
> On Mon, Jan 7, 2013 at 11:00 AM, Hervé Pagès <hpages at fhcrc.org> wrote:
>
> Hi Michael,
>
> I don't think "match" (the word) always has to mean "equality" either.
> However having match() (the function) do "whole exact matching" (aka
> "equality") for any kind of vector-like object has the advantage of:
>
> (a) making it consistent with base::match() (?base::match is pretty
> explicit about what the contract of match() is)
>
>
> (a) alone is obviously not enough. We have many methods, like the
> set operations, that treat ranges specially. Are we going to start
> moving everything toward the base behavior? And have rangeIntersect,
> rangeSetdiff, etc?
>
> (b) preserving its relationship with ==, duplicated(), unique(),
> etc...
>
>
> So it becomes consistent with duplicated/unique, but we lose
> consistency with the set operations.
>
> (c) not frustrating the user who needs something to do exact
> matching on ranges (as I mentioned previously, if you take
> match() away from him/her, s/he'll be left with nothing).
>
>
> No one has ever asked for match() to behave this way. There was a
> request for a way to tabulate identical ranges. It was a nice idea
> to extract the general "outer equal" findMatches function. But the
> changes seem to be snowballing. These types of changes mean a lot
> of maintenance work for the users. A deprecation cycle does not
> circumvent that.
>
>
> IMO those advantages counterbalance *by far* the very little
> convenience you get from having 'match(query, subject)' do
> 'findOverlaps(query, subject, select="first")' on
> IRanges/GRanges objects. If you need to do that, just use the
> latter, or, if you think that's still too much typing, define
> a wrapper e.g. 'ovmatch(query, subject)'.
>
> There are plenty of specialized tools around for doing
> inexact/fuzzy/partial/overlap matching for many particular types
> of vector-like objects: grep() and family, pmatch(), charmatch(),
> agrep(), grepRaw(), matchPattern() and family, findOverlaps() and
> family, findInterval(), etc... For the reasons I mentioned
> above, none of them should hijack match() to make it do some
> particular type of inexact matching on some particular type of
> objects. Even if, for that particular type of objects, doing that
> particular type of inexact matching is more common than doing
> exact matching.
>
> H.
>
>
>
> On 01/06/2013 05:39 PM, Michael Lawrence wrote:
>
> I think having overlapsAny is a nice addition and helps make the API
> more complete and explicit. Are you sure we need to change the behavior
> of the match method for this relatively uncommon use case?
>
>
> Yes because otherwise users with a use case of doing match()
>
> even if it's uncommon,
>
>
> I don't think "match" always has to mean "equality". It is a more general
> concept in my mind. The most common use case for matching ranges is
> overlap.
>
>
> Of course "match" doesn't always have to mean equality. But
of base
>
>
> Michael
>
>
> On Fri, Jan 4, 2013 at 8:34 PM, Hervé Pagès <hpages at fhcrc.org> wrote:
>
> Yes 'peaks %in% genes' is cute and was probably doing the right thing
> for most users (although not all). But 'exons %in% genes' is cute too
> and was probably doing the wrong thing for all users. Advanced users
> like you guys would have no problem switching to
>
> !is.na(findOverlaps(peaks, genes, type="within", select="any"))
>
> or
>
> !is.na(findOverlaps(peaks, genes, type="equal", select="any"))
>
> in case 'peaks %in% genes' was not doing exactly what you wanted,
> but most users would not find this particularly friendly. Even
> worse, some users probably didn't realize that 'peaks %in% genes'
> was not doing exactly what they thought it did because "peaks in
> genes" in English suggests that the peaks are within the genes,
> but it's not what 'peaks %in% genes' does.
>
> Having overlapsAny(), with exactly the same extra arguments as
> countOverlaps() and subsetByOverlaps() (i.e. 'maxgap', 'minoverlap',
> 'type', 'ignore.strand'), all of them documented (and with most
> users more or less familiar with them already) has the virtue to
> expose the user to all the options from the very start, and to
> help him/her make the right choice. Of course there will be users
> that don't want or don't have the time to read/think about all the
> options. Not a big deal: they'll just do 'overlapsAny(query, subject)',
> which is not a lot more typing than 'query %in% subject', especially
> if they use tab completion.
>
> It's true that it's more common to ask questions about overlap than
> about equality but there are some use cases for the latter (as the
> original thread shows). Until now, when you had such a use case, you
> could not use match() or %in%, which would have been the natural things
> to use, because they got hijacked to do something else, and you were
> left with nothing. Not a satisfying situation. So at a minimum, we
> needed to restore the true/real/original semantic of match() to do
> "equality" instead of "overlap". But it's hard to do this for match()
> and not do it for %in% too. For more than 99% of R users, %in% is
> just a simple wrapper for 'match(x, table, nomatch = 0) > 0' (this
> is how it has been documented and implemented in base R for many
> years). Not maintaining this relationship between %in% and match()
> would only cause grief and frustration to newcomers to Bioconductor.
>
> H.
>
>
>
> On 01/04/2013 03:32 PM, Cook, Malcolm wrote:
>
> Hiya again,
>
> I am definitely a late comer to BioC, so I definitely easily defer to
> the tide of history.
>
> But I do think you miss my point Michael about the proposed change
> making the relationship between %in% and match for {G,I}Ranges{List}
> mimic that between other vectors, and I do think that changing the API
> would make other late-comers take to BioC easier/faster.
>
> That said, I NEVER use %in% so I really have no stake in the matter, and
> I DEFINITELY appreciate the argument to not changing the API just for
> semantic sweetness.
>
> That that said, Herve is _/so good/_ about deprecations and warnings
> that make such changes fairly easily digestible.
>
> That that that.... enough.... I bow out of this one....!!!!
>
> Always learning and Happy New Year to all lurkers,
>
> ~Malcolm
>
> *From:* Michael Lawrence [mailto:lawrence.michael at gene.com]
> *Sent:* Friday, January 04, 2013 5:11 PM
> *To:* Cook, Malcolm
> *Cc:* Sean Davis; Michael Lawrence; Hervé Pagès (hpages at fhcrc.org);
> Tim Triche, Jr.; Vedran Franke; bioconductor at r-project.org
> *Subject:* Re: [BioC] countMatches() (was: table for GenomicRanges)
>
>
> On Fri, Jan 4, 2013 at 1:56 PM, Cook, Malcolm <mec at stowers.org> wrote:
>
> Hiya,
>
> For what it is worth...
>
> I think the change to %in% is warranted.
>
> If I understand correctly, this change restores the relationship between
> the semantics of `%in%` and the semantics of `match`.
>
> From the docs:
>
> '"%in%" <- function(x, table) match(x, table, nomatch = 0) > 0'
>
> Herve's change restores this relationship.
>
>
> match and %in% were initially consistent (both considering any overlap);
> Herve has changed both of them together. The whole idea behind IRanges
> is that ranges are special data types with special semantics. We have
> reimplemented much of the existing R vector API using those semantics;
> this extends beyond match/%in%. I am hesitant about making such sweeping
> changes to the API so late in the life-cycle of the package. There was a
> feature request for a way to count identical ranges in a set of ranges.
> Let's please not get carried away and start redesigning the API for this
> one, albeit useful, request. There are all sorts of inconsistencies in
> the API, and many of them were conscious decisions that considered
> practical use cases.
>
> Michael
>
>
> Herve, I suspect you were as a result able to completely drop
> all the `%in%,BiocClass1,BiocClass2` definitions and depend upon
> base::%in%
>
> Am I right?
>
> If so, may I suggest that Herve stay the course, with the addition of
>
> '"%ol%" <- function(a, b) countOverlaps(a, b, maxgap=0L, minoverlap=1L, type='any') > 0'
>
> This would provide a perspicacious idiom, thereby optimizing the API
> for Michael's observed common use case.
>
> Just sayin'
>
> ~Malcolm
>
>
> .-----Original Message-----
> .From: bioconductor-bounces at r-project.org
> [mailto:bioconductor-bounces at r-project.org] On Behalf Of Sean Davis
> .Sent: Friday, January 04, 2013 3:37 PM
> .To: Michael Lawrence
> .Cc: Tim Triche, Jr.; Vedran Franke; bioconductor at r-project.org
> .Subject: Re: [BioC] countMatches() (was: table for GenomicRanges)
> .
> .On Fri, Jan 4, 2013 at 4:32 PM, Michael Lawrence
> .<lawrence.michael at gene.com> wrote:
> .> The change to the behavior of %in% is a pretty big one. Are you thinking
> .> that all set-based operations should behave this way? For example, setdiff
> .> and intersect? I really liked the syntax of "peaks %in% genes". In my
> .> experience, it's way more common to ask questions about overlap than about
> .> equality, so I'd rather optimize the API for that use case. But again,
> .> that's just my personal bias.
> .
> .For what it is worth, I share Michael's personal bias here.
> .
> .Sean
> .
> .
> .> Michael
> .>
> .>
> .> On Fri, Jan 4, 2013 at 1:11 PM, Hervé Pagès <hpages at fhcrc.org> wrote:
> .>
> .>> Hi,
> .>>
> .>> I added findMatches() and countMatches() to the latest IRanges /
> .>> GenomicRanges packages (in BioC devel only).
> .>>
> .>> findMatches(x, table): An enhanced version of 'match' that
> .>> returns all the matches in a Hits object.
> .>>
> .>> countMatches(x, table): Returns an integer vector of the length
> .>> of 'x', containing the number of matches in 'table' for
> .>> each element in 'x'.
> .>>
> .>> countMatches() is what you can use to tally/count/tabulate (choose your
> .>> preferred term) the unique elements in a GRanges object:
> .>>
> .>> library(GenomicRanges)
> .>> set.seed(33)
> .>> gr <- GRanges("chr1", IRanges(sample(15,20,replace=TRUE), width=5))
> .>>
> .>> Then:
> .>>
> .>> > gr_levels <- sort(unique(gr))
> .>> > countMatches(gr_levels, gr)
> .>> [1] 1 1 1 2 4 2 2 1 2 2 2
> .>>
> .>> Note that findMatches() and countMatches() also work on IRanges and
> .>> DNAStringSet objects, as well as on ordinary atomic vectors:
> .>>
> .>> library(hgu95av2probe)
> .>> library(Biostrings)
> .>> probes <- DNAStringSet(hgu95av2probe)
> .>> unique_probes <- unique(probes)
> .>> count <- countMatches(unique_probes, probes)
> .>> max(count) # 7
> .>>
> .>> I made other changes in IRanges/GenomicRanges so that the notion
> .>> of "match" between elements of a vector-like object now consistently
> .>> means "equality" instead of "overlap", even for range-based objects
> .>> like IRanges or GRanges objects. This notion of "equality" is the
> .>> same that is used by ==. The most visible consequence of those
> .>> changes is that using %in% between 2 IRanges or GRanges objects
> .>> 'query' and 'subject' in order to do overlaps was replaced by
> .>> overlapsAny(query, subject).
> .>>
> .>> overlapsAny(query, subject): Finds the ranges in 'query' that
> .>> overlap any of the ranges in 'subject'.
> .>>
> .>> There are warnings and deprecation messages in place to help smooth
> .>> the transition.
> .>>
> .>> Cheers,
> .>> H.
> .>>
> .>> --
> .>> Hervé Pagès
> .>>
> .>> Program in Computational Biology
> .>> Division of Public Health Sciences
> .>> Fred Hutchinson Cancer Research Center
> .>> 1100 Fairview Ave. N, M1-B514
> .>> P.O. Box 19024
> .>> Seattle, WA 98109-1024
> .>>
> .>> E-mail: hpages at fhcrc.org
> .>> Phone: (206) 667-5791
> .>> Fax: (206) 667-1319
>
> .>>
> .>
> .> [[alternative HTML version deleted]]
> .>
> .> _______________________________________________
> .> Bioconductor mailing list
> .> Bioconductor at r-project.org
> .> https://stat.ethz.ch/mailman/listinfo/bioconductor
> .> Search the archives:
> .> http://news.gmane.org/gmane.science.biology.informatics.conductor
> .
> ._______________________________________________
> .Bioconductor mailing list
> .Bioconductor at r-project.org
> .https://stat.ethz.ch/mailman/listinfo/bioconductor
> .Search the archives:
> .http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages at fhcrc.org
> Phone: (206) 667-5791
> Fax: (206) 667-1319
>
>
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages at fhcrc.org
> Phone: (206) 667-5791
> Fax: (206) 667-1319
>
>
>
>
>
> --
> /A model is a lie that helps you see the truth./
> Howard Skipper
> <http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
hell, I'll add the operators if there's support for them. obviously
they're not a big deal and a patch would take 5 minutes flat.
my hope was to be very explicit about what each type of operation meant, so
that when a newcomer to the Ranges API sees
peaks %overlapping% promoters(someGroupOfGenesWeCareAbout)
it cannot be confused with
peaks %within% rangesThatCorrespondToSomeChromatinState
or
peaks %equal% aBunchOfDNAseFootprints
or
DMRs %in% genes ## what the hell does this really mean, anyways? it's so bad on so many levels
because whenever someone says "what is the advantage of Ranges-based
analyses?", these are the archetypal sorts of queries that come to mind.
Except that usually in my examples they are based on posterior
probabilities, but perhaps that could stand to change.
Anyways, that's just my bias, and you're doing the heavy lifting. But if
people agree with the motivations I will write the patch today.
Cheers,
--t
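For illustration, a minimal sketch of what the explicit operators described in this message might look like if defined in user code, assuming each one is a thin wrapper around countOverlaps() with the corresponding 'type'. The operator names come from this thread and are hypothetical here, not a statement about any released IRanges/GenomicRanges API; the toy ranges are invented.

```r
library(GenomicRanges)

# Hypothetical operators (names taken from this thread); each is a thin
# wrapper around countOverlaps() with an explicit 'type'.
`%overlapping%` <- function(x, table) countOverlaps(x, table, type = "any") > 0
`%within%`      <- function(x, table) countOverlaps(x, table, type = "within") > 0
`%equal%`       <- function(x, table) countOverlaps(x, table, type = "equal") > 0

# Toy data, invented for illustration.
# peaks: 6-10, 10-14, 50-54, 30-34; genes: 8-20, 50-54 (all on chr1)
peaks <- GRanges("chr1", IRanges(start = c(6, 10, 50, 30), width = 5))
genes <- GRanges("chr1", IRanges(start = c(8, 50), end = c(20, 54)))

ov <- peaks %overlapping% genes  # TRUE  TRUE  TRUE  FALSE
wi <- peaks %within% genes       # FALSE TRUE  TRUE  FALSE
eq <- peaks %equal% genes        # FALSE FALSE TRUE  FALSE
```

The first peak shows why the distinction matters: it overlaps a gene without lying within it, so the three operators give three different answers across the same data.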
On Tue, Jan 8, 2013 at 9:20 AM, Hervé Pagès <hpages@fhcrc.org> wrote:
> Hi Tim,
>
> I could add the %ov% operator as a replacement for the old %in%. So you
> would write 'peaks %ov% genes' instead of 'peaks %in% genes'. Would just
> be a convenience wrapper for 'overlapsAny(peaks, genes)'.
>
> Cheers,
> H.
>
>
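A rough sketch of the %ov% wrapper suggested above, assuming it simply forwards to overlapsAny() with default arguments. The definition and the toy ranges are illustrative assumptions, not the code that was committed to IRanges.

```r
library(GenomicRanges)

# Hypothetical convenience operator, as floated in the quoted message:
# a plain wrapper around overlapsAny() using its default arguments.
`%ov%` <- function(query, subject) overlapsAny(query, subject)

# Toy data, invented for illustration: peaks 6-10 and 30-34, one gene 8-20.
peaks <- GRanges("chr1", IRanges(start = c(6, 30), width = 5))
genes <- GRanges("chr1", IRanges(start = 8, end = 20))

hits <- peaks %ov% genes  # TRUE FALSE

# Agrees with the explicit count-based form used elsewhere in the thread.
same <- all(hits == (countOverlaps(peaks, genes) > 0))

# An operator cannot take extra arguments, so anything beyond the default
# (e.g. type = "within") still goes through overlapsAny() directly.
inside <- overlapsAny(peaks, genes, type = "within")  # FALSE FALSE
```

The trade-off is the one debated in the thread: the operator is shorter to type, while the function form keeps 'maxgap', 'minoverlap', 'type' and 'ignore.strand' visible to the user.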
> On 01/07/2013 11:45 AM, Tim Triche, Jr. wrote:
>
>> So why not leave %in% as it was and transition everything forward to
>> explicitly using { `%within%`, `%overlaps%`|`%overlapping%`, `%equals%` }
>> such that
>>
>> identical( x %within% table, countOverlaps(x, table, type='within') > 0 ) == TRUE
>> identical( x %overlaps% table, countOverlaps(x, table, type='any') > 0 ) == TRUE
>> identical( x %equals% table, countOverlaps(x, table, type='equal') > 0 ) == TRUE
>>
>> and for the time being,
>>
>> identical( x %overlaps% table, countOverlaps(x, table, type='any') > 0 ) == TRUE
>> ## but with a noisy nastygram that will halt if options("warn"=2)
>>
>> No breakage for %in% methods until such time as a full deprecation cycle
>> has passed, and if the maintainers can't be arsed to do anything at all
>> about the warnings by the second full release, then perhaps they don't
>> really care that much after all. Just a thought?
>>
>> From someone (me) who has their own issues with keeping everything up
>> to date and should know better. If you want to use %in% for
>>
>> peaks %in% genes (why on earth would you do this rather than
>> peaks %in% promoters(genes), anyways?)
>>
>> then a nastygram could be emitted "WARNING: YOUR SHORTHAND NOTATION IS
>> DOOMED AFTER BIOC 2.13, YOU WILL BE ASSIMILATED" and everyone is (more
>> or less) happy.
>>
>>
>>
>> On Mon, Jan 7, 2013 at 11:33 AM, Michael Lawrence
>> <lawrence.michael@gene.com> wrote:
>>
>>
>>
>>
>> On Mon, Jan 7, 2013 at 11:00 AM, Hervé Pagès <hpages@fhcrc.org> wrote:
>>
>> Hi Michael,
>>
>> I don't think "match" (the word) always has to mean
"equality"
>> either.
>> However having match() (the function) do "whole exact
matching"
>> (aka
>> "equality") for any kind of vector-like object has the
advantage
>> of:
>>
>> (a) making it consistent with base::match()
(?base::match is
>> pretty
>> explicit about what the contract of match() is)
>>
>>
>> (a) alone is obviously not enough. We have many methods, like
the
>> set operations, that treat ranges specially. Are we going to
start
>> moving everything toward the base behavior? And have
rangeIntersect,
>> rangeSetdiff, etc?
>>
>> (b) preserving its relationship with ==, duplicated(),
>> unique(),
>> etc...
>>
>>
>> So it becomes consistent with duplicated/unique, but we lose
>> consistency with the set operations.
>>
>> (c) not frustrating the user who needs something to do
exact
>> matching on ranges (as I mentioned previously, if
you take
>> match() away from him/her, s/he'll be left with
nothing).
>>
>>
>> No one has ever asked for match() to behave this way. There was
a
>> request for a way to tabulate identical ranges. It was a nice
idea
>> to extract the general "outer equal" findMatches function. But
the
>> changes seem to be snow-balling. These types of changes mean a
lot
>> of maintenance work for the users. A deprecation cycle does not
>> circumvent that.
>>
>>
>> IMO those advantages counterbalance *by far* the very
little
>> convenience you get from having 'match(query, subject)' do
>> 'findOverlaps(query, subject, select="first")' on
>> IRanges/GRanges objects. If you need to do that, just use
the
>> latter, or, if you think that's still too much typing,
define
>> a wrapper e.g. 'ovmatch(query, subject)'.
>>
>> There are plenty of specialized tools around for doing
>> inexact/fuzzy/partial/overlap matching for many particular
types
>> of vector-like objects: grep() and family, pmatch(),
charmatch(),
>> agrep(), grepRaw(), matchPattern() and family,
findOverlaps() and
>> family, findInterval(), etc... For the reasons I mentioned
>> above, none of them should hijack match() to make it do
some
>> particular type of inexact matching on some particular type
of
>> objects. Even if, for that particular type of objects,
doing that
>> particular type of inexact matching is more common than
doing
>> exact matching.
>>
>> H.
>>
>>
>>
>> On 01/06/2013 05:39 PM, Michael Lawrence wrote:
>>
>> I think having overlapsAny is a nice addition and helps
make
>> the API
>> more complete and explicit. Are you sure we need to
change
>> the behavior
>> of the match method for this relatively uncommon use
case?
>>
>>
>> Yes because otherwise users with a use case of doing
match()
>>
>> even if it's uncommon,
>>
>>
>> I don't think
>> "match" always has to mean "equality". It is a more
general
>> concept in
>> my mind. The most common use case for matching ranges
is
>> overlap.
>>
>>
>> Of course "match" doesn't always have to mean equality. But
of
>> base
>>
>>
>> Michael
>>
>>
>> On Fri, Jan 4, 2013 at 8:34 PM, Hervé Pagès <hpages@fhcrc.org> wrote:
>>
>> Yes 'peaks %in% genes' is cute and was probably
doing
>> the right thing
>> for most users (although not all). But 'exons %in%
>> genes' is cute too
>> and was probably doing the wrong thing for all
users.
>> Advanced users
>> like you guys would have no problem switching to
>>
>> !is.na(findOverlaps(peaks, genes, type="within", select="any"))
>>
>> or
>>
>> !is.na(findOverlaps(peaks, genes, type="equal", select="any"))
>>
>> in case 'peaks %in% genes' was not doing exactly
what
>> you wanted,
>> but most users would not find this particularly
>> friendly. Even
>> worse, some users probably didn't realize that
'peaks
>> %in% genes'
>> was not doing exactly what they thought it did
because
>> "peaks in
>> genes" in English suggests that the peaks are
within
>> the genes,
>> but it's not what 'peaks %in% genes' does.
>>
>> Having overlapsAny(), with exactly the same extra
>> arguments as
>> countOverlaps() and subsetByOverlaps() (i.e.
'maxgap',
>> 'minoverlap',
>> 'type', 'ignore.strand'), all of them documented
(and
>> with most
>> users more or less familiar with them already) has
the
>> virtue to
>> expose the user to all the options from the very
start,
>> and to
>> help him/her make the right choice. Of course
there
>> will be users
>> that don't want or don't have the time to
read/think
>> about all the
>> options. Not a big deal: they'll just do
>> 'overlapsAny(query, subject)',
>> which is not a lot more typing than 'query %in%
>> subject', especially
>> if they use tab completion.
>>
>> It's true that it's more common to ask questions
about
>> overlap than
>> about equality but there are some use cases for
the
>> latter (as the
>> original thread shows). Until now, when you had
such a
>> use case, you
>> could not use match() or %in%, which would have
been
>> the natural things
>> to use, because they got hijacked to do something
else,
>> and you were
>> left with nothing. Not a satisfying situation. So
at a
>> minimum, we
>> needed to restore the true/real/original semantic
of
>> match() to do
>> "equality" instead of "overlap". But it's hard to
do
>> this for match()
>> and not do it for %in% too. For more than 99% of R
>> users, %in% is
>> just a simple wrapper for 'match(x, table, nomatch
= 0)
>> > 0' (this
>> is how it has been documented and implemented in
base R
>> for many
>> years). Not maintaining this relationship between
%in%
>> and match()
>> would only cause grief and frustration to
newcomers to
>> Bioconductor.
>>
>> H.
>>
>>
>>
>> On 01/04/2013 03:32 PM, Cook, Malcolm wrote:
>>
>> Hiya again,
>>
>> I am definitely a late comer to BioC, so I
>> definitely easily
>> defer to
>> the tide of history.
>>
>> But I do think you miss my point Michael about
the
>> proposed change
>> making the relationship between %in% and match
for
>> {G,I}Ranges{List}
>> mimic that between other vectors, and I do
think
>> that changing
>> the API
>> would make other late-comers take to BioC
>> easier/faster.
>>
>> That said, I NEVER use %in% so I really have
no
>> stake in the
>> matter, and
>> I DEFINITELY appreciate the argument to not
>> changing the API
>> just for
>> semantic sweetness.
>>
>> That that said, Herve is _/so good/_ about
>> deprecations and warnings
>>
>> that make such changes fairly easily
digestible.
>>
>> That that that.... enough.... I bow out of
this
>> one....!!!!
>>
>> Always learning and Happy New Year to all
lurkers,
>>
>> ~Malcolm
>>
>> *From:* Michael Lawrence [mailto:lawrence.michael@gene.com]
>> *Sent:* Friday, January 04, 2013 5:11 PM
>> *To:* Cook, Malcolm
>> *Cc:* Sean Davis; Michael Lawrence; Hervé Pagès (hpages@fhcrc.org);
>> Tim Triche, Jr.; Vedran Franke; bioconductor@r-project.org
>> *Subject:* Re: [BioC] countMatches() (was: table for GenomicRanges)
>>
>>
>> On Fri, Jan 4, 2013 at 1:56 PM, Cook, Malcolm <mec@stowers.org> wrote:
>>
>> Hiya,
>>
>> For what it is worth...
>>
>> I think the change to %in% is warranted.
>>
>> If I understand correctly, this change restores the relationship between
>> the semantics of `%in%` and the semantics of `match`.
>>
>> From the docs:
>>
>> '"%in%" <- function(x, table) match(x, table, nomatch = 0) > 0'
>>
>> Herve's change restores this relationship.
>>
>>
>> match and %in% were initially consistent (both
>> considering any
>> overlap);
>> Herve has changed both of them together. The
whole
>> idea behind
>> IRanges
>> is that ranges are special data types with
special
>> semantics. We
>> have
>> reimplemented much of the existing R vector
API
>> using those
>> semantics;
>> this extends beyond match/%in%. I am hesitant
about
>> making such
>> sweeping
>> changes to the API so late in the life-cycle
of the
>> package.
>> There was a
>> feature request for a way to count identical
ranges
>> in a set of
>> ranges.
>> Let's please not get carried away and start
>> redesigning the API
>> for this
>> one, albeit useful, request. There are all
sorts of
>> inconsistencies in
>> the API, and many of them were conscious
decisions
>> that considered
>> practical use cases.
>>
>> Michael
>>
>>
>> Herve, I suspect you were as a result able to completely drop
>> all the `%in%,BiocClass1,BiocClass2` definitions and depend upon
>> base::%in%
>>
>> Am I right?
>>
>> If so, may I suggest that Herve stay the course, with the addition of
>>
>> '"%ol%" <- function(a, b) countOverlaps(a, b, maxgap=0L, minoverlap=1L, type='any') > 0'
>>
>> This would provide a perspicacious idiom, thereby optimizing the API
>> for Michael's observed common use case.
>>
>> Just sayin'
>>
>> ~Malcolm
>>
>>
>> .-----Original Message-----
>> .From: bioconductor-bounces@r-project.org
>> [mailto:bioconductor-bounces@r-project.org] On Behalf Of Sean Davis
>> .Sent: Friday, January 04, 2013 3:37 PM
>> .To: Michael Lawrence
>> .Cc: Tim Triche, Jr.; Vedran Franke; bioconductor@r-project.org
>> .Subject: Re: [BioC] countMatches() (was: table for GenomicRanges)
>> .
>> .On Fri, Jan 4, 2013 at 4:32 PM, Michael Lawrence
>> .<lawrence.michael@gene.com> wrote:
>> .> The change to the behavior of %in% is a pretty big one. Are you thinking
>> .> that all set-based operations should behave this way? For example, setdiff
>> .> and intersect? I really liked the syntax of "peaks %in% genes". In my
>> .> experience, it's way more common to ask questions about overlap than about
>> .> equality, so I'd rather optimize the API for that use case. But again,
>> .> that's just my personal bias.
>> .
>> .For what it is worth, I share Michael's personal bias here.
>> .
>> .Sean
>> .
>> .
>> .> Michael
>> .>
>> .>
>> .> On Fri, Jan 4, 2013 at 1:11 PM, Hervé Pagès <hpages@fhcrc.org> wrote:
>> .>
>> .>> Hi,
>> .>>
>> .>> I added findMatches() and countMatches() to the latest IRanges /
>> .>> GenomicRanges packages (in BioC devel only).
>> .>>
>> .>> findMatches(x, table): An enhanced version of match that
>> .>> returns all the matches in a Hits object.
>> .>>
>> .>> countMatches(x, table): Returns an integer vector of the length
>> .>> of x, containing the number of matches in table for
>> .>> each element in x.
>> .>>
>> .>> countMatches() is what you can use to tally/count/tabulate (choose your
>> .>> preferred term) the unique elements in a GRanges object:
>> .>>
>> .>> library(GenomicRanges)
>> .>> set.seed(33)
>> .>> gr <- GRanges("chr1", IRanges(sample(15,20,replace=TRUE), width=5))
>> .>>
>> .>> Then:
>> .>>
>> .>> > gr_levels <- sort(unique(gr))
>> .>> > countMatches(gr_levels, gr)
>> .>> [1] 1 1 1 2 4 2 2 1 2 2 2
>> .>>
>> .>> Note that findMatches() and countMatches() also work on IRanges and
>> .>> DNAStringSet objects, as well as on ordinary atomic vectors:
>> .>>
>> .>> library(hgu95av2probe)
>> .>> library(Biostrings)
>> .>> probes <- DNAStringSet(hgu95av2probe)
>> .>> unique_probes <- unique(probes)
>> .>> count <- countMatches(unique_probes, probes)
>> .>> max(count) # 7
>> .>>
>> .>> I made other changes in IRanges/GenomicRanges so that the notion
>> .>> of "match" between elements of a vector-like object now consistently
>> .>> means "equality" instead of "overlap", even for range-based objects
>> .>> like IRanges or GRanges objects. This notion of "equality" is the
>> .>> same that is used by ==. The most visible consequence of those
>> .>> changes is that using %in% between 2 IRanges or GRanges objects
>> .>> 'query' and 'subject' in order to do overlaps was replaced by
>> .>> overlapsAny(query, subject).
>> .>>
>> .>> overlapsAny(query, subject): Finds the ranges in query that
>> .>> overlap any of the ranges in subject.
>> .>>
>> .>> There are warnings and deprecation messages in place to help smooth
>> .>> the transition.
>> .>>
>> .>> Cheers,
>> .>> H.
>> .>>
>> .>> --
>> .>> Hervé Pagès
>> .>>
>> .>> Program in Computational Biology
>> .>> Division of Public Health Sciences
>> .>> Fred Hutchinson Cancer Research Center
>> .>> 1100 Fairview Ave. N, M1-B514
>> .>> P.O. Box 19024
>> .>> Seattle, WA 98109-1024
>> .>>
>> .>> E-mail: hpages@fhcrc.org
>> .>> Phone: (206) 667-5791
>> .>> Fax: (206) 667-1319
>>
>> .>>
>> .>
>> .> _______________________________________________
>> .> Bioconductor mailing list
>> .> Bioconductor@r-project.org
>> .> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> .> Search the archives:
>> .> http://news.gmane.org/gmane.science.biology.informatics.conductor
>> .
>> --
>> /A model is a lie that helps you see the truth./
>>
>> Howard Skipper
>> <http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf>
>>
>
--
*A model is a lie that helps you see the truth.*

Howard Skipper <http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf>
Thanks Tim, Malcolm for the feedback.
@Tim, I won't comment on the variants of %ov% you are proposing for
doing "within" or "equal" instead of "any" (but if people want them,
I'll add them too). For now I just want to focus on restoring the
convenience of the old %in%, whose removal is understandably causing
some frustration, so that we can move on.
Cheers,
H.
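
For anyone following along, here is a minimal sketch of the %ov% idea, assuming it is nothing more than a thin wrapper around overlapsAny() as described in this thread. The operator below is a local definition for illustration, not something taken from the IRanges or GenomicRanges sources, and the toy ranges `peaks` and `genes` are made up.

```r
library(GenomicRanges)

# Hypothetical toy data, for illustration only.
peaks <- GRanges("chr1", IRanges(start = c(1, 20, 50), width = 5))
genes <- GRanges("chr1", IRanges(start = c(3, 50), end = c(10, 54)))

# Local convenience operator (an assumption, defined here by hand):
# TRUE where a query range overlaps at least one subject range.
`%ov%` <- function(query, subject) overlapsAny(query, subject)

ov <- peaks %ov% genes   # overlap semantics
eq <- peaks %in% genes   # equality semantics, once the change is in place
```

With these ranges the first peak overlaps a gene without being equal to it, so `ov` and `eq` should differ on that element; only the third peak is identical to a gene.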
On 01/08/2013 09:50 AM, Tim Triche, Jr. wrote:
> hell, I'll add the operators if there's support for them. obviously
> they're not a big deal and a patch would take 5 minutes flat.
>
> my hope was to be very explicit about what each type of operation meant,
> so that when a newcomer to the Ranges API sees
>
> peaks %overlapping% promoters(someGroupOfGenesWeCareAbout)
>
> it cannot be confused with
>
> peaks %within% rangesThatCorrespondToSomeChromatinState
>
> or
>
> peaks %equal% aBunchOfDNAseFootprints
>
> or
>
> DMRs %in% genes ## what the hell does this really mean, anyways?
> it's so bad on so many levels
>
> because whenever someone says "what is the advantage of Ranges-based
> analyses?", these are the archetypal sorts of queries that come to mind.
> Except that usually in my examples they are based on posterior
> probabilities, but perhaps that could stand to change.
>
> Anyways, that's just my bias, and you're doing the heavy lifting. But
> if people agree with the motivations I will write the patch today.
>
> Cheers,
>
> --t
>
>
>
>
> On Tue, Jan 8, 2013 at 9:20 AM, Hervé Pagès <hpages at fhcrc.org> wrote:
>
> Hi Tim,
>
> I could add the %ov% operator as a replacement for the old %in%. So you
> would write 'peaks %ov% genes' instead of 'peaks %in% genes'. Would just
> be a convenience wrapper for 'overlapsAny(peaks, genes)'.
>
> Cheers,
> H.
>
>
> On 01/07/2013 11:45 AM, Tim Triche, Jr. wrote:
>
> So why not leave %in% as it was and transition everything forward to
> explicitly using { `%within%`, `%overlaps%`|`%overlapping%`, `%equals%` }
> such that
>
> identical( x %within% table, countOverlaps(x, table, type='within') > 0 ) == TRUE
> identical( x %overlaps% table, countOverlaps(x, table, type='any') > 0 ) == TRUE
> identical( x %equals% table, countOverlaps(x, table, type='equal') > 0 ) == TRUE
>
> and for the time being,
>
> identical( x %overlaps% table, countOverlaps(x, table, type='any') > 0 ) == TRUE
> ## but with a noisy nastygram that will halt if options("warn"=2)
>
> No breakage for %in% methods until such time as a full deprecation cycle
> has passed, and if the maintainers can't be arsed to do anything at all
> about the warnings by the second full release, then perhaps they don't
> really care that much after all. Just a thought?
>
> From someone (me) who has their own issues with keeping everything up
> to date and should know better. If you want to use %in% for
>
> peaks %in% genes (why on earth would you do this rather than peaks
> %in% promoters(genes), anyways?)
>
> then a nastygram could be emitted "WARNING: YOUR SHORTHAND NOTATION IS
> DOOMED AFTER BIOC 2.13, YOU WILL BE ASSIMILATED" and everyone is (more
> or less) happy.
>
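
The three identities proposed above can be tried out with a short sketch. The operators below are defined locally, purely as an assumption of how such a proposal might look; they are not taken from any package, and defining them this way would mask same-named operators if a loaded package exports any. The toy ranges are made up.

```r
library(GenomicRanges)

# Hypothetical operators named after the proposal: each is TRUE where the
# query range has at least one hit of the given overlap type.
`%overlaps%` <- function(x, table) countOverlaps(x, table, type = "any") > 0
`%within%`   <- function(x, table) countOverlaps(x, table, type = "within") > 0
`%equals%`   <- function(x, table) countOverlaps(x, table, type = "equal") > 0

# Toy data: the four query ranges are, in order, inside a subject range,
# straddling its end, identical to a subject range, and far from both.
query   <- GRanges("chr1", IRanges(start = c(3, 8, 50, 100),
                                   end   = c(5, 12, 54, 104)))
subject <- GRanges("chr1", IRanges(start = c(1, 50), end = c(10, 54)))

res <- cbind(overlaps = query %overlaps% subject,
             within   = query %within%   subject,
             equals   = query %equals%   subject)
print(res)
```

The point of the design is that each operator name states the overlap type, so a reader does not have to guess which one `%in%` stands for.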
>
>
> On Mon, Jan 7, 2013 at 11:33 AM, Michael Lawrence
> <lawrence.michael at gene.com> wrote:
>
> On Mon, Jan 7, 2013 at 11:00 AM, Hervé Pagès
> <hpages at fhcrc.org> wrote:
>
> Hi Michael,
>
> I don't think "match" (the word) always has to mean "equality" either.
> However having match() (the function) do "whole exact matching" (aka
> "equality") for any kind of vector-like object has the advantage of:
>
> (a) making it consistent with base::match() (?base::match is pretty
> explicit about what the contract of match() is)
>
>
> (a) alone is obviously not enough. We have many methods, like the
> set operations, that treat ranges specially. Are we going to start
> moving everything toward the base behavior? And have rangeIntersect,
> rangeSetdiff, etc?
>
> (b) preserving its relationship with ==, duplicated(), unique(),
> etc...
>
>
> So it becomes consistent with duplicated/unique, but we lose
> consistency with the set operations.
>
> (c) not frustrating the user who needs something to do exact
> matching on ranges (as I mentioned previously, if you take
> match() away from him/her, s/he'll be left with nothing).
>
>
> No one has ever asked for match() to behave this way. There was a
> request for a way to tabulate identical ranges. It was a nice idea
> to extract the general "outer equal" findMatches function. But the
> changes seem to be snow-balling. These types of changes mean a lot
> of maintenance work for the users. A deprecation cycle does not
> circumvent that.
>
>
> IMO those advantages counterbalance *by far* the very little
> convenience you get from having 'match(query, subject)' do
> 'findOverlaps(query, subject, select="first")' on
> IRanges/GRanges objects. If you need to do that, just use the
> latter, or, if you think that's still too much typing, define
> a wrapper e.g. 'ovmatch(query, subject)'.
>
> There are plenty of specialized tools around for doing
> inexact/fuzzy/partial/overlap matching for many particular types
> of vector-like objects: grep() and family, pmatch(), charmatch(),
> agrep(), grepRaw(), matchPattern() and family, findOverlaps() and
> family, findInterval(), etc... For the reasons I mentioned
> above, none of them should hijack match() to make it do some
> particular type of inexact matching on some particular type of
> objects. Even if, for that particular type of objects, doing that
> particular type of inexact matching is more common than doing
> exact matching.
>
> H.
>
>
>
> On 01/06/2013 05:39 PM, Michael Lawrence wrote:
>
> I think having overlapsAny is a nice addition and helps make the API
> more complete and explicit. Are you sure we need to change the behavior
> of the match method for this relatively uncommon use case?
>
>
> Yes because otherwise users with a use case of doing match()
>
> even if it's uncommon,
>
>
> I don't think "match" always has to mean "equality". It is a more
> general concept in my mind. The most common use case for matching
> ranges is overlap.
>
>
> Of course "match" doesn't always have to mean equality. But of base
>
>
> Michael
>
>
> On Fri, Jan 4, 2013 at 8:34 PM, Hervé Pagès
> <hpages at fhcrc.org> wrote:
>
> Yes 'peaks %in% genes' is cute and was probably doing the right thing
> for most users (although not all). But 'exons %in% genes' is cute too
> and was probably doing the wrong thing for all users. Advanced users
> like you guys would have no problem switching to
>
> !is.na(findOverlaps(peaks, genes, type="within", select="any"))
>
> or
>
> !is.na(findOverlaps(peaks, genes, type="equal", select="any"))
>
> in case 'peaks %in% genes' was not doing exactly what you wanted,
> but most users would not find this particularly friendly. Even
> worse, some users probably didn't realize that 'peaks %in% genes'
> was not doing exactly what they thought it did because "peaks in
> genes" in English suggests that the peaks are within the genes,
> but it's not what 'peaks %in% genes' does.
>
> Having overlapsAny(), with exactly the same extra arguments as
> countOverlaps() and subsetByOverlaps() (i.e. 'maxgap', 'minoverlap',
> 'type', 'ignore.strand'), all of them documented (and with most
> users more or less familiar with them already) has the virtue to
> expose the user to all the options from the very start, and to
> help him/her make the right choice. Of course there will be users
> that don't want or don't have the time to read/think about all the
> options. Not a big deal: they'll just do 'overlapsAny(query, subject)',
> which is not a lot more typing than 'query %in% subject', especially
> if they use tab completion.
>
> It's true that it's more common to ask questions about overlap than
> about equality but there are some use cases for the latter (as the
> original thread shows). Until now, when you had such a use case, you
> could not use match() or %in%, which would have been the natural things
> to use, because they got hijacked to do something else, and you were
> left with nothing. Not a satisfying situation. So at a minimum, we
> needed to restore the true/real/original semantic of match() to do
> "equality" instead of "overlap". But it's hard to do this for match()
> and not do it for %in% too. For more than 99% of R users, %in% is
> just a simple wrapper for 'match(x, table, nomatch = 0) > 0' (this
> is how it has been documented and implemented in base R for many
> years). Not maintaining this relationship between %in% and match()
> would only cause grief and frustration to newcomers to Bioconductor.
>
> H.
>
>
>
> On 01/04/2013 03:32 PM, Cook, Malcolm wrote:
>
> Hiya again,
>
> I am definitely a late comer to BioC, so I definitely easily defer to
> the tide of history.
>
> But I do think you miss my point Michael about the proposed change
> making the relationship between %in% and match for {G,I}Ranges{List}
> mimic that between other vectors, and I do think that changing the API
> would make other late-comers take to BioC easier/faster.
>
> That said, I NEVER use %in% so I really have no stake in the matter, and
> I DEFINITELY appreciate the argument to not changing the API just for
> semantic sweetness.
>
> That that said, Herve is _/so good/_ about deprecations and warnings
> that make such changes fairly easily digestible.
>
> That that that.... enough.... I bow out of this one....!!!!
>
> Always learning and Happy New Year to all lurkers,
>
> ~Malcolm
>
> *From:* Michael Lawrence [mailto:lawrence.michael at gene.com]
> *Sent:* Friday, January 04, 2013 5:11 PM
> *To:* Cook, Malcolm
> *Cc:* Sean Davis; Michael Lawrence; Hervé Pagès (hpages at fhcrc.org);
> Tim Triche, Jr.; Vedran Franke; bioconductor at r-project.org
> *Subject:* Re: [BioC] countMatches() (was: table for GenomicRanges)
>
>
> On Fri, Jan 4, 2013 at 1:56 PM, Cook, Malcolm <mec at stowers.org> wrote:
>
> Hiya,
>
> For what it is worth...
>
> I think the change to %in% is warranted.
>
> If I understand correctly, this change restores the relationship between
> the semantics of `%in%` and the semantics of `match`.
>
> From the docs:
>
> '"%in%" <- function(x, table) match(x, table, nomatch = 0) > 0'
>
> Herve's change restores this relationship.
>
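
The relationship quoted from the docs can be checked on ordinary atomic vectors with a short base-R sketch. The name `%in2%` is made up so that the real operator is not masked; the input vectors are arbitrary.

```r
# Local re-statement of the documented definition of %in%, under a
# made-up name (%in2%) so that base::`%in%` stays untouched.
`%in2%` <- function(x, table) match(x, table, nomatch = 0) > 0

x   <- c("a", "b", "z")
tab <- c("a", "b", "c")

by_match <- x %in2% tab   # via match()
by_in    <- x %in%  tab   # via the built-in operator
print(identical(by_match, by_in))
```

If `%in%` and `match()` were allowed to use different notions of "match" for some class, this identity would no longer hold for objects of that class, which is the inconsistency being discussed.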
>
> match and %in% were initially consistent (both considering any overlap);
> Herve has changed both of them together. The whole idea behind IRanges
> is that ranges are special data types with special semantics. We have
> reimplemented much of the existing R vector API using those semantics;
> this extends beyond match/%in%. I am hesitant about making such sweeping
> changes to the API so late in the life-cycle of the package. There was a
> feature request for a way to count identical ranges in a set of ranges.
> Let's please not get carried away and start redesigning the API for this
> one, albeit useful, request. There are all sorts of inconsistencies in
> the API, and many of them were conscious decisions that considered
> practical use cases.
>
> Michael
>
>
> Herve, I suspect you were as a result able to completely drop
> all the `%in%,BiocClass1,BiocClass2` definitions and depend upon
> base::%in%
>
> Am I right?
>
> If so, may I suggest that Herve stay the course, with the
> addition of
> '"%ol%" <- function(a, b) findOverlaps(a, b, maxgap=0L,
> minoverlap=1L, type='any', select='all') > 0'
>
> This would provide a perspicacious idiom, thereby optimizing the API
> for Michael's observed common use case.
>
> Just sayin'
>
> ~Malcolm
>
>
> .-----Original Message-----
> .From: bioconductor-bounces at r-project.org
> [mailto:bioconductor-bounces at r-project.org] On Behalf Of Sean Davis
> .Sent: Friday, January 04, 2013 3:37 PM
> .To: Michael Lawrence
> .Cc: Tim Triche, Jr.; Vedran Franke; bioconductor at r-project.org
> .Subject: Re: [BioC] countMatches() (was: table for GenomicRanges)
> .
> .On Fri, Jan 4, 2013 at 4:32 PM, Michael Lawrence
> .<lawrence.michael at gene.com> wrote:
> .> The change to the behavior of %in% is a pretty big one. Are you thinking
> .> that all set-based operations should behave this way? For example, setdiff
> .> and intersect? I really liked the syntax of "peaks %in% genes". In my
> .> experience, it's way more common to ask questions about overlap than about
> .> equality, so I'd rather optimize the API for that use case. But again,
> .> that's just my personal bias.
> .
> .For what it is worth, I share Michael's personal bias here.
> .
> .Sean
> .
> .
> .> Michael
> .>
> .>
> .> On Fri, Jan 4, 2013 at 1:11 PM, Hervé Pagès <hpages at fhcrc.org> wrote:
> .>
> .>> Hi,
> .>>
> .>> I added findMatches() and countMatches() to the latest IRanges /
> .>> GenomicRanges packages (in BioC devel only).
> .>>
> .>> findMatches(x, table): An enhanced version of 'match' that
> .>> returns all the matches in a Hits object.
> .>>
> .>> countMatches(x, table): Returns an integer vector of the length
> .>> of 'x', containing the number of matches in 'table' for
> .>> each element in 'x'.
> .>>
> .>> countMatches() is what you can use to tally/count/tabulate (choose your
> .>> preferred term) the unique elements in a GRanges object:
> .>>
> .>> library(GenomicRanges)
> .>> set.seed(33)
> .>> gr <- GRanges("chr1", IRanges(sample(15,20,replace=TRUE), width=5))
> .>>
> .>> Then:
> .>>
> .>> > gr_levels <- sort(unique(gr))
> .>> > countMatches(gr_levels, gr)
> .>> [1] 1 1 1 2 4 2 2 1 2 2 2
> .>>
> .>> Note that findMatches() and countMatches() also work on IRanges and
> .>> DNAStringSet objects, as well as on ordinary atomic vectors:
> .>>
> .>> library(hgu95av2probe)
> .>> library(Biostrings)
> .>> probes <- DNAStringSet(hgu95av2probe)
> .>> unique_probes <- unique(probes)
> .>> count <- countMatches(unique_probes, probes)
> .>> max(count) # 7
> .>>
> .>> I made other changes in IRanges/GenomicRanges so that the notion
> .>> of "match" between elements of a vector-like object now consistently
> .>> means "equality" instead of "overlap", even for range-based objects
> .>> like IRanges or GRanges objects. This notion of "equality" is the
> .>> same that is used by ==. The most visible consequence of those
> .>> changes is that using %in% between 2 IRanges or GRanges objects
> .>> 'query' and 'subject' in order to do overlaps was replaced by
> .>> overlapsAny(query, subject).
> .>>
> .>> overlapsAny(query, subject): Finds the ranges in 'query' that
> .>> overlap any of the ranges in 'subject'.
> .>>
> .>> There are warnings and deprecation messages in place to help smooth
> .>> the transition.
> .>>
> .>> Cheers,
> .>> H.
> .>>
> .>> --
> .>> Hervé Pagès
> .>>
> .>> Program in Computational Biology
> .>> Division of Public Health Sciences
> .>> Fred Hutchinson Cancer Research Center
> .>> 1100 Fairview Ave. N, M1-B514
> .>> P.O. Box 19024
> .>> Seattle, WA 98109-1024
> .>>
> .>> E-mail: hpages at fhcrc.org
> .>> Phone: (206) 667-5791
> .>> Fax: (206) 667-1319
>
> .>>
> .>
> .> [[alternative HTML version deleted]]
> .>
> .> _____________________________________________________
> .> Bioconductor mailing list
> .> Bioconductor at r-project.org
> .> https://stat.ethz.ch/mailman/listinfo/bioconductor
> .> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
> .
>
>
>
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages at fhcrc.org
> Phone: (206) 667-5791
> Fax: (206) 667-1319
>
>
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages at fhcrc.org
> Phone: (206) 667-5791
> Fax: (206) 667-1319
>
>
>
>
>
> --
> /A model is a lie that helps you see the truth./
> /
> /
> Howard Skipper
>
> <http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf>
>
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages at fhcrc.org
> Phone: (206) 667-5791
> Fax: (206) 667-1319
>
>
>
>
> --
> /A model is a lie that helps you see the truth./
> /
> /
> Howard Skipper
> <http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
I would vote for %over% instead of %ov%. Just 2 more characters, but way
clearer, at least to me. The hardest things to type are the %'s.
Michael
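For readers following along, the operator being discussed amounts to a one-line infix wrapper around overlapsAny(). A minimal sketch, assuming the IRanges package is installed; the operator name `%over_sketch%` and the toy `peaks`/`genes` ranges are made up for illustration and are not the definition that ships with the package:

```r
library(IRanges)

# Hypothetical infix wrapper: simply delegates to overlapsAny(query, subject).
`%over_sketch%` <- function(query, subject) overlapsAny(query, subject)

# Toy data (illustrative only).
peaks <- IRanges(start = c(1, 20, 50), width = 5)   # 1-5, 20-24, 50-54
genes <- IRanges(start = c(3, 100), width = 10)     # 3-12, 100-109

# One logical per peak: TRUE where the peak overlaps at least one gene.
hits <- peaks %over_sketch% genes
print(hits)
```

Only the first peak (1-5) overlaps a gene (3-12), so the result is TRUE FALSE FALSE.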
On Tue, Jan 8, 2013 at 11:09 AM, Hervé Pagès <hpages@fhcrc.org> wrote:
> Thanks Tim, Malcolm for the feedback.
>
> @Tim, I won't comment on the variants of %ov% you are proposing for
> doing "within" or "equal" instead of "any" (but if people want them,
> I'll add them too). For now I just want to focus on restoring the
> convenience of the old %in%, whose removal is understandably causing
> some frustration. And so we can move on.
>
> Cheers,
> H.
>
>
>
> On 01/08/2013 09:50 AM, Tim Triche, Jr. wrote:
>
>> hell, I'll add the operators if there's support for them. obviously
>> they're not a big deal and a patch would take 5 minutes flat.
>>
>> my hope was to be very explicit about what each type of operation meant,
>> so that when a newcomer to the Ranges API sees
>>
>> peaks %overlapping% promoters(someGroupOfGenesWeCareAbout)
>>
>> it cannot be confused with
>>
>> peaks %within% rangesThatCorrespondToSomeChromatinState
>>
>> or
>>
>> peaks %equal% aBunchOfDNAseFootprints
>>
>> or
>>
>> DMRs %in% genes ## what the hell does this really mean, anyways?
>> it's so bad on so many levels
>>
>> because whenever someone says "what is the advantage of Ranges-based
>> analyses?", these are the archetypal sorts of queries that come to mind.
>> Except that usually in my examples they are based on posterior
>> probabilities, but perhaps that could stand to change.
>>
>> Anyways, that's just my bias, and you're doing the heavy lifting. But
>> if people agree with the motivations I will write the patch today.
>>
>> Cheers,
>>
>> --t
>>
>>
>>
>>
>> On Tue, Jan 8, 2013 at 9:20 AM, Hervé Pagès <hpages@fhcrc.org> wrote:
>>
>> Hi Tim,
>>
>> I could add the %ov% operator as a replacement for the old %in%. So you
>> would write 'peaks %ov% genes' instead of 'peaks %in% genes'. Would just
>> be a convenience wrapper for 'overlapsAny(peaks, genes)'.
>>
>> Cheers,
>> H.
>>
>>
>> On 01/07/2013 11:45 AM, Tim Triche, Jr. wrote:
>>
>> So why not leave %in% as it was and transition everything forward to
>> explicitly using { `%within%`, `%overlaps%`|`%overlapping%`, `%equals%`
>> } such that
>>
>> identical( x %within% table, countOverlaps(x, table, type='within') > 0 ) == TRUE
>> identical( x %overlaps% table, countOverlaps(x, table, type='any') > 0 ) == TRUE
>> identical( x %equals% table, countOverlaps(x, table, type='equal') > 0 ) == TRUE
>>
>> and for the time being,
>>
>> identical( x %overlaps% table, countOverlaps(x, table, type='any') > 0 ) == TRUE
>> ## but with a noisy nastygram that will halt if options("warn"=2)
>>
>> No breakage for %in% methods until such time as a full deprecation cycle
>> has passed, and if the maintainers can't be arsed to do anything at all
>> about the warnings by the second full release, then perhaps they don't
>> really care that much after all. Just a thought?
>>
>> From someone (me) who has their own issues with keeping everything up
>> to date and should know better. If you want to use %in% for
>>
>> peaks %in% genes (why on earth would you do this rather than peaks
>> %in% promoters(genes), anyways?)
>>
>> then a nastygram could be emitted "WARNING: YOUR SHORTHAND NOTATION IS
>> DOOMED AFTER BIOC 2.13, YOU WILL BE ASSIMILATED" and everyone is (more
>> or less) happy.
>>
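The three overlap flavours named in the quoted identities can be compared side by side. This is a small sketch, assuming the IRanges package is installed; it calls countOverlaps() directly and does not define the proposed operators, and the `query`/`subject` ranges are invented for illustration:

```r
library(IRanges)

# Toy ranges (illustrative only).
query   <- IRanges(start = c(1, 5, 7, 20), end = c(4, 8, 11, 22))
subject <- IRanges(start = c(1, 4), end = c(4, 9))

# Right-hand sides of the quoted identities, one logical per query range.
any_hit    <- countOverlaps(query, subject, type = "any")    > 0
within_hit <- countOverlaps(query, subject, type = "within") > 0
equal_hit  <- countOverlaps(query, subject, type = "equal")  > 0

print(rbind(any = any_hit, within = within_hit, equal = equal_hit))
```

With these ranges, 7-11 overlaps 4-9 without lying within it, and 5-8 lies within 4-9 without being equal to it, so the three rows differ.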
>>
>>
>> On Mon, Jan 7, 2013 at 11:33 AM, Michael Lawrence
>> <lawrence.michael@gene.com> wrote:
>>
>>
>>
>>
>> On Mon, Jan 7, 2013 at 11:00 AM, Hervé Pagès <hpages@fhcrc.org> wrote:
>>
>> Hi Michael,
>>
>> I don't think "match" (the word) always has to mean "equality" either.
>> However having match() (the function) do "whole exact matching" (aka
>> "equality") for any kind of vector-like object has the advantage of:
>>
>> (a) making it consistent with base::match() (?base::match is pretty
>> explicit about what the contract of match() is)
>>
>>
>> (a) alone is obviously not enough. We have many methods, like the
>> set operations, that treat ranges specially. Are we going to start
>> moving everything toward the base behavior? And have rangeIntersect,
>> rangeSetdiff, etc?
>>
>> (b) preserving its relationship with ==, duplicated(), unique(),
>> etc...
>>
>>
>> So it becomes consistent with duplicated/unique, but we lose
>> consistency with the set operations.
>>
>> (c) not frustrating the user who needs something to do exact
>> matching on ranges (as I mentioned previously, if you take
>> match() away from him/her, s/he'll be left with nothing).
>>
>>
>> No one has ever asked for match() to behave this way. There was a
>> request for a way to tabulate identical ranges. It was a nice idea
>> to extract the general "outer equal" findMatches function. But the
>> changes seem to be snow-balling. These types of changes mean a lot
>> of maintenance work for the users. A deprecation cycle does not
>> circumvent that.
>>
>>
>> IMO those advantages counterbalance *by far* the very little
>> convenience you get from having 'match(query, subject)' do
>> 'findOverlaps(query, subject, select="first")' on
>> IRanges/GRanges objects. If you need to do that, just use the
>> latter, or, if you think that's still too much typing, define
>> a wrapper e.g. 'ovmatch(query, subject)'.
>>
>> There are plenty of specialized tools around for doing
>> inexact/fuzzy/partial/overlap matching for many particular types
>> of vector-like objects: grep() and family, pmatch(), charmatch(),
>> agrep(), grepRaw(), matchPattern() and family, findOverlaps() and
>> family, findInterval(), etc... For the reasons I mentioned
>> above, none of them should hijack match() to make it do some
>> particular type of inexact matching on some particular type of
>> objects. Even if, for that particular type of objects, doing that
>> particular type of inexact matching is more common than doing
>> exact matching.
>>
>> H.
>>
>>
>>
>> On 01/06/2013 05:39 PM, Michael Lawrence wrote:
>>
>> I think having overlapsAny is a nice addition and helps make the API
>> more complete and explicit. Are you sure we need to change the behavior
>> of the match method for this relatively uncommon use case?
>>
>>
>> Yes because otherwise users with a use case of doing match()
>>
>> even if it's uncommon,
>>
>>
>> I don't think
>> "match" always has to mean "equality". It is a more general concept in
>> my mind. The most common use case for matching ranges is overlap.
>>
>>
>> Of course "match" doesn't always have to mean equality. But of base
>>
>>
>> Michael
>>
>>
>> On Fri, Jan 4, 2013 at 8:34 PM, Hervé Pagès <hpages@fhcrc.org> wrote:
>>
>> Yes 'peaks %in% genes' is cute and was probably doing the right thing
>> for most users (although not all). But 'exons %in% genes' is cute too
>> and was probably doing the wrong thing for all users. Advanced users
>> like you guys would have no problem switching to
>>
>> !is.na(findOverlaps(peaks, genes, type="within", select="any"))
>>
>> or
>>
>> !is.na(findOverlaps(peaks, genes, type="equal", select="any"))
>>
>> in case 'peaks %in% genes' was not doing exactly what you wanted,
>> but most users would not find this particularly friendly. Even
>> worse, some users probably didn't realize that 'peaks %in% genes'
>> was not doing exactly what they thought it did because "peaks in
>> genes" in English suggests that the peaks are within the genes,
>> but it's not what 'peaks %in% genes' does.
>>
>> Having overlapsAny(), with exactly the same extra arguments as
>> countOverlaps() and subsetByOverlaps() (i.e. 'maxgap', 'minoverlap',
>> 'type', 'ignore.strand'), all of them documented (and with most
>> users more or less familiar with them already) has the virtue of
>> exposing the user to all the options from the very start, and of
>> helping him/her make the right choice. Of course there will be users
>> that don't want or don't have the time to read/think about all the
>> options. Not a big deal: they'll just do 'overlapsAny(query, subject)',
>> which is not a lot more typing than 'query %in% subject', especially
>> if they use tab completion.
>>
>> It's true that it's more common to ask questions about overlap than
>> about equality but there are some use cases for the latter (as the
>> original thread shows). Until now, when you had such a use case, you
>> could not use match() or %in%, which would have been the natural things
>> to use, because they got hijacked to do something else, and you were
>> left with nothing. Not a satisfying situation. So at a minimum, we
>> needed to restore the true/real/original semantic of match() to do
>> "equality" instead of "overlap". But it's hard to do this for match()
>> and not do it for %in% too. For more than 99% of R users, %in% is
>> just a simple wrapper for 'match(x, table, nomatch = 0) > 0' (this
>> is how it has been documented and implemented in base R for many
>> years). Not maintaining this relationship between %in% and match()
>> would only cause grief and frustration to newcomers to Bioconductor.
>>
>> H.
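The base R relationship between %in% and match() described in the quoted message can be checked on ordinary atomic vectors. A small base-R sketch; the vectors `x` and `tbl` are made up for illustration:

```r
x   <- c("chr1", "chr5", "chrX")
tbl <- c("chrX", "chr1", "chr2", "chr1")

# %in% as documented in ?match: a thin wrapper around match().
via_match <- match(x, tbl, nomatch = 0) > 0
via_in    <- x %in% tbl

# Both formulations give the same logical vector.
stopifnot(identical(via_match, via_in))
print(via_in)

# match() itself reports the position of the first exact match (NA if none).
first_pos <- match(x, tbl)
print(first_pos)
```

Here "chr1" occurs twice in `tbl`, but match() returns only its first position, which is the gap that findMatches() and countMatches() were added to fill.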
>>
>>
>>
>> On 01/04/2013 03:32 PM, Cook, Malcolm
wrote:
>>
>> Hiya again,
>>
>> I am definitely a late comer to BioC, so I definitely easily defer to
>> the tide of history.
>>
>> But I do think you miss my point Michael about the proposed change
>> making the relationship between %in% and match for {G,I}Ranges{List}
>> mimic that between other vectors, and I do think that changing the API
>> would make other late-comers take to BioC easier/faster.
>>
>> That said, I NEVER use %in% so I really have no stake in the matter, and
>> I DEFINITELY appreciate the argument to not changing the API just for
>> semantic sweetness.
>>
>> That that said, Herve is _/so good/_ about deprecations and warnings
>> that make such changes fairly easily digestible.
>>
>> That that that.... enough.... I bow out of this one....!!!!
>>
>> Always learning and Happy New Year to all lurkers,
>>
>> ~Malcolm
>>
>> *From:* Michael Lawrence [mailto:lawrence.michael@gene.com]
>> *Sent:* Friday, January 04, 2013 5:11 PM
>> *To:* Cook, Malcolm
>> *Cc:* Sean Davis; Michael Lawrence; Hervé Pagès (hpages@fhcrc.org); Tim
>> Triche, Jr.; Vedran Franke; bioconductor@r-project.org
>> *Subject:* Re: [BioC] countMatches() (was: table for GenomicRanges)
>>
>>
>> On Fri, Jan 4, 2013 at 1:56 PM, Cook, Malcolm <mec@stowers.org> wrote:
>>
>> Hiya,
>>
>> For what it is worth...
>>
>> I think the change to %in% is warranted.
>>
>> If I understand correctly, this change restores the relationship between
>> the semantics of `%in%` and the semantics of `match`.
>>
>> From the docs:
>>
>> '"%in%" <- function(x, table) match(x, table, nomatch = 0) > 0'
>>
>> Herve's change restores this relationship.
>>
>>
>> match and %in% were initially consistent (both considering any overlap);
>> Herve has changed both of them together. The whole idea behind IRanges
>> is that ranges are special data types with special semantics. We have
>> reimplemented much of the existing R vector API using those semantics;
>> this extends beyond match/%in%. I am hesitant about making such sweeping
>> changes to the API so late in the life-cycle of the package. There was a
>> feature request for a way to count identical ranges in a set of ranges.
>> Let's please not get carried away and start redesigning the API for this
>> one, albeit useful, request. There are all sorts of inconsistencies in
>> the API, and many of them were conscious decisions that considered
>> practical use cases.
>>
>> Michael
>>
>>
>> Herve, I suspect you were as a result able to completely drop
>> all the `%in%,BiocClass1,BiocClass2` definitions and depend upon
>> base::%in%
>>
>> Am I right?
>>
>> If so, may I suggest that Herve stay the course, with the addition of
>> '"%ol%" <- function(a, b) countOverlaps(a, b, maxgap=0L,
>> minoverlap=1L, type='any') > 0'
>>
>> This would provide a perspicuous idiom, thereby optimizing the API
>> for Michael's observed common use case.
>>
>> Just sayin'
>>
>> ~Malcolm
>>
>>
>> .-----Original Message-----
>> .From: bioconductor-bounces@r-project.org
>> [mailto:bioconductor-bounces@r-project.org] On Behalf Of Sean Davis
>> .Sent: Friday, January 04, 2013 3:37 PM
>> .To: Michael Lawrence
>> .Cc: Tim Triche, Jr.; Vedran Franke; bioconductor@r-project.org
>> .Subject: Re: [BioC] countMatches() (was: table for GenomicRanges)
>> .
>> .On Fri, Jan 4, 2013 at 4:32 PM, Michael Lawrence
>> .<lawrence.michael@gene.com> wrote:
>> .> The change to the behavior of %in% is a pretty big one. Are you thinking
>> .> that all set-based operations should behave this way? For example, setdiff
>> .> and intersect? I really liked the syntax of "peaks %in% genes". In my
>> .> experience, it's way more common to ask questions about overlap than about
>> .> equality, so I'd rather optimize the API for that use case. But again,
>> .> that's just my personal bias.
>> .
>> .For what it is worth, I share Michael's personal bias here.
>> .
>> .Sean
>> .
>> .
>> .> Michael
>> .>
>> .>
>> .> On Fri, Jan 4, 2013 at 1:11 PM, Hervé Pagès <hpages@fhcrc.org> wrote:
>> .>
>> .>> Hi,
>> .>>
>> .>> I added findMatches() and countMatches() to the latest IRanges /
>> .>> GenomicRanges packages (in BioC devel only).
>> .>>
>> .>> findMatches(x, table): An enhanced version of 'match' that
>> .>> returns all the matches in a Hits object.
>> .>>
>> .>> countMatches(x, table): Returns an integer vector of the length
>> .>> of 'x', containing the number of matches in 'table' for
>> .>> each element in 'x'.
>> .>>
>> .>> countMatches() is what you can use to tally/count/tabulate (choose your
>> .>> preferred term) the unique elements in a GRanges object:
>> .>>
>> .>> library(GenomicRanges)
>> .>> set.seed(33)
>> .>> gr <- GRanges("chr1", IRanges(sample(15,20,replace=TRUE), width=5))
>> .>>
>> .>> Then:
>> .>>
>> .>> > gr_levels <- sort(unique(gr))
>> .>> > countMatches(gr_levels, gr)
>> .>> [1] 1 1 1 2 4 2 2 1 2 2 2
>> .>>
>> .>> Note that findMatches() and countMatches() also work on IRanges and
>> .>> DNAStringSet objects, as well as on ordinary atomic vectors:
>> .>>
>> .>> library(hgu95av2probe)
>> .>> library(Biostrings)
>> .>> probes <- DNAStringSet(hgu95av2probe)
>> .>> unique_probes <- unique(probes)
>> .>> count <- countMatches(unique_probes, probes)
>> .>> max(count) # 7
>> .>>
>> .>> I made other changes in IRanges/GenomicRanges so that the notion
>> .>> of "match" between elements of a vector-like object now consistently
>> .>> means "equality" instead of "overlap", even for range-based objects
>> .>> like IRanges or GRanges objects. This notion of "equality" is the
>> .>> same that is used by ==. The most visible consequence of those
>> .>> changes is that using %in% between 2 IRanges or GRanges objects
>> .>> 'query' and 'subject' in order to do overlaps was replaced by
>> .>> overlapsAny(query, subject).
>> .>>
>> .>> overlapsAny(query, subject): Finds the ranges in 'query' that
>> .>> overlap any of the ranges in 'subject'.
>> .>>
>> .>> There are warnings and deprecation messages in place to help smooth
>> .>> the transition.
>> .>>
>> .>> Cheers,
>> .>> H.
>> .>>
>> .>> --
>> .>> Hervé Pagès
>> .>>
>> .>> Program in Computational Biology
>> .>> Division of Public Health Sciences
>> .>> Fred Hutchinson Cancer Research Center
>> .>> 1100 Fairview Ave. N, M1-B514
>> .>> P.O. Box 19024
>> .>> Seattle, WA 98109-1024
>> .>>
>> .>> E-mail: hpages@fhcrc.org
>> .>> Phone: (206) 667-5791
>> .>> Fax: (206) 667-1319
>>
>> .>>
>> .>
>> .> [[alternative HTML version deleted]]
>> .>
>> .> _____________________________________________________
>> .> Bioconductor mailing list
>> .> Bioconductor@r-project.org
>> .> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> .> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>> >>>
>> .
>>
>>
>> ._____________________________**________________________
>>
>>
>>
>> .Bioconductor mailing list
>> .Bioconductor@r-project.org
>> <mailto:bioconductor@r-**project.org <bioconductor@r-project.org="">
>> >
>>
<mailto:bioconductor@r-__**project.org<bioconductor@r-__project.org>
>> <mailto:bioconductor@r-**project.org <bioconductor@r-project.org="">
>> >>
>> <mailto:bioconductor@r-____**project. org<bioconductor@r-____project.org="">
>>
<mailto:bioconductor@r-__**project.org<bioconductor@r-__project.org>
>> >
>>
<mailto:bioconductor@r-__**project.org<bioconductor@r-__project.org>
>> <mailto:bioconductor@r-**project.org <bioconductor@r-project.org="">
>> >>>
>> <mailto:bioconductor@r-______**projec t.org<bioconductor@r-______project.org="">
>> <mailto:bioconductor@r-____**project.org<bioconductor@r-___ _project.org="">
>> >
>> <mailto:bioconductor@r-____**project.org<bioco nductor@r-____project.org="">
>>
<mailto:bioconductor@r-__**project.org<bioconductor@r-__project.org>
>> >>
>>
>> <mailto:bioconductor@r-____**project. org<bioconductor@r-____project.org="">
>>
<mailto:bioconductor@r-__**project.org<bioconductor@r-__project.org>
>> >
>>
<mailto:bioconductor@r-__**project.org<bioconductor@r-__project.org>
>> <mailto:bioconductor@r-**project.org <bioconductor@r-project.org="">
>> >>>>
>>
>>
>>
>> .https://stat.ethz.ch/mailman/**______listinfo/bioconductor
<https: stat.ethz.ch="" mailman="" ______listinfo="" bioconductor="">
>> <https: stat.ethz.ch="" mailman="" **____listinfo="" bioconductor<h="" ttps:="" stat.ethz.ch="" mailman="" ____listinfo="" bioconductor="">
>> >
>>
>> <https: stat.ethz.ch="" mailman="" **____listinfo="" bioconductor<h="" ttps:="" stat.ethz.ch="" mailman="" ____listinfo="" bioconductor="">
>> <https: stat.ethz.ch="" mailman="" **__listinfo="" bioconductor<htt="" ps:="" stat.ethz.ch="" mailman="" __listinfo="" bioconductor="">
>> >>
>>
>>
>>
>>
>> <https: stat.ethz.ch="" mailman="" **____listinfo="" bioconductor<h="" ttps:="" stat.ethz.ch="" mailman="" ____listinfo="" bioconductor="">
>> <https: stat.ethz.ch="" mailman="" **__listinfo="" bioconductor<htt="" ps:="" stat.ethz.ch="" mailman="" __listinfo="" bioconductor="">
>> >
>>
>> <https: stat.ethz.ch="" mailman="" **__listinfo="" bioconductor<htt="" ps:="" stat.ethz.ch="" mailman="" __listinfo="" bioconductor="">
>> <https: stat.ethz.ch="" mailman="" **listinfo="" bioconductor<https="" :="" stat.ethz.ch="" mailman="" listinfo="" bioconductor="">
>> >>>
>> .Search the archives:
>> http://news.gmane.org/gmane.__**____science.biology.**
>> informatics.______conductor<http: news.gmane.org="" gmane.______scien="" ce.biology.informatics.______conductor="">
>> <http: news.gmane.org="" gmane._**___science.biology.**="">> informatics.____conductor<http: news.gmane.org="" gmane.____science.b="" iology.informatics.____conductor="">
>> >
>>
>> <http: news.gmane.org="" gmane._**___science.biology.**="">> informatics.____conductor<http: news.gmane.org="" gmane.____science.b="" iology.informatics.____conductor="">
>>
<http: news.gmane.org="" gmane._**_science.biology.informatics._**="">> _conductor<http: news.gmane.org="" gmane.__science.biology.informatic="" s.__conductor="">
>> >>
>>
>>
>>
>>
>> <http: news.gmane.org="" gmane._**___science.biology.**="">> informatics.____conductor<http: news.gmane.org="" gmane.____science.b="" iology.informatics.____conductor="">
>>
<http: news.gmane.org="" gmane._**_science.biology.informatics._**="">> _conductor<http: news.gmane.org="" gmane.__science.biology.informatic="" s.__conductor="">
>> >
>>
>>
<http: news.gmane.org="" gmane._**_science.biology.informatics._**="">> _conductor<http: news.gmane.org="" gmane.__science.biology.informatic="" s.__conductor="">
>>
<http: news.gmane.org="" gmane.**science.biology.informatics.**="">> conductor<http: news.gmane.org="" gmane.science.biology.informatics.c="" onductor="">
>> >>>
>>
>>
>> --
>> Hervé Pagès
>>
>> Program in Computational Biology
>> Division of Public Health Sciences
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N, M1-B514
>> P.O. Box 19024
>> Seattle, WA 98109-1024
>>
>> E-mail: hpages@fhcrc.org
>> <mailto:hpages@fhcrc.org> <mailto:hpages@fhcrc.org>> <mailto:hpages@fhcrc.org>>
>> <mailto:hpages@fhcrc.org <mailto:hpages@fhcrc.org="">
>> <mailto:hpages@fhcrc.org <mailto:hpages@fhcrc.org="">>>
>>
>>
>> Phone: (206) 667-5791
>> <tel:%28206%29%20667-5791> <tel:%28206%29%20667-5791>
>> <tel:%28206%29%20667-5791>
>> Fax: (206) 667-1319
<tel:%28206%29%20667-1319>
>> <tel:%28206%29%20667-1319>
>> <tel:%28206%29%20667-1319>
>>
>>
>>
>> --
>> Hervé Pagès
>>
>> Program in Computational Biology
>> Division of Public Health Sciences
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N, M1-B514
>> P.O. Box 19024
>> Seattle, WA 98109-1024
>>
>> E-mail: hpages@fhcrc.org <mailto:hpages@fhcrc.org>
>> <mailto:hpages@fhcrc.org <mailto:hpages@fhcrc.org="">>
>> Phone: (206) 667-5791 <tel:%28206%29%20667-5791>
>> <tel:%28206%29%20667-5791>
>> Fax: (206) 667-1319 <tel:%28206%29%20667-1319>
>> <tel:%28206%29%20667-1319>
>>
>>
>>
>>
>>
>> --
>> /A model is a lie that helps you see the truth./
>> /
>> /
>> Howard Skipper
>>
<http: cancerres.__aacrjourna**ls.org="" content="" 31="" 9="" __1173.**="">> full.pdf <http: aacrjournals.org="" content="" 31="" 9="" __1173.full.pdf=""> <
>> http://cancerres.**aacrjournals.org/content/31/9/**1173.full.pdf<ht tp:="" cancerres.aacrjournals.org="" content="" 31="" 9="" 1173.full.pdf="">
>> >>
>>
>>
>>
>> --
>> Hervé Pagès
>>
>> Program in Computational Biology
>> Division of Public Health Sciences
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N, M1-B514
>>
> ...
>
> [Message clipped]
[[alternative HTML version deleted]]
Michael: your suggestion is both clearer and more concise than mine was.
+1
(I prefer x %i% y %i% z rather than intersect(x, intersect(y, z)) for the
same reason)
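
For readers who have not defined their own infix operators in R, here is a minimal sketch of the kind of shorthand meant by x %i% y %i% z. It assumes %i% is nothing more than a thin wrapper around base::intersect(); the operator name is illustrative only and is not defined by base R.

```r
# Hypothetical infix shorthand for set intersection. `%i%` is an
# illustrative name, assumed here to simply wrap base::intersect().
`%i%` <- function(x, y) intersect(x, y)

x <- c(1, 2, 3, 4, 5)
y <- c(2, 3, 4, 6)
z <- c(3, 4, 7)

# Nested call form versus chained infix form.
nested  <- intersect(x, intersect(y, z))
chained <- x %i% y %i% z   # user-defined %op% operators group left to right

# Both forms keep only the elements common to x, y and z.
stopifnot(identical(nested, chained))
print(chained)
```

The same pattern is what makes an operator such as %over% attractive: the chained form reads left to right, whereas the nested call has to be read inside out.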
On Tue, Jan 8, 2013 at 2:03 PM, Michael Lawrence
<lawrence.michael@gene.com> wrote:
> I would vote for %over% instead of %ov%. Just 2 more characters but way
> clearer, at least to me. The hardest thing to type are the %'s.
>
> Michael
>
>
>
> On Tue, Jan 8, 2013 at 11:09 AM, Hervé Pagès <hpages@fhcrc.org> wrote:
>
>> Thanks Tim, Malcolm for the feedback.
>>
>> @Tim, I won't comment on the variants of %ov% you are proposing for
>> doing "within" or "equal" instead of "any" (but if people want them,
>> I'll add them too). For now I just want to focus on restoring the
>> convenience of the old %in%, whose removal is understandably causing
>> some frustration. And so we can move on.
>>
>> Cheers,
>> H.
>>
>>
>>
>> On 01/08/2013 09:50 AM, Tim Triche, Jr. wrote:
>>
>>> hell, I'll add the operators if there's support for them. obviously
>>> they're not a big deal and a patch would take 5 minutes flat.
>>>
>>> my hope was to be very explicit about what each type of operation meant,
>>> so that when a newcomer to the Ranges API sees
>>>
>>> peaks %overlapping% promoters(someGroupOfGenesWeCareAbout)
>>>
>>> it cannot be confused with
>>>
>>> peaks %within% rangesThatCorrespondToSomeChromatinState
>>>
>>> or
>>>
>>> peaks %equal% aBunchOfDNAseFootprints
>>>
>>> or
>>>
>>> DMRs %in% genes ## what the hell does this really mean, anyways?
>>> it's so bad on so many levels
>>>
>>> because whenever someone says "what is the advantage of Ranges-based
>>> analyses?", these are the archetypal sorts of queries that come to mind.
>>> Except that usually in my examples they are based on posterior
>>> probabilities, but perhaps that could stand to change.
>>>
>>> Anyways, that's just my bias, and you're doing the heavy lifting. But
>>> if people agree with the motivations I will write the patch today.
>>>
>>> Cheers,
>>>
>>> --t
>>>
>>>
>>>
>>>
>>> On Tue, Jan 8, 2013 at 9:20 AM, Hervé Pagès <hpages@fhcrc.org> wrote:
>>>
>>> Hi Tim,
>>>
>>> I could add the %ov% operator as a replacement for the old %in%. So you
>>> would write 'peaks %ov% genes' instead of 'peaks %in% genes'. Would just
>>> be a convenience wrapper for 'overlapsAny(peaks, genes)'.
>>>
>>> Cheers,
>>> H.
>>>
>>>
>>> On 01/07/2013 11:45 AM, Tim Triche, Jr. wrote:
>>>
>>> So why not leave %in% as it was and transition everything forward to
>>> explicitly using { `%within%`, `%overlaps%`|`%overlapping%`, `%equals%` }
>>> such that
>>>
>>> identical( x %within% table, countOverlaps(x, table, type='within') > 0 ) == TRUE
>>> identical( x %overlaps% table, countOverlaps(x, table, type='any') > 0 ) == TRUE
>>> identical( x %equals% table, countOverlaps(x, table, type='equal') > 0 ) == TRUE
>>>
>>> and for the time being,
>>>
>>> identical( x %overlaps% table, countOverlaps(x, table, type='any') > 0 ) == TRUE
>>> ## but with a noisy nastygram that will halt if options("warn"=2)
>>>
>>> No breakage for %in% methods until such time as a full deprecation cycle
>>> has passed, and if the maintainers can't be arsed to do anything at all
>>> about the warnings by the second full release, then perhaps they don't
>>> really care that much after all. Just a thought?
>>>
>>> From someone (me) who has their own issues with keeping everything up
>>> to date and should know better. If you want to use %in% for
>>>
>>> peaks %in% genes (why on earth would you do this rather than peaks
>>> %in% promoters(genes), anyways?)
>>>
>>> then a nastygram could be emitted "WARNING: YOUR SHORTHAND NOTATION IS
>>> DOOMED AFTER BIOC 2.13, YOU WILL BE ASSIMILATED" and everyone is (more
>>> or less) happy.
>>>
>>>
>>>
>>> On Mon, Jan 7, 2013 at 11:33 AM, Michael Lawrence
>>> <lawrence.michael@gene.com> wrote:
>>>
>>> On Mon, Jan 7, 2013 at 11:00 AM, Hervé Pagès <hpages@fhcrc.org> wrote:
>>>
>>> Hi Michael,
>>>
>>> I don't think "match" (the word) always has to mean "equality" either.
>>> However having match() (the function) do "whole exact matching" (aka
>>> "equality") for any kind of vector-like object has the advantage of:
>>>
>>> (a) making it consistent with base::match() (?base::match is pretty
>>> explicit about what the contract of match() is)
>>>
>>>
>>> (a) alone is obviously not enough. We have many methods, like the
>>> set operations, that treat ranges specially. Are we going to start
>>> moving everything toward the base behavior? And have rangeIntersect,
>>> rangeSetdiff, etc?
>>>
>>> (b) preserving its relationship with ==, duplicated(), unique(),
>>> etc...
>>>
>>>
>>> So it becomes consistent with duplicated/unique, but we lose
>>> consistency with the set operations.
>>>
>>> (c) not frustrating the user who needs something to do exact
>>> matching on ranges (as I mentioned previously, if you take
>>> match() away from him/her, s/he'll be left with nothing).
>>>
>>>
>>> No one has ever asked for match() to behave this way. There was a
>>> request for a way to tabulate identical ranges. It was a nice idea
>>> to extract the general "outer equal" findMatches function. But the
>>> changes seem to be snow-balling. These types of changes mean a lot
>>> of maintenance work for the users. A deprecation cycle does not
>>> circumvent that.
>>>
>>>
>>> IMO those advantages counterbalance *by far* the very little
>>> convenience you get from having 'match(query, subject)' do
>>> 'findOverlaps(query, subject, select="first")' on
>>> IRanges/GRanges objects. If you need to do that, just use the
>>> latter, or, if you think that's still too much typing, define
>>> a wrapper e.g. 'ovmatch(query, subject)'.
>>>
>>> There are plenty of specialized tools around for doing
>>> inexact/fuzzy/partial/overlap matching for many particular types
>>> of vector-like objects: grep() and family, pmatch(), charmatch(),
>>> agrep(), grepRaw(), matchPattern() and family, findOverlaps() and
>>> family, findInterval(), etc... For the reasons I mentioned
>>> above, none of them should hijack match() to make it do some
>>> particular type of inexact matching on some particular type of
>>> objects. Even if, for that particular type of objects, doing that
>>> particular type of inexact matching is more common than doing
>>> exact matching.
>>>
>>> H.
>>>
>>>
>>>
>>> On 01/06/2013 05:39 PM, Michael Lawrence wrote:
>>>
>>> I think having overlapsAny is a nice addition and helps make the API
>>> more complete and explicit. Are you sure we need to change the behavior
>>> of the match method for this relatively uncommon use case?
>>>
>>>
>>> Yes because otherwise users with a use case of doing match()
>>>
>>> even if it's uncommon,
>>>
>>>
>>> I don't think "match" always has to mean "equality". It is a more
>>> general concept in my mind. The most common use case for matching
>>> ranges is overlap.
>>>
>>>
>>> Of course "match" doesn't always have to mean equality. But of base
>>>
>>>
>>> Michael
>>>
>>> On Fri, Jan 4, 2013 at 8:34 PM, Hervé Pagès <hpages@fhcrc.org> wrote:
>>>
>>> Yes 'peaks %in% genes' is cute and was probably doing the right thing
>>> for most users (although not all). But 'exons %in% genes' is cute too
>>> and was probably doing the wrong thing for all users. Advanced users
>>> like you guys would have no problem switching to
>>>
>>> !is.na(findOverlaps(peaks, genes, type="within", select="any"))
>>>
>>> or
>>>
>>> !is.na(findOverlaps(peaks, genes, type="equal", select="any"))
>>>
>>> in case 'peaks %in% genes' was not doing exactly what you wanted,
>>> but most users would not find this particularly friendly. Even
>>> worse, some users probably didn't realize that 'peaks %in% genes'
>>> was not doing exactly what they thought it did because "peaks in
>>> genes" in English suggests that the peaks are within the genes,
>>> but it's not what 'peaks %in% genes' does.
>>>
>>> Having overlapsAny(), with exactly the same extra arguments as
>>> countOverlaps() and subsetByOverlaps() (i.e. 'maxgap', 'minoverlap',
>>> 'type', 'ignore.strand'), all of them documented (and with most
>>> users more or less familiar with them already) has the virtue to
>>> expose the user to all the options from the very start, and to
>>> help him/her make the right choice. Of course there will be users
>>> that don't want or don't have the time to read/think about all the
>>> options. Not a big deal: they'll just do 'overlapsAny(query, subject)',
>>> which is not a lot more typing than 'query %in% subject', especially
>>> if they use tab completion.
>>>
>>> It's true that it's more common to ask questions about overlap than
>>> about equality but there are some use cases for the latter (as the
>>> original thread shows). Until now, when you had such a use case, you
>>> could not use match() or %in%, which would have been the natural things
>>> to use, because they got hijacked to do something else, and you were
>>> left with nothing. Not a satisfying situation. So at a minimum, we
>>> needed to restore the true/real/original semantic of match() to do
>>> "equality" instead of "overlap". But it's hard to do this for match()
>>> and not do it for %in% too. For more than 99% of R users, %in% is
>>> just a simple wrapper for 'match(x, table, nomatch = 0) > 0' (this
>>> is how it has been documented and implemented in base R for many
>>> years). Not maintaining this relationship between %in% and match()
>>> would only cause grief and frustration to newcomers to Bioconductor.
>>>
>>> H.
>>>
>>>
>>>
>>> On 01/04/2013 03:32 PM, Cook, Malcolm wrote:
>>>
>>> Hiya again,
>>>
>>> I am definitely a late comer to BioC, so I definitely easily defer to
>>> the tide of history.
>>>
>>> But I do think you miss my point Michael about the proposed change
>>> making the relationship between %in% and match for {G,I}Ranges{List}
>>> mimic that between other vectors, and I do think that changing the API
>>> would make other late-comers take to BioC easier/faster.
>>>
>>> That said, I NEVER use %in% so I really have no stake in the matter, and
>>> I DEFINITELY appreciate the argument to not changing the API just for
>>> semantic sweetness.
>>>
>>> That that said, Herve is _/so good/_ about deprecations and warnings
>>> that make such changes fairly easily digestible.
>>>
>>> That that that.... enough.... I bow out of this one....!!!!
>>>
>>> Always learning and Happy New Year to all lurkers,
>>>
>>> ~Malcolm
>>>
>>> *From:* Michael Lawrence [mailto:lawrence.michael@gene.com]
>>> *Sent:* Friday, January 04, 2013 5:11 PM
>>> *To:* Cook, Malcolm
>>> *Cc:* Sean Davis; Michael Lawrence; Hervé Pagès (hpages@fhcrc.org); Tim
>>> Triche, Jr.; Vedran Franke; bioconductor@r-project.org
>>> *Subject:* Re: [BioC] countMatches() (was: table for GenomicRanges)
>>>
>>> On Fri, Jan 4, 2013 at 1:56 PM, Cook, Malcolm <mec@stowers.org> wrote:
>>>
>>> Hiya,
>>>
>>> For what it is worth...
>>>
>>> I think the change to %in% is warranted.
>>>
>>> If I understand correctly, this change restores the relationship between
>>> the semantics of `%in%` and the semantics of `match`.
>>>
>>> From the docs:
>>>
>>> '"%in%" <- function(x, table) match(x, table, nomatch = 0) > 0'
>>>
>>> Herve's change restores this relationship.
>>>
>>>
>>> match and %in% were initially consistent (both considering any overlap);
>>> Herve has changed both of them together. The whole idea behind IRanges
>>> is that ranges are special data types with special semantics. We have
>>> reimplemented much of the existing R vector API using those semantics;
>>> this extends beyond match/%in%. I am hesitant about making such sweeping
>>> changes to the API so late in the life-cycle of the package. There was a
>>> feature request for a way to count identical ranges in a set of ranges.
>>> Let's please not get carried away and start redesigning the API for this
>>> one, albeit useful, request. There are all sorts of inconsistencies in
>>> the API, and many of them were conscious decisions that considered
>>> practical use cases.
>>>
>>> Michael
>>>
>>>
>>> Herve, I suspect you were as a result able to completely drop
>>> all the `%in%,BiocClass1,BiocClass2` definitions and depend upon
>>> base::%in%
>>>
>>> Am I right?
>>>
>>> If so, may I suggest that Herve stay the course, with the addition of
>>>
>>> '"%ol%" <- function(a, b) findOverlaps(a, b, maxgap=0L,
>>> minoverlap=1L, type='any', select='all') > 0'
>>>
>>> This would provide a perspicacious idiom, thereby optimizing the API
>>> for Michael's observed common use case.
>>>
>>> Just sayin'
>>>
>>> ~Malcolm
>>>
>>>
>>> .-----Original Message-----
>>> .From: bioconductor-bounces@r-project.org
>>> [mailto:bioconductor-bounces@r-project.org] On Behalf Of Sean Davis
>>> .Sent: Friday, January 04, 2013 3:37 PM
>>> .To: Michael Lawrence
>>> .Cc: Tim Triche, Jr.; Vedran Franke; bioconductor@r-project.org
>>> .Subject: Re: [BioC] countMatches() (was: table for GenomicRanges)
>>> .
>>> .On Fri, Jan 4, 2013 at 4:32 PM, Michael Lawrence
>>> .<lawrence.michael@gene.com> wrote:
>>> .> The change to the behavior of %in% is a pretty big one. Are you thinking
>>> .> that all set-based operations should behave this way? For example, setdiff
>>> .> and intersect? I really liked the syntax of "peaks %in% genes". In my
>>> .> experience, it's way more common to ask questions about overlap than about
>>> .> equality, so I'd rather optimize the API for that use case. But again,
>>> .> that's just my personal bias.
>>> .
>>> .For what it is worth, I share Michael's personal bias here.
>>> .
>>> .Sean
>>> .
>>> .
>>> .> Michael
>>> >
>>>
>>>
>>> <https: stat.ethz.ch="" mailman="" **____listinfo="" bioconductor<="" https:="" stat.ethz.ch="" mailman="" ____listinfo="" bioconductor="">
>>> <https: stat.ethz.ch="" mailman="" **__listinfo="" bioconductor<ht="" tps:="" stat.ethz.ch="" mailman="" __listinfo="" bioconductor="">
>>> >>
>>>
>>>
>>>
>>>
>>> <https: stat.ethz.ch="" mailman="" **____listinfo="" bioconductor<="" https:="" stat.ethz.ch="" mailman="" ____listinfo="" bioconductor="">
>>> <https: stat.ethz.ch="" mailman="" **__listinfo="" bioconductor<ht="" tps:="" stat.ethz.ch="" mailman="" __listinfo="" bioconductor="">
>>> >
>>>
>>> <https: stat.ethz.ch="" mailman="" **__listinfo="" bioconductor<ht="" tps:="" stat.ethz.ch="" mailman="" __listinfo="" bioconductor="">
>>> <https: stat.ethz.ch="" mailman="" **listinfo="" bioconductor<http="" s:="" stat.ethz.ch="" mailman="" listinfo="" bioconductor="">
>>> >>>
>>> .> Search the archives:
>>> http://news.gmane.org/gmane.__**____science.biology.**
>>> informatics.______conductor<http: news.gmane.org="" gmane.______scie="" nce.biology.informatics.______conductor="">
>>> <http: news.gmane.org="" gmane._**___science.biology.**="">>> informatics.____conductor<http: news.gmane.org="" gmane.____science.="" biology.informatics.____conductor="">
>>> >
>>>
>>>
>>> <http: news.gmane.org="" gmane._**___science.biology.**="">>> informatics.____conductor<http: news.gmane.org="" gmane.____science.="" biology.informatics.____conductor="">
>>>
<http: news.gmane.org="" gmane._**_science.biology.informatics._**="">>> _conductor<http: news.gmane.org="" gmane.__science.biology.informati="" cs.__conductor="">
>>> >>
>>>
>>>
>>>
>>> <http: news.gmane.org="" gmane._**___science.biology.**="">>> informatics.____conductor<http: news.gmane.org="" gmane.____science.="" biology.informatics.____conductor="">
>>>
<http: news.gmane.org="" gmane._**_science.biology.informatics._**="">>> _conductor<http: news.gmane.org="" gmane.__science.biology.informati="" cs.__conductor="">
>>> >
>>>
>>>
<http: news.gmane.org="" gmane._**_science.biology.informatics._**="">>> _conductor<http: news.gmane.org="" gmane.__science.biology.informati="" cs.__conductor="">
>>>
<http: news.gmane.org="" gmane.**science.biology.informatics.**="">>> conductor<http: news.gmane.org="" gmane.science.biology.informatics.="" conductor="">
>>> >>>
>>> .
>>>
>>>
>>> ._____________________________**________________________
>>>
>>>
>>>
>>> .Bioconductor mailing list
>>> .Bioconductor@r-project.org
>>>
<mailto:bioconductor@r-**project.org<bioconductor@r-project.org>
>>> >
>>>
<mailto:bioconductor@r-__**project.org<bioconductor@r-__project.org>
>>>
<mailto:bioconductor@r-**project.org<bioconductor@r-project.org>
>>> >>
>>> <mailto:bioconductor@r-____**project .org<bioconductor@r-____project.org="">
>>>
<mailto:bioconductor@r-__**project.org<bioconductor@r-__project.org>
>>> >
>>>
<mailto:bioconductor@r-__**project.org<bioconductor@r-__project.org>
>>>
<mailto:bioconductor@r-**project.org<bioconductor@r-project.org>
>>> >>>
>>> <mailto:bioconductor@r-______**proje ct.org<bioconductor@r-______project.org="">
>>> <mailto:bioconductor@r-____**project.org<bioconductor@r-__ __project.org="">
>>> >
>>> <mailto:bioconductor@r-____**project.org<bioc onductor@r-____project.org="">
>>>
<mailto:bioconductor@r-__**project.org<bioconductor@r-__project.org>
>>> >>
>>>
>>> <mailto:bioconductor@r-____**project .org<bioconductor@r-____project.org="">
>>>
<mailto:bioconductor@r-__**project.org<bioconductor@r-__project.org>
>>> >
>>>
<mailto:bioconductor@r-__**project.org<bioconductor@r-__project.org>
>>>
<mailto:bioconductor@r-**project.org<bioconductor@r-project.org>
>>> >>>>
>>>
>>>
>>>
>>> .https://stat.ethz.ch/mailman/**______listinfo/bioconducto
r<https: stat.ethz.ch="" mailman="" ______listinfo="" bioconductor="">
>>> <https: stat.ethz.ch="" mailman="" **____listinfo="" bioconductor<="" https:="" stat.ethz.ch="" mailman="" ____listinfo="" bioconductor="">
>>> >
>>>
>>> <https: stat.ethz.ch="" mailman="" **____listinfo="" bioconductor<="" https:="" stat.ethz.ch="" mailman="" ____listinfo="" bioconductor="">
>>> <https: stat.ethz.ch="" mailman="" **__listinfo="" bioconductor<ht="" tps:="" stat.ethz.ch="" mailman="" __listinfo="" bioconductor="">
>>> >>
>>>
>>>
>>>
>>>
>>> <https: stat.ethz.ch="" mailman="" **____listinfo="" bioconductor<="" https:="" stat.ethz.ch="" mailman="" ____listinfo="" bioconductor="">
>>> <https: stat.ethz.ch="" mailman="" **__listinfo="" bioconductor<ht="" tps:="" stat.ethz.ch="" mailman="" __listinfo="" bioconductor="">
>>> >
>>>
>>> <https: stat.ethz.ch="" mailman="" **__listinfo="" bioconductor<ht="" tps:="" stat.ethz.ch="" mailman="" __listinfo="" bioconductor="">
>>> <https: stat.ethz.ch="" mailman="" **listinfo="" bioconductor<http="" s:="" stat.ethz.ch="" mailman="" listinfo="" bioconductor="">
>>> >>>
>>> .Search the archives:
>>> http://news.gmane.org/gmane.__**____science.biology.**
>>> informatics.______conductor<http: news.gmane.org="" gmane.______scie="" nce.biology.informatics.______conductor="">
>>> <http: news.gmane.org="" gmane._**___science.biology.**="">>> informatics.____conductor<http: news.gmane.org="" gmane.____science.="" biology.informatics.____conductor="">
>>> >
>>>
>>> <http: news.gmane.org="" gmane._**___science.biology.**="">>> informatics.____conductor<http: news.gmane.org="" gmane.____science.="" biology.informatics.____conductor="">
>>>
<http: news.gmane.org="" gmane._**_science.biology.informatics._**="">>> _conductor<http: news.gmane.org="" gmane.__science.biology.informati="" cs.__conductor="">
>>> >>
>>>
>>>
>>>
>>>
>>> <http: news.gmane.org="" gmane._**___science.biology.**="">>> informatics.____conductor<http: news.gmane.org="" gmane.____science.="" biology.informatics.____conductor="">
>>>
<http: news.gmane.org="" gmane._**_science.biology.informatics._**="">>> _conductor<http: news.gmane.org="" gmane.__science.biology.informati="" cs.__conductor="">
>>> >
>>>
>>>
<http: news.gmane.org="" gmane._**_science.biology.informatics._**="">>> _conductor<http: news.gmane.org="" gmane.__science.biology.informati="" cs.__conductor="">
>>>
<http: news.gmane.org="" gmane.**science.biology.informatics.**="">>> conductor<http: news.gmane.org="" gmane.science.biology.informatics.="" conductor="">
>>> >>>
>>>
>>>
>>> --
>>> Hervé Pagès
>>>
>>> Program in Computational Biology
>>> Division of Public Health Sciences
>>> Fred Hutchinson Cancer Research Center
>>> 1100 Fairview Ave. N, M1-B514
>>> P.O. Box 19024
>>> Seattle, WA 98109-1024
>>>
>>> E-mail: hpages@fhcrc.org
>>> <mailto:hpages@fhcrc.org> <mailto:hpages@fhcrc.org>>> <mailto:hpages@fhcrc.org>>
>>> <mailto:hpages@fhcrc.org <mailto:hpages@fhcrc.org="">
>>> <mailto:hpages@fhcrc.org <mailto:hpages@fhcrc.org="">>>
>>>
>>>
>>> Phone: (206) 667-5791
>>> <tel:%28206%29%20667-5791> <tel:%28206%29%20667-5791>
>>> <tel:%28206%29%20667-5791>
>>> Fax: (206) 667-1319
<tel:%28206%29%20667-1319>
>>> <tel:%28206%29%20667-1319>
>>> <tel:%28206%29%20667-1319>
>>>
>>>
>>>
>>> --
>>> Hervé Pagès
>>>
>>> Program in Computational Biology
>>> Division of Public Health Sciences
>>> Fred Hutchinson Cancer Research Center
>>> 1100 Fairview Ave. N, M1-B514
>>> P.O. Box 19024
>>> Seattle, WA 98109-1024
>>>
>>> E-mail: hpages@fhcrc.org
<mailto:hpages@fhcrc.org>
>>> <mailto:hpages@fhcrc.org <mailto:hpages@fhcrc.org="">>
>>> Phone: (206) 667-5791 <tel:%28206%29%20667-5791>
>>> <tel:%28206%29%20667-5791>
>>> Fax: (206) 667-1319 <tel:%28206%29%20667-1319>
>>> <tel:%28206%29%20667-1319>
>>>
>>>
>>>
>>>
>>>
>>> --
>>> /A model is a lie that helps you see the truth./
>>> /
>>> /
>>> Howard Skipper
>>>
<http: cancerres.__aacrjourna**ls.org="" content="" 31="" 9="" __1173.**="">>> full.pdf <http: aacrjournals.org="" content="" 31="" 9="" __1173.full.pdf=""> <
>>> http://cancerres.**aacrjournals.org/content/31/9/**1173.full.pdf<h ttp:="" cancerres.aacrjournals.org="" content="" 31="" 9="" 1173.full.pdf="">
>>> >>
>>>
>>>
>>>
>>> --
>>> Hervé Pagès
>>>
>>> Program in Computational Biology
>>> Division of Public Health Sciences
>>> Fred Hutchinson Cancer Research Center
>>> 1100 Fairview Ave. N, M1-B514
>>>
>> ...
>>
>> [Message clipped]
>
>
>
--
*A model is a lie that helps you see the truth.*
*
*
Howard
Skipper<http: cancerres.aacrjournals.org="" content="" 31="" 9="" 1173.full.pdf="">
[[alternative HTML version deleted]]
If we're voting/brainstorming, I'd go for one operator per value that
the 'type' arg of the overlap functions can take on.
Thus:
%olStart%
%olEnd%
%olWithin%
%olAny% (perhaps with an alias of just '%ol%')
%olEqual% (which should be the same as %in%, right?)
Doh, I can't stay away from this issue for some reason... Anyway, my
2 cents.
~Malcolm
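Editorial sketch of the operator-per-type idea, in base R only. Everything here is an assumption made for illustration: ranges are modelled as a plain data.frame of closed integer intervals, and `ranges()` and `ol_any_of()` are hypothetical helpers, not part of IRanges or GenomicRanges. Only three of the proposed operators are shown.

```r
# Hypothetical stand-in for ranges: a data.frame with 'start' and 'end'
# columns (closed intervals). This is NOT the IRanges/GenomicRanges API.
ranges <- function(start, end) data.frame(start = start, end = end)

# For each row of 'query': does at least one row of 'subject' satisfy 'type'?
ol_any_of <- function(query, subject, type = c("any", "within", "equal")) {
  type <- match.arg(type)
  vapply(seq_len(nrow(query)), function(i) {
    qs <- query$start[i]
    qe <- query$end[i]
    hit <- switch(type,
      any    = qs <= subject$end   & qe >= subject$start,
      within = qs >= subject$start & qe <= subject$end,
      equal  = qs == subject$start & qe == subject$end)
    any(hit)
  }, logical(1))
}

# One infix operator per overlap type, each a thin wrapper.
`%olAny%`    <- function(query, subject) ol_any_of(query, subject, "any")
`%olWithin%` <- function(query, subject) ol_any_of(query, subject, "within")
`%olEqual%`  <- function(query, subject) ol_any_of(query, subject, "equal")

peaks <- ranges(c(1, 8, 30), c(5, 12, 35))
genes <- ranges(c(1, 10), c(5, 20))

res_any    <- peaks %olAny% genes     # TRUE TRUE FALSE
res_within <- peaks %olWithin% genes  # TRUE FALSE FALSE
res_equal  <- peaks %olEqual% genes   # TRUE FALSE FALSE
```

The second peak overlaps a gene without lying within one, which is the distinction the separate operator names are meant to make visible.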
From: Tim Triche, Jr. [mailto:tim.triche@gmail.com]
Sent: Tuesday, January 08, 2013 4:12 PM
To: Michael Lawrence
Cc: Hervé Pagès; Cook, Malcolm; Sean Davis; Vedran Franke;
bioconductor@r-project.org
Subject: Re: [BioC] countMatches() (was: table for GenomicRanges)
Michael: your suggestion is both clearer and more concise than mine
was. +1
(I prefer x %i% y %i% z rather than intersect(x, intersect(y, z)) for
the same reason)
On Tue, Jan 8, 2013 at 2:03 PM, Michael Lawrence
<lawrence.michael@gene.com> wrote:
I would vote for %over% instead of %ov%. Just 2 more characters but
way clearer, at least to me. The hardest things to type are the %'s.
Michael
On Tue, Jan 8, 2013 at 11:09 AM, Hervé Pagès
<hpages@fhcrc.org> wrote:
Thanks Tim, Malcolm for the feedback.
@Tim, I won't comment on the variants of %ov% you are proposing for
doing "within" or "equal" instead of "any" (but if people want them,
I'll add them too). For now I just want to focus on restoring the
convenience of the old %in%, whose removal is understandably causing
some frustration. And so we can move on.
Cheers,
H.
On 01/08/2013 09:50 AM, Tim Triche, Jr. wrote:
hell, I'll add the operators if there's support for them. obviously
they're not a big deal and a patch would take 5 minutes flat.
my hope was to be very explicit about what each type of operation meant,
so that when a newcomer to the Ranges API sees
peaks %overlapping% promoters(someGroupOfGenesWeCareAbout)
it cannot be confused with
peaks %within% rangesThatCorrespondToSomeChromatinState
or
peaks %equal% aBunchOfDNAseFootprints
or
DMRs %in% genes ## what the hell does this really mean, anyways?
it's so bad on so many levels
because whenever someone says "what is the advantage of Ranges-based
analyses?", these are the archetypal sorts of queries that come to mind.
Except that usually in my examples they are based on posterior
probabilities, but perhaps that could stand to change.
Anyways, that's just my bias, and you're doing the heavy lifting. But
if people agree with the motivations I will write the patch today.
Cheers,
--t
On Tue, Jan 8, 2013 at 9:20 AM, Hervé Pagès <hpages@fhcrc.org> wrote:
Hi Tim,
I could add the %ov% operator as a replacement for the old %in%. So you
would write 'peaks %ov% genes' instead of 'peaks %in% genes'. It would
just be a convenience wrapper for 'overlapsAny(peaks, genes)'.
Cheers,
H.
On 01/07/2013 11:45 AM, Tim Triche, Jr. wrote:
So why not leave %in% as it was and transition everything forward to
explicitly using { `%within%`, `%overlaps%`|`%overlapping%`, `%equals%` }
such that
identical( x %within% table, countOverlaps(x, table, type='within') > 0 ) == TRUE
identical( x %overlaps% table, countOverlaps(x, table, type='any') > 0 ) == TRUE
identical( x %equals% table, countOverlaps(x, table, type='equal') > 0 ) == TRUE
and for the time being,
identical( x %overlaps% table, countOverlaps(x, table, type='any') > 0 ) == TRUE
## but with a noisy nastygram that will halt if options("warn"=2)
No breakage for %in% methods until such time as a full deprecation cycle
has passed, and if the maintainers can't be arsed to do anything at all
about the warnings by the second full release, then perhaps they don't
really care that much after all. Just a thought?
From someone (me) who has their own issues with keeping everything up
to date and should know better. If you want to use %in% for
peaks %in% genes (why on earth would you do this rather than
peaks %in% promoters(genes), anyways?)
then a nastygram could be emitted "WARNING: YOUR SHORTHAND NOTATION IS
DOOMED AFTER BIOC 2.13, YOU WILL BE ASSIMILATED" and everyone is (more
or less) happy.
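Editorial sketch of the "keep it working but warn" pattern described above, in base R. The names `%old%` and `new_fun()` are hypothetical placeholders, and equality matching stands in for whatever the real replacement would compute; this is not how IRanges implements its deprecation messages.

```r
# Hypothetical replacement function (stand-in behaviour: exact matching).
new_fun <- function(x, table) match(x, table, nomatch = 0L) > 0L

# Hypothetical deprecated operator: same result as before, plus a warning.
`%old%` <- function(x, table) {
  warning("'%old%' is deprecated; call new_fun(x, table) instead",
          call. = FALSE)
  new_fun(x, table)
}

# With default settings the caller still gets a result (warning silenced
# here only to keep the example quiet).
res <- suppressWarnings(c(1, 5) %old% c(1, 2, 3))

# Catching the condition shows what a strict session would see: under
# options(warn = 2) the same warning is promoted to an error and halts.
caught <- tryCatch(c(1, 5) %old% c(1, 2, 3),
                   warning = function(w) conditionMessage(w))
```

Existing code keeps running for the length of the deprecation cycle, while anyone running with `options(warn = 2)` finds the call sites immediately.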
On Mon, Jan 7, 2013 at 11:33 AM, Michael Lawrence
<lawrence.michael@gene.com> wrote:
On Mon, Jan 7, 2013 at 11:00 AM, Hervé Pagès <hpages@fhcrc.org> wrote:
Hi Michael,
I don't think "match" (the word) always has to mean "equality" either.
However having match() (the function) do "whole exact matching" (aka
"equality") for any kind of vector-like object has the advantage of:
(a) making it consistent with base::match() (?base::match is pretty
explicit about what the contract of match() is)
(a) alone is obviously not enough. We have many methods, like the
set operations, that treat ranges specially. Are we going to start
moving everything toward the base behavior? And have rangeIntersect,
rangeSetdiff, etc?
(b) preserving its relationship with ==, duplicated(), unique(), etc...
So it becomes consistent with duplicated/unique, but we lose
consistency with the set operations.
(c) not frustrating the user who needs something to do exact
matching on ranges (as I mentioned previously, if you take
match() away from him/her, s/he'll be left with nothing).
No one has ever asked for match() to behave this way. There was a
request for a way to tabulate identical ranges. It was a nice idea
to extract the general "outer equal" findMatches function. But the
changes seem to be snow-balling. These types of changes mean a lot
of maintenance work for the users. A deprecation cycle does not
circumvent that.
IMO those advantages counterbalance *by far* the very little
convenience you get from having 'match(query, subject)' do
'findOverlaps(query, subject, select="first")' on
IRanges/GRanges objects. If you need to do that, just use the
latter, or, if you think that's still too much typing, define
a wrapper e.g. 'ovmatch(query, subject)'.
There are plenty of specialized tools around for doing
inexact/fuzzy/partial/overlap matching for many particular types
of vector-like objects: grep() and family, pmatch(), charmatch(),
agrep(), grepRaw(), matchPattern() and family, findOverlaps() and
family, findInterval(), etc... For the reasons I mentioned
above, none of them should hijack match() to make it do some
particular type of inexact matching on some particular type of
objects. Even if, for that particular type of objects, doing that
particular type of inexact matching is more common than doing
exact matching.
H.
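Editorial sketch contrasting the two meanings of "match" debated above, in base R. Ranges are modelled as data.frames of closed intervals purely for illustration; `key()` and the loop below are hypothetical stand-ins and do not call match() or findOverlaps() from IRanges.

```r
# Stand-in ranges (closed intervals); not IRanges objects.
query   <- data.frame(start = c(1, 8, 30), end = c(5, 12, 35))
subject <- data.frame(start = c(10, 1, 1), end = c(20, 5, 9))

# Equality semantics: index of the first identical subject range, NA if none.
key <- function(r) paste(r$start, r$end, sep = "-")
eq_match <- match(key(query), key(subject))

# Overlap semantics: index of the first overlapping subject range, NA if none
# (the role played by select="first" in the message above).
ov_match <- vapply(seq_len(nrow(query)), function(i) {
  hits <- which(query$start[i] <= subject$end & query$end[i] >= subject$start)
  if (length(hits)) hits[1] else NA_integer_
}, integer(1))

eq_match  # 2 NA NA
ov_match  # 2  1 NA
```

The second query range has no identical counterpart but does overlap one, so the two semantics give different answers for it.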
On 01/06/2013 05:39 PM, Michael Lawrence wrote:
I think having overlapsAny is a nice addition and helps make the API
more complete and explicit. Are you sure we need to change the behavior
of the match method for this relatively uncommon use case?
Yes because otherwise users with a use case of doing match()
even if it's uncommon,
I don't think "match" always has to mean "equality". It is a more
general concept in my mind. The most common use case for matching
ranges is overlap.
Of course "match" doesn't always have to mean equality. But of base
Michael
On Fri, Jan 4, 2013 at 8:34 PM, Hervé Pagès <hpages@fhcrc.org> wrote:
Yes 'peaks %in% genes' is cute and was probably doing the right thing
for most users (although not all). But 'exons %in% genes' is cute too
and was probably doing the wrong thing for all users. Advanced users
like you guys would have no problem switching to
!is.na(findOverlaps(peaks, genes, type="within", select="any"))
or
!is.na(findOverlaps(peaks, genes, type="equal", select="any"))
in case 'peaks %in% genes' was not doing exactly what you wanted,
but most users would not find this particularly friendly. Even
worse, some users probably didn't realize that 'peaks %in% genes'
was not doing exactly what they thought it did because "peaks in
genes" in English suggests that the peaks are within the genes,
but it's not what 'peaks %in% genes' does.
Having overlapsAny(), with exactly the same extra arguments as
countOverlaps() and subsetByOverlaps() (i.e. 'maxgap', 'minoverlap',
'type', 'ignore.strand'), all of them documented (and with most
users more or less familiar with them already) has the virtue of
exposing the user to all the options from the very start, and of
helping him/her make the right choice. Of course there will be users
that don't want or don't have the time to read/think about all the
options. Not a big deal: they'll just do 'overlapsAny(query, subject)',
which is not a lot more typing than 'query %in% subject', especially
if they use tab completion.
It's true that it's more common to ask questions about overlap than
about equality but there are some use cases for the latter (as the
original thread shows). Until now, when you had such a use case, you
could not use match() or %in%, which would have been the natural things
to use, because they got hijacked to do something else, and you were
left with nothing. Not a satisfying situation. So at a minimum, we
needed to restore the true/real/original semantics of match() to do
"equality" instead of "overlap". But it's hard to do this for match()
and not do it for %in% too. For more than 99% of R users, %in% is
just a simple wrapper for 'match(x, table, nomatch = 0) > 0' (this
is how it has been documented and implemented in base R for many
years). Not maintaining this relationship between %in% and match()
would only cause grief and frustration to newcomers to Bioconductor.
H.
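Editorial sketch of the base R relationship referred to above, using ordinary character vectors. The per-element counting at the end only emulates, in base R, what countMatches() is described as returning earlier in the thread; it does not call the Bioconductor function.

```r
x   <- c("a", "b", "z")
tbl <- c("b", "a", "b", "c", "b")

# %in% expressed through match(), as documented in base R.
in_via_match <- match(x, tbl, nomatch = 0L) > 0L
same <- identical(in_via_match, x %in% tbl)

# match() reports only the first hit; counting every hit per element of x
# gives the tabulation that started this thread.
counts <- vapply(x, function(el) sum(tbl == el), integer(1), USE.NAMES = FALSE)

in_via_match  # TRUE TRUE FALSE
counts        # 1 3 0
```

Because `%in%`, `match()` and the counts all rest on the same notion of equality here, an element is "in" the table exactly when its count is non-zero.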
On 01/04/2013 03:32 PM, Cook, Malcolm wrote:
Hiya again,
I am definitely a latecomer to BioC, so I definitely easily defer to
the tide of history.
But I do think you miss my point Michael about the proposed change
making the relationship between %in% and match for {G,I}Ranges{List}
mimic that between other vectors, and I do think that changing the API
would make other latecomers take to BioC easier/faster.
That said, I NEVER use %in% so I really have no stake in the matter, and
I DEFINITELY appreciate the argument for not changing the API just for
semantic sweetness.
That that said, Herve is _so good_ about deprecations and warnings
that make such changes fairly easily digestible.
That that that.... enough.... I bow out of this one....!!!!
Always learning and Happy New Year to all lurkers,
~Malcolm
*From:* Michael Lawrence [mailto:lawrence.michael@gene.com]
*Sent:* Friday, January 04, 2013 5:11 PM
*To:* Cook, Malcolm
*Cc:* Sean Davis; Michael Lawrence; Hervé Pagès (hpages@fhcrc.org); Tim
Triche, Jr.; Vedran Franke; bioconductor@r-project.org
*Subject:* Re: [BioC] countMatches() (was: table for GenomicRanges)
On Fri, Jan 4, 2013 at 1:56 PM, Cook, Malcolm <mec@stowers.org> wrote:
Hiya,
For what it is worth...
I think the change to %in% is warranted.
If I understand correctly, this change restores the relationship between
the semantics of `%in%` and the semantics of `match`.
From the docs:
'"%in%" <- function(x, table) match(x, table, nomatch = 0) > 0'
Herve's change restores this relationship.
match and %in% were initially consistent (both considering any overlap);
Herve has changed both of them together. The whole idea behind IRanges
is that ranges are special data types with special semantics. We have
reimplemented much of the existing R vector API using those semantics;
this extends beyond match/%in%. I am hesitant about making such sweeping
changes to the API so late in the life-cycle of the package. There was a
feature request for a way to count identical ranges in a set of ranges.
Let's please not get carried away and start redesigning the API for this
one, albeit useful, request. There are all sorts of inconsistencies in
the API, and many of them were conscious decisions that considered
practical use cases.
Michael
Herve, I suspect you were, as a result, able to completely drop all the
`%in%,BiocClass1,BiocClass2` definitions and depend upon base::%in%.
Am I right?
If so, may I suggest that Herve stay the course, with the addition of
'"%ol%" <- function(a, b) findOverlaps(a, b, maxgap=0L,
minoverlap=1L, type='any', select='all') > 0'
This would provide a perspicacious idiom, thereby optimizing the API
for Michael's observed common use case.
Just sayin'
~Malcolm
.-----Original Message-----
.From:
bioconductor-bounces@r-______project.org<mailto:bioconductor- bounces@r-______project.org="">
<mailto:bioconductor-bounces@r-____project.org<mailto :bioconductor-bounces@r-____project.org="">>
<mailto:bioconductor-bounces@<mailto :bioconductor-bounces@="">__r-__project.org<http: r-__project.org="">
<mailto:bioconductor-bounces@r-__project.org<mailto :bioconductor-bounces@r-__project.org="">>>
<mailto:bioconductor-bounces@<mailto :bioconductor-bounces@="">
<mailto:bioconductor-bounces@<mailto:bioconductor-bounces @="">>____r-project.org<http: r-project.org="">
<http: r-project.org="">
<mailto:bioconductor-bounces@<mailto :bioconductor-bounces@="">__r-project.org<http: r-project.org="">
<mailto:bioconductor-bounces@r-project.org<mailto :bioconductor-bounces@r-project.org="">>>>
<mailto:bioconductor- bounces@<mailto:bioconductor-bounces@="">
<mailto:bioconductor-bounces@<mailto:bioconductor-bounces@>>
<mailto:bioconductor-bounces@<mailto :bioconductor-bounces@="">
<mailto:bioconductor-bounces@<mailto:bioconductor-bounces @="">>>______r-project.org<http: r-project.org="">
<http: r-project.org="">
<http: r-project.org="">
<mailto:bioconductor-bounces@<mailto :bioconductor-bounces@="">
<mailto:bioconductor-bounces@<mailto:bioconductor-bounces @="">>____r-project.org<http: r-project.org="">
<http: r-project.org="">
<mailto:bioconductor-bounces@<mailto :bioconductor-bounces@="">__r-project.org<http: r-project.org="">
<mailto:bioconductor-bounces@r-project.org<mailto :bioconductor-bounces@r-project.org="">>>>>
[mailto:bioconductor-
bounces@<mailto:bioconductor-bounces@>
<mailto:bioconductor-bounces@<mailto:bioconductor-bounces@>>
<mailto:bioconductor-bounces@<mailto :bioconductor-bounces@="">
<mailto:bioconductor-bounces@<mailto:bioconductor-bounces @="">>>______r-project.org<http: r-project.org="">
<http: r-project.org="">
<http: r-project.org="">
<mailto:bioconductor-bounces@<mailto :bioconductor-bounces@="">
<mailto:bioconductor-bounces@<mailto:bioconductor-bounces @="">>____r-project.org<http: r-project.org="">
<http: r-project.org="">
<mailto:bioconductor-bounces@<mailto :bioconductor-bounces@="">__r-project.org<http: r-project.org="">
<mailto:bioconductor-bounces@r-project.org<mailto :bioconductor-bounces@r-project.org="">>>>
<mailto:bioconductor- bounces@<mailto:bioconductor-bounces@="">
<mailto:bioconductor-bounces@<mailto:bioconductor-bounces@>>
<mailto:bioconductor-bounces@<mailto :bioconductor-bounces@="">
<mailto:bioconductor-bounces@<mailto:bioconductor-bounces @="">>>______r-project.org<http: r-project.org="">
<http: r-project.org="">
<http: r-project.org="">
<mailto:bioconductor-bounces@<mailto :bioconductor-bounces@="">
<mailto:bioconductor-bounces@<mailto:bioconductor-bounces @="">>____r-project.org<http: r-project.org="">
<http: r-project.org="">
<mailto:bioconductor-bounces@<mailto :bioconductor-bounces@="">__r-project.org<http: r-project.org="">
<mailto:bioconductor-bounces@r-project.org<mailto :bioconductor-bounces@r-project.org="">>>>>] On Behalf Of Sean
Davis
.Sent: Friday, January 04, 2013
3:37 PM
.To: Michael Lawrence
.Cc: Tim Triche, Jr.; Vedran
Franke;
bioconductor@r-project.org<mailto:bioconductor@r-project.org>
<mailto:bioconductor@r-project.org<mailto:bioconductor@r-project.org>>
<mailto:bioconductor@r-__project.org<mailto:bioco nductor@r-__project.org="">
<mailto:bioconductor@r-project.org<mailto:bioconductor@r-proje ct.org="">>>
<mailto:bioconductor@r-____project.org<mailto:bio conductor@r-____project.org="">
<mailto:bioconductor@r-__project.org<mailto:bioconductor@r-__p roject.org="">>
<mailto:bioconductor@r-__project.org<mailto:bioco nductor@r-__project.org="">
<mailto:bioconductor@r-project.org<mailto:bioconductor@r-proje ct.org="">>>>
<mailto:bioconductor@r-______project.org<mailto:bioconductor@r -______project.org="">
<mailto:bioconductor@r-____project.org<mailto:bioconductor@r-_ ___project.org="">>
<mailto:bioconductor@r-____project.org<mailto:bio conductor@r-____project.org="">
<mailto:bioconductor@r-__project.org<mailto:bioconductor@r-__p roject.org="">>>
<mailto:bioconductor@r-____project.org<m ailto:bioconductor@r-____project.org="">
<mailto:bioconductor@r-__project.org<mailto:bioconductor@r-__p roject.org="">>
<mailto:bioconductor@r-__project.org<mailto:bioco nductor@r-__project.org="">
<mailto:bioconductor@r-project.org<mailto:bioconductor@r-proje ct.org="">>>>>
.Subject: Re: [BioC]
countMatches()
(was:
table for
GenomicRanges)
.
.On Fri, Jan 4, 2013 at 4:32 PM,
Michael
Lawrence
.<lawrence.michael@gene.com<mailto:lawrence.michael@gene.com>
<mailto:lawrence.michael@gene.com<mailto:lawrence.michael@gene.com>>
<mailto:lawrence.michael@gene.<mailto:lawrence.michael@gene.>__com
<mailto:lawrence.michael@gene.com<mailto:lawrence.michael@gene.com>>>
<mailto:lawrence.michael@gene<mailto:lawrence.michael@gene>.
<mailto:lawrence.michael@gene<mailto:lawrence.michael@gene>.>____com
<mailto:lawrence.michael@gene.<mailto:lawrence.michael@gene.>__com
<mailto:lawrence.michael@gene.com<mailto:lawrence.michael@gene.com>>>>
<mailto:lawrence.michael@gene<mailto:lawrence.michael@gene>
<mailto:lawrence.michael@gene<mailto:lawrence.michael@gene>>.
<mailto:lawrence.michael@gene<mailto:lawrence.michael@gene>
<mailto:lawrence.michael@gene<mailto:lawrence.michael@gene>>._
_>____com
<mailto:lawrence.michael@gene<mailto:lawrence.michael@gene>.
<mailto:lawrence.michael@gene<mailto:lawrence.michael@gene>.>____com
<mailto:lawrence.michael@gene.<mailto:lawrence.michael@gene.>__com
<mailto:lawrence.michael@gene.com<mailto:lawrence.michael@gene .com="">>>>>> wrote:
.> The change to the behavior of
%in% is a
pretty big
one. Are you
thinking
.> that all set-based operations
should
behave this way? For
example, setdiff
.> and intersect? I really liked
the syntax
of "peaks
%in% genes".
In my
.> experience, it's way more
common
to ask
questions
about overlap
than about
.> equality, so I'd rather
optimize
the API
for that use
case. But
again,
.> that's just my personal bias.
.
.For what it is worth, I share
Michael's
personal bias here.
.
.Sean
.
.
.> Michael
.>
.>
.> On Fri, Jan 4, 2013 at 1:11 PM, Hervé Pagès <hpages@fhcrc.org> wrote:
.>
.>> Hi,
.>>
.>> I added findMatches() and countMatches() to the latest IRanges /
.>> GenomicRanges packages (in BioC devel only).
.>>
.>> findMatches(x, table): An enhanced version of 'match' that
.>>   returns all the matches in a Hits object.
.>>
.>> countMatches(x, table): Returns an integer vector of the length
.>>   of 'x', containing the number of matches in 'table' for
.>>   each element in 'x'.
.>>
.>> countMatches() is what you can use to tally/count/tabulate (choose your
.>> preferred term) the unique elements in a GRanges object:
.>>
.>>   library(GenomicRanges)
.>>   set.seed(33)
.>>   gr <- GRanges("chr1", IRanges(sample(15,20,replace=TRUE), width=5))
.>>
.>> Then:
.>>
.>>   > gr_levels <- sort(unique(gr))
.>>   > countMatches(gr_levels, gr)
.>>   [1] 1 1 1 2 4 2 2 1 2 2 2
.>>
.>> Note that findMatches() and countMatches() also work on IRanges and
.>> DNAStringSet objects, as well as on ordinary atomic vectors:
.>>
.>>   library(hgu95av2probe)
.>>   library(Biostrings)
.>>   probes <- DNAStringSet(hgu95av2probe)
.>>   unique_probes <- unique(probes)
.>>   count <- countMatches(unique_probes, probes)
.>>   max(count) # 7
.>>
.>> I made other changes in IRanges/GenomicRanges so that the notion
.>> of "match" between elements of a vector-like object now consistently
.>> means "equality" instead of "overlap", even for range-based objects
.>> like IRanges or GRanges objects. This notion of "equality" is the
.>> same that is used by ==. The most visible consequence of those
.>> changes is that using %in% between 2 IRanges or GRanges objects
.>> 'query' and 'subject' in order to do overlaps was replaced by
.>> overlapsAny(query, subject).
.>>
.>> overlapsAny(query, subject): Finds the ranges in 'query' that
.>>   overlap any of the ranges in 'subject'.
.>>
.>> There are warnings and deprecation messages in place to help smooth
.>> the transition.
.>>
.>> Cheers,
.>> H.
.>>
.>> --
.>> Hervé Pagès
.>>
.>> Program in Computational Biology
.>> Division of Public Health Sciences
.>> Fred Hutchinson Cancer Research Center
.>> 1100 Fairview Ave. N, M1-B514
.>> P.O. Box 19024
.>> Seattle, WA 98109-1024
.>>
.>> E-mail: hpages@fhcrc.org
.>> Phone: (206) 667-5791
.>> Fax: (206) 667-1319
.>>
.>
.> [[alternative HTML version deleted]]
.>
.> _______________________________________________
.> Bioconductor mailing list
.> Bioconductor@r-project.org
.> https://stat.ethz.ch/mailman/listinfo/bioconductor
.> Search the archives:
.> http://news.gmane.org/gmane.science.biology.informatics.conductor
.
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages@fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
--
A model is a lie that helps you see the truth.

Howard Skipper <http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf>
I think %over% and maybe %within% are all that's needed. Could go to
%start% and %end%.
Michael
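
A minimal sketch of what the proposed operators could look like, assuming
they are thin wrappers around overlapsAny() in the same way Hervé describes
for %ov% further down the thread. The definitions and the toy 'peaks' and
'genes' ranges are made up for illustration only. Recent IRanges releases
may already provide operators with these names, in which case the local
definitions below simply shadow them.

```r
library(IRanges)

# Hypothetical definitions, written only to illustrate the proposal.
`%over%`   <- function(query, subject) overlapsAny(query, subject)
`%within%` <- function(query, subject) overlapsAny(query, subject, type = "within")

peaks <- IRanges(start = c(1, 20, 50), width = 5)        # 1-5, 20-24, 50-54
genes <- IRanges(start = c(3, 18), end = c(10, 30))      # 3-10, 18-30

over_res   <- peaks %over% genes    # TRUE  TRUE FALSE (any overlap)
within_res <- peaks %within% genes  # FALSE TRUE FALSE (fully contained)
```

The first peak only straddles the start of a gene, so it counts for %over%
but not for %within%, which is the distinction the thread is arguing about.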
On Tue, Jan 8, 2013 at 2:59 PM, Cook, Malcolm <mec@stowers.org> wrote:
> If we're voting/brainstorming, I'd go for one operator for each value that
> the type arg of overlap can take on.
>
> Thus:
>
> %olStart%
> %olEnd%
> %olWithin%
> %olAny% (perhaps with alias of just %ol%)
> %olEqual% (which should be same as %in%, right)
>
> Doh, I can't stay away from this issue for some reason..... Anyway, my 2
> cents
>
> ~Malcolm
>
> *From:* Tim Triche, Jr. [mailto:tim.triche@gmail.com]
> *Sent:* Tuesday, January 08, 2013 4:12 PM
> *To:* Michael Lawrence
> *Cc:* Hervé Pagès; Cook, Malcolm; Sean Davis; Vedran Franke;
> bioconductor@r-project.org
> *Subject:* Re: [BioC] countMatches() (was: table for GenomicRanges)
>
> Michael: your suggestion is both clearer and more concise than mine was.
> +1
>
> (I prefer x %i% y %i% z rather than intersect(x, intersect(y, z)) for the
> same reason)
>
> On Tue, Jan 8, 2013 at 2:03 PM, Michael Lawrence
> <lawrence.michael@gene.com> wrote:
>
> I would vote for %over% instead of %ov%. Just 2 more characters but way
> clearer, at least to me. The hardest thing to type are the %'s.
>
> Michael
>
> On Tue, Jan 8, 2013 at 11:09 AM, Hervé Pagès <hpages@fhcrc.org> wrote:
>
> Thanks Tim, Malcolm for the feedback.
>
> @Tim, I won't comment on the variants of %ov% you are proposing for
> doing "within" or "equal" instead of "any" (but if people want them,
> I'll add them too). For now I just want to focus on restoring the
> convenience of the old %in%, whose removal is understandably causing
> some frustration. And so we can move on.
>
> Cheers,
> H.
>
> On 01/08/2013 09:50 AM, Tim Triche, Jr. wrote:
>
> hell, I'll add the operators if there's support for them. obviously
> they're not a big deal and a patch would take 5 minutes flat.
>
> my hope was to be very explicit about what each type of operation meant,
> so that when a newcomer to the Ranges API sees
>
> peaks %overlapping% promoters(someGroupOfGenesWeCareAbout)
>
> it cannot be confused with
>
> peaks %within% rangesThatCorrespondToSomeChromatinState
>
> or
>
> peaks %equal% aBunchOfDNAseFootprints
>
> or
>
> DMRs %in% genes ## what the hell does this really mean, anyways?
> it's so bad on so many levels
>
> because whenever someone says "what is the advantage of Ranges-based
> analyses?", these are the archetypal sorts of queries that come to mind.
> Except that usually in my examples they are based on posterior
> probabilities, but perhaps that could stand to change.
>
> Anyways, that's just my bias, and you're doing the heavy lifting. But
> if people agree with the motivations I will write the patch today.
>
> Cheers,
>
> --t
>
>
>
>
> On Tue, Jan 8, 2013 at 9:20 AM, Hervé Pagès <hpages@fhcrc.org> wrote:
>
> Hi Tim,
>
> I could add the %ov% operator as a replacement for the old %in%. So you
> would write 'peaks %ov% genes' instead of 'peaks %in% genes'. Would just
> be a convenience wrapper for 'overlapsAny(peaks, genes)'.
>
> Cheers,
> H.
>
>
> On 01/07/2013 11:45 AM, Tim Triche, Jr. wrote:
>
> So why not leave %in% as it was and transition everything forward to
> explicitly using { `%within%`, `%overlaps%`|`%overlapping%`, `%equals%` }
> such that
>
> identical( x %within% table, countOverlaps(x, table, type='within') > 0 ) == TRUE
> identical( x %overlaps% table, countOverlaps(x, table, type='any') > 0 ) == TRUE
> identical( x %equals% table, countOverlaps(x, table, type='equal') > 0 ) == TRUE
>
> and for the time being,
>
> identical( x %overlaps% table, countOverlaps(x, table, type='any') > 0 ) == TRUE
> ## but with a noisy nastygram that will halt if options("warn"=2)
>
> No breakage for %in% methods until such time as a full deprecation cycle
> has passed, and if the maintainers can't be arsed to do anything at all
> about the warnings by the second full release, then perhaps they don't
> really care that much after all. Just a thought?
>
> From someone (me) who has their own issues with keeping everything up
> to date and should know better. If you want to use %in% for
>
> peaks %in% genes (why on earth would you do this rather than peaks
> %in% promoters(genes), anyways?)
>
> then a nastygram could be emitted "WARNING: YOUR SHORTHAND NOTATION IS
> DOOMED AFTER BIOC 2.13, YOU WILL BE ASSIMILATED" and everyone is (more
> or less) happy.
>
>
>
> On Mon, Jan 7, 2013 at 11:33 AM, Michael Lawrence
> <lawrence.michael@gene.com> wrote:
>
> On Mon, Jan 7, 2013 at 11:00 AM, Hervé Pagès <hpages@fhcrc.org> wrote:
>
> Hi Michael,
>
> I don't think "match" (the word) always has to mean "equality" either.
> However having match() (the function) do "whole exact matching" (aka
> "equality") for any kind of vector-like object has the advantage of:
>
> (a) making it consistent with base::match() (?base::match is pretty
>     explicit about what the contract of match() is)
>
>
> (a) alone is obviously not enough. We have many methods, like the
> set operations, that treat ranges specially. Are we going to start
> moving everything toward the base behavior? And have rangeIntersect,
> rangeSetdiff, etc?
>
> (b) preserving its relationship with ==, duplicated(), unique(),
>     etc...
>
>
> So it becomes consistent with duplicated/unique, but we lose
> consistency with the set operations.
>
> (c) not frustrating the user who needs something to do exact
>     matching on ranges (as I mentioned previously, if you take
>     match() away from him/her, s/he'll be left with nothing).
>
>
> No one has ever asked for match() to behave this way. There was a
> request for a way to tabulate identical ranges. It was a nice idea
> to extract the general "outer equal" findMatches function. But the
> changes seem to be snow-balling. These types of changes mean a lot
> of maintenance work for the users. A deprecation cycle does not
> circumvent that.
>
>
> IMO those advantages counterbalance *by far* the very little
> convenience you get from having 'match(query, subject)' do
> 'findOverlaps(query, subject, select="first")' on
> IRanges/GRanges objects. If you need to do that, just use the
> latter, or, if you think that's still too much typing, define
> a wrapper e.g. 'ovmatch(query, subject)'.
>
> There are plenty of specialized tools around for doing
> inexact/fuzzy/partial/overlap matching for many particular types
> of vector-like objects: grep() and family, pmatch(), charmatch(),
> agrep(), grepRaw(), matchPattern() and family, findOverlaps() and
> family, findInterval(), etc... For the reasons I mentioned
> above, none of them should hijack match() to make it do some
> particular type of inexact matching on some particular type of
> objects. Even if, for that particular type of objects, doing that
> particular type of inexact matching is more common than doing
> exact matching.
>
> H.
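
To make the distinction in the message above concrete, here is a small
sketch contrasting exact matching with the suggested overlap wrapper.
'ovmatch' is only the example name used in that message, defined here for
illustration, and the ranges are made up. It assumes an IRanges version in
which match() does exact matching, as described in this thread.

```r
library(IRanges)

# Hypothetical wrapper, following the suggestion in the message above.
ovmatch <- function(query, subject) findOverlaps(query, subject, select = "first")

query   <- IRanges(start = c(1, 20, 50), width = 5)            # 1-5, 20-24, 50-54
subject <- IRanges(start = c(20, 3, 18), end = c(24, 10, 30))  # 20-24, 3-10, 18-30

exact   <- match(query, subject)    # NA 1 NA: only 20-24 has an identical range
overlap <- ovmatch(query, subject)  # 2 1 NA: index of the first overlapping range
```

The first query range has no identical range in 'subject' but does overlap
one, so the two calls disagree on it.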
>
>
>
> On 01/06/2013 05:39 PM, Michael Lawrence wrote:
>
> I think having overlapsAny is a nice addition and helps make the API
> more complete and explicit. Are you sure we need to change the behavior
> of the match method for this relatively uncommon use case?
>
>
> Yes because otherwise users with a use case of doing match()
>
> even if it's uncommon,
>
>
> I don't think "match" always has to mean "equality". It is a more general
> concept in my mind. The most common use case for matching ranges is
> overlap.
>
>
> Of course "match" doesn't always have to mean equality. But of base
>
>
> Michael
>
>
> On Fri, Jan 4, 2013 at 8:34 PM, Hervé Pagès <hpages@fhcrc.org> wrote:
>
> Yes 'peaks %in% genes' is cute and was probably doing the right thing
> for most users (although not all). But 'exons %in% genes' is cute too
> and was probably doing the wrong thing for all users. Advanced users
> like you guys would have no problem switching to
>
>   !is.na(findOverlaps(peaks, genes, type="within", select="any"))
>
> or
>
>   !is.na(findOverlaps(peaks, genes, type="equal", select="any"))
>
> in case 'peaks %in% genes' was not doing exactly what you wanted,
> but most users would not find this particularly friendly. Even
> worse, some users probably didn't realize that 'peaks %in% genes'
> was not doing exactly what they thought it did because "peaks in
> genes" in English suggests that the peaks are within the genes,
> but it's not what 'peaks %in% genes' does.
>
> Having overlapsAny(), with exactly the same extra arguments as
> countOverlaps() and subsetByOverlaps() (i.e. 'maxgap', 'minoverlap',
> 'type', 'ignore.strand'), all of them documented (and with most
> users more or less familiar with them already) has the virtue to
> expose the user to all the options from the very start, and to
> help him/her make the right choice. Of course there will be users
> that don't want or don't have the time to read/think about all the
> options. Not a big deal: they'll just do 'overlapsAny(query, subject)',
> which is not a lot more typing than 'query %in% subject', especially
> if they use tab completion.
>
> It's true that it's more common to ask questions about overlap than
> about equality but there are some use cases for the latter (as the
> original thread shows). Until now, when you had such a use case, you
> could not use match() or %in%, which would have been the natural things
> to use, because they got hijacked to do something else, and you were
> left with nothing. Not a satisfying situation. So at a minimum, we
> needed to restore the true/real/original semantic of match() to do
> "equality" instead of "overlap". But it's hard to do this for match()
> and not do it for %in% too. For more than 99% of R users, %in% is
> just a simple wrapper for 'match(x, table, nomatch = 0) > 0' (this
> is how it has been documented and implemented in base R for many
> years). Not maintaining this relationship between %in% and match()
> would only cause grief and frustration to newcomers to Bioconductor.
>
> H.
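
The %in% / match() relationship referred to above can be checked on an
ordinary character vector with base R alone. This is only a small
illustration with made-up values, not a statement about how any particular
Bioconductor class implements its methods.

```r
x     <- c("a", "b", "z")
table <- c("b", "a", "a")

via_in    <- x %in% table                   # TRUE TRUE FALSE
via_match <- match(x, table, nomatch = 0) > 0

same <- identical(via_in, via_match)        # TRUE
```

Any class whose match() method stops meaning "equality" would break this
identity unless its %in% method is changed along with it, which is the
argument being made in the message.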
>
>
>
> On 01/04/2013 03:32 PM, Cook, Malcolm wrote:
>
> Hiya again,
>
> I am definitely a late comer to BioC, so I definitely easily defer to
> the tide of history.
>
> But I do think you miss my point Michael about the proposed change
> making the relationship between %in% and match for {G,I}Ranges{List}
> mimic that between other vectors, and I do think that changing the API
> would make other late-comers take to BioC easier/faster.
>
> That said, I NEVER use %in% so I really have no stake in the matter, and
> I DEFINITELY appreciate the argument to not changing the API just for
> semantic sweetness.
>
> That that said, Herve is _/so good/_ about deprecations and warnings
> that make such changes fairly easily digestible.
>
> That that that.... enough.... I bow out of this one....!!!!
>
> Always learning and Happy New Year to all lurkers,
>
> ~Malcolm
>
> *From:* Michael Lawrence [mailto:lawrence.michael@gene.com]
> *Sent:* Friday, January 04, 2013 5:11 PM
> *To:* Cook, Malcolm
> *Cc:* Sean Davis; Michael Lawrence; Hervé Pagès (hpages@fhcrc.org); Tim
> Triche, Jr.; Vedran Franke; bioconductor@r-project.org
> *Subject:* Re: [BioC] countMatches() (was: table for GenomicRanges)
>
>
> On Fri, Jan 4, 2013 at 1:56 PM, Cook, Malcolm <mec@stowers.org> wrote:
>
> Hiya,
>
> For what it is worth...
>
> I think the change to %in% is warranted.
>
> If I understand correctly, this change restores the relationship between
> the semantics of `%in%` and the semantics of `match`.
>
> From the docs:
>
> '"%in%" <- function(x, table) match(x, table, nomatch = 0) > 0'
>
> Herve's change restores this relationship.
>
>
> match and %in% were initially consistent (both considering any overlap);
> Herve has changed both of them together. The whole idea behind IRanges
> is that ranges are special data types with special semantics. We have
> reimplemented much of the existing R vector API using those semantics;
> this extends beyond match/%in%. I am hesitant about making such sweeping
> changes to the API so late in the life-cycle of the package. There was a
> feature request for a way to count identical ranges in a set of ranges.
> Let's please not get carried away and start redesigning the API for this
> one, albeit useful, request. There are all sorts of inconsistencies in
> the API, and many of them were conscious decisions that considered
> practical use cases.
>
> Michael
>
>
> Herve, I suspect you were as a result able to completely drop
> all the `%in%,BiocClass1,BiocClass2` definitions and depend upon
> base::%in%
>
> Am I right?
>
> If so, may I suggest that Herve stay the course, with the addition of
>
> '"%ol%" <- function(a, b) findOverlaps(a, b, maxgap=0L,
> minoverlap=1L, type='any', select='all') > 0'
>
> This would provide a perspicacious idiom, thereby optimizing the API
> for Michael's observed common use case.
>
> Just sayin'
>
> ~Malcolm
>
>
> .-----Original Message-----
> .From: bioconductor-bounces@r-project.org
> [mailto:bioconductor-bounces@r-project.org] On Behalf Of Sean Davis
> .Sent: Friday, January 04, 2013 3:37 PM
> .To: Michael Lawrence
> .Cc: Tim Triche, Jr.; Vedran Franke; bioconductor@r-project.org
> .Subject: Re: [BioC] countMatches() (was: table for GenomicRanges)
> .
> .On Fri, Jan 4, 2013 at 4:32 PM, Michael Lawrence
> .<lawrence.michael@gene.com> wrote:
> .> The change to the behavior of %in% is a pretty big one. Are you thinking
> .> that all set-based operations should behave this way? For example, setdiff
> .> and intersect? I really liked the syntax of "peaks %in% genes". In my
> .> experience, it's way more common to ask questions about overlap than about
> .> equality, so I'd rather optimize the API for that use case. But again,
> .> that's just my personal bias.
> .
> .For what it is worth, I share Michael's personal bias here.
> .
> .Sean
> .
> .
> .> Michael
> .>
> .>
> .> On Fri, Jan 4, 2013 at 1:11 PM, Hervé Pagès <hpages@fhcrc.org> wrote:
> .>
> .>> Hi,
> .>>
> .>> I added findMatches() and countMatches() to the latest IRanges /
> .>> GenomicRanges packages (in BioC devel only).
> .>>
> .>> findMatches(x, table): An enhanced version of 'match' that
> .>>   returns all the matches in a Hits object.
> .>>
> .>> countMatches(x, table): Returns an integer vector of the length
> .>>   of 'x', containing the number of matches in 'table' for
> .>>   each element in 'x'.
> .>>
> .>> countMatches() is what you can use to tally/count/tabulate (choose your
> .>> preferred term) the unique elements in a GRanges object:
> .>>
> .>>   library(GenomicRanges)
> .>>   set.seed(33)
> .>>   gr <- GRanges("chr1", IRanges(sample(15,20,replace=TRUE), width=5))
> .>>
> .>> Then:
> .>>
> .>>   > gr_levels <- sort(unique(gr))
> .>>   > countMatches(gr_levels, gr)
> .>>   [1] 1 1 1 2 4 2 2 1 2 2 2
> .>>
> .>> Note that findMatches() and countMatches() also work on IRanges and
> .>> DNAStringSet objects, as well as on ordinary atomic vectors:
> .>>
> .>>   library(hgu95av2probe)
> .>>   library(Biostrings)
> .>>   probes <- DNAStringSet(hgu95av2probe)
> .>>   unique_probes <- unique(probes)
> .>>   count <- countMatches(unique_probes, probes)
> .>>   max(count) # 7
> .>>
> .>> I made other changes in IRanges/GenomicRanges so that the notion
> .>> of "match" between elements of a vector-like object now consistently
> .>> means "equality" instead of "overlap", even for range-based objects
> .>> like IRanges or GRanges objects. This notion of "equality" is the
> .>> same that is used by ==. The most visible consequence of those
> .>> changes is that using %in% between 2 IRanges or GRanges objects
> .>> 'query' and 'subject' in order to do overlaps was replaced by
> .>> overlapsAny(query, subject).
> .>>
> .>> overlapsAny(query, subject): Finds the ranges in 'query' that
> .>>   overlap any of the ranges in 'subject'.
> .>>
> .>> There are warnings and deprecation messages in place to help smooth
> .>> the transition.
> .>>
> .>> Cheers,
> .>> H.
> .>>
> .>> --
> .>> Hervé Pagès
> .>>
> .>> Program in Computational
Biology
> .>> Division of Public Health
Sciences
> .>> Fred Hutchinson Cancer
Research
> Center
> .>> 1100 Fairview Ave. N,
M1-B514
> .>> P.O. Box 19024
> .>> Seattle, WA 98109-1024
> .>>
> .>> E-mail: hpages@fhcrc.org
> <mailto:hpages@fhcrc.org>
> <mailto:hpages@fhcrc.org <mailto:hpages@fhcrc.org="">>
> <mailto:hpages@fhcrc.org <mailto:hpages@fhcrc.org="">
> <mailto:hpages@fhcrc.org <mailto:hpages@fhcrc.org="">>>*
> ***
>
> <mailto:hpages@fhcrc.org> <mailto:hpages@fhcrc.org> <mailto:hpages@fhcrc.org> <mailto:hpages@fhcrc.org>>
>
> <mailto:hpages@fhcrc.org <mailto:hpages@fhcrc.org="">***
> *
>
> <mailto:hpages@fhcrc.org <mailto:hpages@fhcrc.org="">>>>
>
> .>> Phone: (206) 667-5791
> <tel:%28206%29%20667-5791 <%28206%29%20667-5791="">>
> <tel:%28206%29%20667-5791 <%28206%29%20667-5791="">> <
> tel:%28206%29%20667-5791 <%28206%29%20667-5791>>
>
<tel:%28206%29%20667-5791<%28206%29%20667-5791>
> >
> .>> Fax: (206) 667-1319
> <tel:%28206%29%20667-1319 <%28206%29%20667-1319="">>
> <tel:%28206%29%20667-1319 <%28206%29%20667-1319="">> <
> tel:%28206%29%20667-1319 <%28206%29%20667-1319>>
>
<tel:%28206%29%20667-1319<%28206%29%20667-1319>
> >
>
> .>>
> .>
> .> [[alternative HTML
> version deleted]]
> .>
> .>
> .>****
>
>
_____________________________________________________
> ****
>
>
>
>
> .> Bioconductor mailing list
> .> Bioconductor@r-project.org
> <mailto:bioconductor@r-project.org>
> <mailto:bioconductor@r-__project.org> <mailto:bioconductor@r-project.org>>
> <mailto:bioconductor@r-____project.org> <mailto:bioconductor@r-__project.org>
> <mailto:bioconductor@r-__project.org> <mailto:bioconductor@r-project.org>>>****
>
>
<mailto:bioconductor@r-______project.org> <mailto:bioconductor@r-____project.org>****
>
>
> <mailto:bioconductor@r-____project.org> <mailto:bioconductor@r-__project.org>>
>
> <mailto:bioconductor@r-____project.org> <mailto:bioconductor@r-__project.org>
> <mailto:bioconductor@r-__project.org> <mailto:bioconductor@r-project.org>>>>
>
> .>****
>
> https://stat.ethz.ch/mailman/______listinfo/bioconductor
> <https: stat.ethz.ch="" mailman="" ____listinfo="" bioconductor="">****
>
>
>
> <https: stat.ethz.ch="" mailman="" ____listinfo="" bioconductor=""> <https: stat.ethz.ch="" mailman="" __listinfo="" bioconductor="">>
>
>
>
>
> <https: stat.ethz.ch="" mailman="" ____listinfo="" bioconductor=""> <https: stat.ethz.ch="" mailman="" __listinfo="" bioconductor="">
>
> <https: stat.ethz.ch="" mailman="" __listinfo="" bioconductor=""> <https: stat.ethz.ch="" mailman="" listinfo="" bioconductor="">>>
> .> Search the archives:****
>
>
> <http: news.gmane.org="" gmane.______science.biology.informatics._____="" _conductor="">
>
> ...
>
> [Message clipped]
[[alternative HTML version deleted]]
+1
On Tue, Jan 8, 2013 at 3:07 PM, Michael Lawrence
<lawrence.michael@gene.com> wrote:
> I think %over% and maybe %within% are all that's needed. Could go to
> %start% and %end%.
>
> Michael
>
> On Tue, Jan 8, 2013 at 2:59 PM, Cook, Malcolm <mec@stowers.org> wrote:
>
>> If we're voting/brainstorming, I'd go for one operator for each value
>> that the 'type' arg of overlap can take on.
>>
>> Thus:
>>
>> %olStart%
>> %olEnd%
>> %olWithin%
>> %olAny% (perhaps with alias of just '%ol%')
>> %olEqual% (which should be same as %in%, right)
>>
>> Doh, I can't stay away from this issue for some reason..... Anyway, my 2
>> cents
>>
>> ~Malcolm
>>
>> From: Tim Triche, Jr. [mailto:tim.triche@gmail.com]
>> Sent: Tuesday, January 08, 2013 4:12 PM
>> To: Michael Lawrence
>> Cc: Hervé Pagès; Cook, Malcolm; Sean Davis; Vedran Franke;
>> bioconductor@r-project.org
>> Subject: Re: [BioC] countMatches() (was: table for GenomicRanges)
>>
>> Michael: your suggestion is both clearer and more concise than mine was.
>> +1
>>
>> (I prefer x %i% y %i% z rather than intersect(x, intersect(y, z)) for the
>> same reason)
>>
>> On Tue, Jan 8, 2013 at 2:03 PM, Michael Lawrence
>> <lawrence.michael@gene.com> wrote:
>>
>> I would vote for %over% instead of %ov%. Just 2 more characters but way
>> clearer, at least to me. The hardest thing to type are the %'s.
>>
>> Michael
>>
>> On Tue, Jan 8, 2013 at 11:09 AM, Hervé Pagès <hpages@fhcrc.org> wrote:
>>
>> Thanks Tim, Malcolm for the feedback.
>>
>> @Tim, I won't comment on the variants of %ov% you are proposing for
>> doing "within" or "equal" instead of "any" (but if people want them,
>> I'll add them too). For now I just want to focus on restoring the
>> convenience of the old %in%, whose removal is understandably causing
>> some frustration. And so we can move on.
>>
>> Cheers,
>> H.
>>
>>
>>
>>
>> On 01/08/2013 09:50 AM, Tim Triche, Jr. wrote:
>>
>> hell, I'll add the operators if there's support for them. obviously
>> they're not a big deal and a patch would take 5 minutes flat.
>>
>> my hope was to be very explicit about what each type of operation meant,
>> so that when a newcomer to the Ranges API sees
>>
>> peaks %overlapping% promoters(someGroupOfGenesWeCareAbout)
>>
>> it cannot be confused with
>>
>> peaks %within% rangesThatCorrespondToSomeChromatinState
>>
>> or
>>
>> peaks %equal% aBunchOfDNAseFootprints
>>
>> or
>>
>> DMRs %in% genes ## what the hell does this really mean, anyways?
>> it's so bad on so many levels
>>
>> because whenever someone says "what is the advantage of Ranges-based
>> analyses?", these are the archetypal sorts of queries that come to mind.
>> Except that usually in my examples they are based on posterior
>> probabilities, but perhaps that could stand to change.
>>
>> Anyways, that's just my bias, and you're doing the heavy lifting. But
>> if people agree with the motivations I will write the patch today.
>>
>> Cheers,
>>
>> --t
>>
>>
>>
>>
>> On Tue, Jan 8, 2013 at 9:20 AM, Hervé Pagès <hpages@fhcrc.org> wrote:
>>
>> Hi Tim,
>>
>> I could add the %ov% operator as a replacement for the old %in%. So you
>> would write 'peaks %ov% genes' instead of 'peaks %in% genes'. Would just
>> be a convenience wrapper for 'overlapsAny(peaks, genes)'.
>>
>> Cheers,
>> H.
>>
>>
>> On 01/07/2013 11:45 AM, Tim Triche, Jr. wrote:
>>
>> So why not leave %in% as it was and transition everything forward to
>> explicitly using { `%within%`, `%overlaps%`|`%overlapping%`, `%equals%` }
>> such that
>>
>> identical( x %within% table, countOverlaps(x, table, type='within') > 0 ) == TRUE
>> identical( x %overlaps% table, countOverlaps(x, table, type='any') > 0 ) == TRUE
>> identical( x %equals% table, countOverlaps(x, table, type='equal') > 0 ) == TRUE
>>
>> and for the time being,
>>
>> identical( x %overlaps% table, countOverlaps(x, table, type='any') > 0 ) == TRUE
>> ## but with a noisy nastygram that will halt if options("warn"=2)
>>
>> No breakage for %in% methods until such time as a full deprecation cycle
>> has passed, and if the maintainers can't be arsed to do anything at all
>> about the warnings by the second full release, then perhaps they don't
>> really care that much after all. Just a thought?
>>
>> From someone (me) who has their own issues with keeping everything up
>> to date and should know better. If you want to use %in% for
>>
>> peaks %in% genes (why on earth would you do this rather than peaks
>> %in% promoters(genes), anyways?)
>>
>> then a nastygram could be emitted "WARNING: YOUR SHORTHAND NOTATION IS
>> DOOMED AFTER BIOC 2.13, YOU WILL BE ASSIMILATED" and everyone is (more
>> or less) happy.
>>
>>
>>
>> On Mon, Jan 7, 2013 at 11:33 AM, Michael Lawrence
>> <lawrence.michael@gene.com> wrote:
>>
>> On Mon, Jan 7, 2013 at 11:00 AM, Hervé Pagès <hpages@fhcrc.org> wrote:
>>
>> Hi Michael,
>>
>> I don't think "match" (the word) always has to mean "equality" either.
>> However having match() (the function) do "whole exact matching" (aka
>> "equality") for any kind of vector-like object has the advantage of:
>>
>> (a) making it consistent with base::match() (?base::match is pretty
>> explicit about what the contract of match() is)
>>
>> (a) alone is obviously not enough. We have many methods, like the
>> set operations, that treat ranges specially. Are we going to start
>> moving everything toward the base behavior? And have rangeIntersect,
>> rangeSetdiff, etc?
>>
>> (b) preserving its relationship with ==, duplicated(), unique(),
>> etc...
>>
>> So it becomes consistent with duplicated/unique, but we lose
>> consistency with the set operations.
>>
>> (c) not frustrating the user who needs something to do exact
>> matching on ranges (as I mentioned previously, if you take
>> match() away from him/her, s/he'll be left with nothing).
>>
>> No one has ever asked for match() to behave this way. There was a
>> request for a way to tabulate identical ranges. It was a nice idea
>> to extract the general "outer equal" findMatches function. But the
>> changes seem to be snow-balling. These types of changes mean a lot
>> of maintenance work for the users. A deprecation cycle does not
>> circumvent that.
>>
>> IMO those advantages counterbalance *by far* the very little
>> convenience you get from having 'match(query, subject)' do
>> 'findOverlaps(query, subject, select="first")' on
>> IRanges/GRanges objects. If you need to do that, just use the
>> latter, or, if you think that's still too much typing, define
>> a wrapper e.g. 'ovmatch(query, subject)'.
>>
>> There are plenty of specialized tools around for doing
>> inexact/fuzzy/partial/overlap matching for many particular types
>> of vector-like objects: grep() and family, pmatch(), charmatch(),
>> agrep(), grepRaw(), matchPattern() and family, findOverlaps() and
>> family, findIntervals(), etc... For the reasons I mentioned
>> above, none of them should hijack match() to make it do some
>> particular type of inexact matching on some particular type of
>> objects. Even if, for that particular type of objects, doing that
>> particular type of inexact matching is more common than doing
>> exact matching.
>>
>> H.
>>
>>
>>
>> On 01/06/2013 05:39 PM, Michael Lawrence wrote:
>>
>> I think having overlapsAny is a nice addition and helps make the API
>> more complete and explicit. Are you sure we need to change the behavior
>> of the match method for this relatively uncommon use case?
>>
>> Yes because otherwise users with a use case of doing match()
>>
>> even if it's uncommon,
>>
>> I don't think "match" always has to mean "equality". It is a more
>> general concept in my mind. The most common use case for matching
>> ranges is overlap.
>>
>> Of course "match" doesn't always have to mean equality. But of base
>>
>> Michael
>>
>>
>> On Fri, Jan 4, 2013 at 8:34 PM, Hervé Pagès <hpages@fhcrc.org> wrote:
>>
>> Yes 'peaks %in% genes' is cute and was probably doing the right thing
>> for most users (although not all). But 'exons %in% genes' is cute too
>> and was probably doing the wrong thing for all users. Advanced users
>> like you guys would have no problem switching to
>>
>> !is.na(findOverlaps(peaks, genes, type="within", select="any"))
>>
>> or
>>
>> !is.na(findOverlaps(peaks, genes, type="equal", select="any"))
>>
>> in case 'peaks %in% genes' was not doing exactly what you wanted,
>> but most users would not find this particularly friendly. Even
>> worse, some users probably didn't realize that 'peaks %in% genes'
>> was not doing exactly what they thought it did because "peaks in
>> genes" in English suggests that the peaks are within the genes,
>> but it's not what 'peaks %in% genes' does.
>>
>> Having overlapsAny(), with exactly the same extra arguments as
>> countOverlaps() and subsetByOverlaps() (i.e. 'maxgap', 'minoverlap',
>> 'type', 'ignore.strand'), all of them documented (and with most
>> users more or less familiar with them already) has the virtue to
>> expose the user to all the options from the very start, and to
>> help him/her make the right choice. Of course there will be users
>> that don't want or don't have the time to read/think about all the
>> options. Not a big deal: they'll just do 'overlapsAny(query, subject)',
>> which is not a lot more typing than 'query %in% subject', especially
>> if they use tab completion.
>>
>> It's true that it's more common to ask questions about overlap than
>> about equality but there are some use cases for the latter (as the
>> original thread shows). Until now, when you had such a use case, you
>> could not use match() or %in%, which would have been the natural things
>> to use, because they got hijacked to do something else, and you were
>> left with nothing. Not a satisfying situation. So at a minimum, we
>> needed to restore the true/real/original semantic of match() to do
>> "equality" instead of "overlap". But it's hard to do this for match()
>> and not do it for %in% too. For more than 99% of R users, %in% is
>> just a simple wrapper for 'match(x, table, nomatch = 0) > 0' (this
>> is how it has been documented and implemented in base R for many
>> years). Not maintaining this relationship between %in% and match()
>> would only cause grief and frustration to newcomers to Bioconductor.
>>
>> H.
>>
>>
>>
>> On 01/04/2013 03:32 PM, Cook, Malcolm wrote:
>>
>> Hiya again,
>>
>> I am definitely a late comer to BioC, so I definitely easily defer to
>> the tide of history.
>>
>> But I do think you miss my point Michael about the proposed change
>> making the relationship between %in% and match for {G,I}Ranges{List}
>> mimic that between other vectors, and I do think that changing the API
>> would make other late-comers take to BioC easier/faster.
>>
>> That said, I NEVER use %in% so I really have no stake in the matter, and
>> I DEFINITELY appreciate the argument for not changing the API just for
>> semantic sweetness.
>>
>> That that said, Herve is _/so good/_ about deprecations and warnings
>> that make such changes fairly easily digestible.
>>
>> That that that.... enough.... I bow out of this one....!!!!
>>
>> Always learning and Happy New Year to all lurkers,
>>
>> ~Malcolm
>>
>> From: Michael Lawrence [mailto:lawrence.michael@gene.com]
>> Sent: Friday, January 04, 2013 5:11 PM
>> To: Cook, Malcolm
>> Cc: Sean Davis; Michael Lawrence; Hervé Pagès (hpages@fhcrc.org); Tim
>> Triche, Jr.; Vedran Franke; bioconductor@r-project.org
>> Subject: Re: [BioC] countMatches() (was: table for GenomicRanges)
>>
>>
>> On Fri, Jan 4, 2013 at 1:56 PM, Cook, Malcolm <mec@stowers.org> wrote:
>>
>> Hiya,
>>
>> For what it is worth...
>>
>> I think the change to %in% is warranted.
>>
>> If I understand correctly, this change restores the relationship between
>> the semantics of `%in%` and the semantics of `match`.
>>
>> From the docs:
>>
>> '"%in%" <- function(x, table) match(x, table, nomatch = 0) > 0'
>>
>> Herve's change restores this relationship.
>>
>> match and %in% were initially consistent (both considering any overlap);
>> Herve has changed both of them together. The whole idea behind IRanges
>> is that ranges are special data types with special semantics. We have
>> reimplemented much of the existing R vector API using those semantics;
>> this extends beyond match/%in%. I am hesitant about making such sweeping
>> changes to the API so late in the life-cycle of the package. There was a
>> feature request for a way to count identical ranges in a set of ranges.
>> Let's please not get carried away and start redesigning the API for this
>> one, albeit useful, request. There are all sorts of inconsistencies in
>> the API, and many of them were conscious decisions that considered
>> practical use cases.
>>
>> Michael
>>
>> Herve, I suspect you were as a result able to completely drop all the
>> `%in%,BiocClass1,BiocClass2` definitions and depend upon base::%in%
>>
>> Am I right?
>>
>> If so, may I suggest that Herve stay the course, with the addition of
>>
>> '"%ol%" <- function(a, b) findOverlaps(a, b, maxgap=0L,
>> minoverlap=1L, type='any', select='all') > 0'
>>
>> This would provide a perspicacious idiom, thereby optimizing the API
>> for Michael's observed common use case.
>>
>> Just sayin'
>>
>> ~Malcolm
>>
>>
>> .-----Original Message-----
>> .From: bioconductor-bounces@r-project.org
>> [mailto:bioconductor-bounces@r-project.org] On Behalf Of Sean Davis
>> .Sent: Friday, January 04, 2013 3:37 PM
>> .To: Michael Lawrence
>> .Cc: Tim Triche, Jr.; Vedran Franke; bioconductor@r-project.org
>> .Subject: Re: [BioC] countMatches() (was: table for GenomicRanges)
>> .
>> .On Fri, Jan 4, 2013 at 4:32 PM, Michael Lawrence
>> .<lawrence.michael@gene.com> wrote:
>> .> The change to the behavior of %in% is a pretty big one. Are you thinking
>> .> that all set-based operations should behave this way? For example, setdiff
>> .> and intersect? I really liked the syntax of "peaks %in% genes". In my
>> .> experience, it's way more common to ask questions about overlap than about
>> .> equality, so I'd rather optimize the API for that use case. But again,
>> .> that's just my personal bias.
>> .
>> .For what it is worth, I share Michael's personal bias here.
>> .
>> .Sean
>> .
>> .
>> .> Michael
>> .>
>> .>
>> .> On Fri, Jan 4, 2013 at 1:11 PM, Hervé Pagès <hpages@fhcrc.org> wrote:
>> ...
>>
>> [Message clipped]
>
>
>
--
*A model is a lie that helps you see the truth.*
Howard Skipper <http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf>
[[alternative HTML version deleted]]
Thanks all for the feedback. Will do %over% and %within%. Hopefully we
can consider this the end of the thread :-b I'll just post a quick
note on Bioc-devel when this is ready.
Cheers,
H.
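
For anyone skimming the archive, the three notions debated in this thread (any overlap, containment, strict equality) can be illustrated with a small base-R sketch. This is only an illustration under stated assumptions, not the IRanges/GenomicRanges implementation: ranges are modelled as closed integer intervals in a plain data frame, and the helpers `ranges()`, `over_any()`, `within_any()` and `equal_any()` are hypothetical names invented here.

```r
# Hypothetical sketch: closed integer intervals stored as a data frame with
# 'start' and 'end' columns. None of this is the IRanges implementation.
ranges <- function(start, end) data.frame(start = start, end = end)

# TRUE for each query range that overlaps at least one subject range
# (the "any overlap" meaning that the old %in% had on range objects).
over_any <- function(query, subject) {
  vapply(seq_len(nrow(query)), function(i) {
    any(query$start[i] <= subject$end & query$end[i] >= subject$start)
  }, logical(1))
}

# TRUE for each query range fully contained in at least one subject range.
within_any <- function(query, subject) {
  vapply(seq_len(nrow(query)), function(i) {
    any(query$start[i] >= subject$start & query$end[i] <= subject$end)
  }, logical(1))
}

# Equality-based matching, i.e. the base-R contract of %in%:
# a range matches only if both endpoints are identical.
equal_any <- function(query, subject) {
  paste(query$start, query$end) %in% paste(subject$start, subject$end)
}

peaks <- ranges(c(1, 8, 20), c(5, 12, 25))
genes <- ranges(c(1, 10), c(5, 30))

res_over   <- over_any(peaks, genes)    # TRUE  TRUE  TRUE
res_within <- within_any(peaks, genes)  # TRUE FALSE  TRUE
res_equal  <- equal_any(peaks, genes)   # TRUE FALSE FALSE
```

The second peak overlaps a gene without lying inside one, and only the first peak is identical to a gene, which is why the three results differ.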
On 01/08/2013 03:07 PM, Michael Lawrence wrote:
> I think %over% and maybe %within% are all that's needed. Could go to
> %start% and %end%.
>
> Michael
>
> On Tue, Jan 8, 2013 at 2:59 PM, Cook, Malcolm <mec at stowers.org> wrote:
>
> If we're voting/brainstorming, I'd go for one operator for each value
> that the 'type' arg of overlap can take on.
>
> Thus:
>
> %olStart%
> %olEnd%
> %olWithin%
> %olAny% (perhaps with alias of just '%ol%')
> %olEqual% (which should be same as %in%, right)
>
> Doh, I can't stay away from this issue for some reason..... Anyway,
> my 2 cents
>
> ~Malcolm
>
>
> From: Tim Triche, Jr. [mailto:tim.triche at gmail.com]
> Sent: Tuesday, January 08, 2013 4:12 PM
> To: Michael Lawrence
> Cc: Hervé Pagès; Cook, Malcolm; Sean Davis; Vedran Franke;
> bioconductor at r-project.org
> Subject: Re: [BioC] countMatches() (was: table for GenomicRanges)
>
> Michael: your suggestion is both clearer and more concise than mine
> was. +1
>
> (I prefer x %i% y %i% z rather than intersect(x, intersect(y, z))
> for the same reason)
>
>
> On Tue, Jan 8, 2013 at 2:03 PM, Michael Lawrence
> <lawrence.michael at gene.com> wrote:
>
> I would vote for %over% instead of %ov%. Just 2 more characters but
> way clearer, at least to me. The hardest thing to type are the %'s.
>
> Michael
>
>
> On Tue, Jan 8, 2013 at 11:09 AM, Hervé Pagès <hpages at fhcrc.org> wrote:
>
> Thanks Tim, Malcolm for the feedback.
>
> @Tim, I won't comment on the variants of %ov% you are proposing for
> doing "within" or "equal" instead of "any" (but if people want them,
> I'll add them too). For now I just want to focus on restoring the
> convenience of the old %in%, whose removal is understandably causing
> some frustration. And so we can move on.
>
> Cheers,
> H.
>
> On 01/08/2013 09:50 AM, Tim Triche, Jr. wrote:
>
> hell, I'll add the operators if there's support for them. obviously
> they're not a big deal and a patch would take 5 minutes flat.
>
> my hope was to be very explicit about what each type of operation meant,
> so that when a newcomer to the Ranges API sees
>
> peaks %overlapping% promoters(someGroupOfGenesWeCareAbout)
>
> it cannot be confused with
>
> peaks %within% rangesThatCorrespondToSomeChromatinState
>
> or
>
> peaks %equal% aBunchOfDNAseFootprints
>
> or
>
> DMRs %in% genes ## what the hell does this really mean, anyways?
> it's so bad on so many levels
>
> because whenever someone says "what is the advantage of Ranges-based
> analyses?", these are the archetypal sorts of queries that come to mind.
> Except that usually in my examples they are based on posterior
> probabilities, but perhaps that could stand to change.
>
> Anyways, that's just my bias, and you're doing the heavy lifting. But
> if people agree with the motivations I will write the patch today.
>
> Cheers,
>
> --t
>
> On Tue, Jan 8, 2013 at 9:20 AM, Hervé Pagès <hpages at fhcrc.org> wrote:
>
> Hi Tim,
>
> I could add the %ov% operator as a replacement for the old %in%. So you
> would write 'peaks %ov% genes' instead of 'peaks %in% genes'. Would just
> be a convenience wrapper for 'overlapsAny(peaks, genes)'.
>
> Cheers,
> H.
>
> On 01/07/2013 11:45 AM, Tim Triche, Jr. wrote:
>
> So why not leave %in% as it was and transition everything forward to
> explicitly using { `%within%`, `%overlaps%`|`%overlapping%`, `%equals%` }
> such that
>
> identical( x %within% table, countOverlaps(x, table, type='within') > 0 ) == TRUE
> identical( x %overlaps% table, countOverlaps(x, table, type='any') > 0 ) == TRUE
> identical( x %equals% table, countOverlaps(x, table, type='equal') > 0 ) == TRUE
>
> and for the time being,
>
> identical( x %overlaps% table, countOverlaps(x, table, type='any') > 0 ) == TRUE
> ## but with a noisy nastygram that will halt if options("warn"=2)
>
> No breakage for %in% methods until such time as a full deprecation cycle
> has passed, and if the maintainers can't be arsed to do anything at all
> about the warnings by the second full release, then perhaps they don't
> really care that much after all. Just a thought?
>
> From someone (me) who has their own issues with keeping everything up
> to date and should know better. If you want to use %in% for
>
> peaks %in% genes (why on earth would you do this rather than peaks
> %in% promoters(genes), anyways?)
>
> then a nastygram could be emitted "WARNING: YOUR SHORTHAND NOTATION IS
> DOOMED AFTER BIOC 2.13, YOU WILL BE ASSIMILATED" and everyone is (more
> or less) happy.
>
> On Mon, Jan 7, 2013 at 11:33 AM, Michael Lawrence
> <lawrence.michael at gene.com> wrote:
>
> On Mon, Jan 7, 2013 at 11:00 AM, Hervé Pagès <hpages at fhcrc.org> wrote:
>
> Hi Michael,
>
> I don't think "match" (the word) always has to mean "equality" either.
> However having match() (the function) do "whole exact matching" (aka
> "equality") for any kind of vector-like object has the advantage of:
>
> (a) making it consistent with base::match() (?base::match is pretty
> explicit about what the contract of match() is)
>
> (a) alone is obviously not enough. We have many methods, like the
> set operations, that treat ranges specially. Are we going to start
> moving everything toward the base behavior? And have rangeIntersect,
> rangeSetdiff, etc?
>
> (b) preserving its relationship with ==, duplicated(), unique(),
> etc...
>
> So it becomes consistent with duplicated/unique, but we lose
> consistency with the set operations.
>
> (c) not frustrating the user who needs something to do exact
> matching on ranges (as I mentioned previously, if you take
> match() away from him/her, s/he'll be left with nothing).
>
> No one has ever asked for match() to behave this way. There was a
> request for a way to tabulate identical ranges. It was a nice idea
> to extract the general "outer equal" findMatches function. But the
> changes seem to be snow-balling. These types of changes mean a lot
> of maintenance work for the users. A deprecation cycle does not
> circumvent that.
>
> IMO those advantages counterbalance *by far* the very little
> convenience you get from having 'match(query, subject)' do
> 'findOverlaps(query, subject, select="first")' on
> IRanges/GRanges objects. If you need to do that, just use the
> latter, or, if you think that's still too much typing, define
> a wrapper e.g. 'ovmatch(query, subject)'.
>
> There are plenty of specialized tools around for doing
> inexact/fuzzy/partial/overlap matching for many particular types
> of vector-like objects: grep() and family, pmatch(), charmatch(),
> agrep(), grepRaw(), matchPattern() and family, findOverlaps() and
> family, findInterval(), etc... For the reasons I mentioned
> above, none of them should hijack match() to make it do some
> particular type of inexact matching on some particular type of
> objects. Even if, for that particular type of objects, doing that
> particular type of inexact matching is more common than doing
> exact matching.
>
> H.
>
>
>
> On 01/06/2013 05:39 PM, Michael Lawrence wrote:
>
> I think having overlapsAny is a nice addition and helps make
> the API more complete and explicit. Are you sure we need to
> change the behavior of the match method for this relatively
> uncommon use case?
>
> Yes because otherwise users with a use case of doing match()
>
> even if it's uncommon,
>
> I don't think "match" always has to mean "equality". It is a more
> general concept in my mind. The most common use case for matching
> ranges is overlap.
>
> Of course "match" doesn't always have to mean equality. But of base
>
> Michael
>
>
> On Fri, Jan 4, 2013 at 8:34 PM, Hervé Pagès <hpages at fhcrc.org> wrote:
>
> Yes 'peaks %in% genes' is cute and was probably doing the right thing
> for most users (although not all). But 'exons %in% genes' is cute too
> and was probably doing the wrong thing for all users. Advanced users
> like you guys would have no problem switching to
>
> !is.na(findOverlaps(peaks, genes, type="within", select="any"))
>
> or
>
> !is.na(findOverlaps(peaks, genes, type="equal", select="any"))
>
> in case 'peaks %in% genes' was not doing exactly what you wanted,
> but most users would not find this particularly friendly. Even
> worse, some users probably didn't realize that 'peaks %in% genes'
> was not doing exactly what they thought it did because "peaks in
> genes" in English suggests that the peaks are within the genes,
> but it's not what 'peaks %in% genes' does.
>
> Having overlapsAny(), with exactly the same extra arguments as
> countOverlaps() and subsetByOverlaps() (i.e. 'maxgap', 'minoverlap',
> 'type', 'ignore.strand'), all of them documented (and with most
> users more or less familiar with them already) has the virtue to
> expose the user to all the options from the very start, and to
> help him/her make the right choice. Of course there will be users
> that don't want or don't have the time to read/think about all the
> options. Not a big deal: they'll just do 'overlapsAny(query, subject)',
> which is not a lot more typing than 'query %in% subject', especially
> if they use tab completion.
>
> It's true that it's more common to ask questions about overlap than
> about equality but there are some use cases for the latter (as the
> original thread shows). Until now, when you had such a use case, you
> could not use match() or %in%, which would have been the natural things
> to use, because they got hijacked to do something else, and you were
> left with nothing. Not a satisfying situation. So at a minimum, we
> needed to restore the true/real/original semantic of match() to do
> "equality" instead of "overlap". But it's hard to do this for match()
> and not do it for %in% too. For more than 99% of R users, %in% is
> just a simple wrapper for 'match(x, table, nomatch = 0) > 0' (this
> is how it has been documented and implemented in base R for many
> years). Not maintaining this relationship between %in% and match()
> would only cause grief and frustration to newcomers to Bioconductor.
>
> H.
>
>
>
> On 01/04/2013 03:32 PM, Cook, Malcolm wrote:
>
> Hiya again,
>
> I am definitely a late comer to BioC, so I definitely easily defer to
> the tide of history.
>
> But I do think you miss my point Michael about the proposed change
> making the relationship between %in% and match for {G,I}Ranges{List}
> mimic that between other vectors, and I do think that changing the API
> would make other late-comers take to BioC easier/faster.
>
> That said, I NEVER use %in% so I really have no stake in the matter, and
> I DEFINITELY appreciate the argument to not changing the API just for
> semantic sweetness.
>
> That that said, Herve is _/so good/_ about deprecations and warnings
> that make such changes fairly easily digestible.
>
> That that that.... enough.... I bow out of this one....!!!!
>
> Always learning and Happy New Year to all lurkers,
>
> ~Malcolm
>
> *From:* Michael Lawrence [mailto:lawrence.michael at gene.com]
> *Sent:* Friday, January 04, 2013 5:11 PM
> *To:* Cook, Malcolm
> *Cc:* Sean Davis; Michael Lawrence; Hervé Pagès (hpages at fhcrc.org); Tim
> Triche, Jr.; Vedran Franke; bioconductor at r-project.org
> *Subject:* Re: [BioC] countMatches() (was: table for GenomicRanges)
>
>
> On Fri, Jan 4, 2013 at 1:56 PM, Cook, Malcolm <mec at stowers.org> wrote:
>
> Hiya,
>
> For what it is worth...
>
> I think the change to %in% is warranted.
>
> If I understand correctly, this change restores the relationship
> between the semantics of `%in%` and the semantics of `match`.
>
> From the docs:
>
> '"%in%" <- function(x, table) match(x, table, nomatch = 0) > 0'
>
> Herve's change restores this relationship.
>
> match and %in% were initially consistent (both considering any
> overlap); Herve has changed both of them together. The whole idea
> behind IRanges is that ranges are special data types with special
> semantics. We have reimplemented much of the existing R vector API
> using those semantics; this extends beyond match/%in%. I am hesitant
> about making such sweeping changes to the API so late in the
> life-cycle of the package. There was a feature request for a way to
> count identical ranges in a set of ranges. Let's please not get
> carried away and start redesigning the API for this one, albeit
> useful, request. There are all sorts of inconsistencies in the API,
> and many of them were conscious decisions that considered practical
> use cases.
>
> Michael
>
> Herve, I suspect you were as a result able to completely drop
> all the `%in%,BiocClass1,BiocClass2` definitions and depend upon
> base::%in%
>
> Am I right?
>
> If so, may I suggest that Herve stay the course, with the addition of
>
> '"%ol%" <- function(a, b) findOverlaps(a, b, maxgap=0L,
> minoverlap=1L, type='any', select='all') > 0'
>
> This would provide a perspicacious idiom, thereby optimizing the API
> for Michael's observed common use case.
>
> Just sayin'
>
> ~Malcolm
>
>
> .-----Original Message-----
> .From: bioconductor-bounces at r-project.org
> [mailto:bioconductor-bounces at r-project.org] On Behalf Of Sean Davis
> .Sent: Friday, January 04, 2013 3:37 PM
> .To: Michael Lawrence
> .Cc: Tim Triche, Jr.; Vedran Franke; bioconductor at r-project.org
> .Subject: Re: [BioC] countMatches() (was: table for GenomicRanges)
> .
> .On Fri, Jan 4, 2013 at 4:32 PM, Michael Lawrence
> .<lawrence.michael at gene.com> wrote:
> .> The change to the behavior of %in% is a pretty big one. Are you thinking
> .> that all set-based operations should behave this way? For example, setdiff
> .> and intersect? I really liked the syntax of "peaks %in% genes". In my
> .> experience, it's way more common to ask questions about overlap than about
> .> equality, so I'd rather optimize the API for that use case. But again,
> .> that's just my personal bias.
> .
> .For what it is worth, I share Michael's personal bias here.
> .
> .Sean
> .
> .
> .> Michael
> .>
> .>
> .> On Fri, Jan 4, 2013 at 1:11 PM, Hervé Pagès <hpages at fhcrc.org> wrote:
> .>
> .>> Hi,
> .>>
> .>> I added findMatches() and countMatches() to the latest IRanges /
> .>> GenomicRanges packages (in BioC devel only).
> .>>
> .>> findMatches(x, table): An enhanced version of 'match' that
> .>> returns all the matches in a Hits object.
> .>>
> .>> countMatches(x, table): Returns an integer vector of the length
> .>> of 'x', containing the number of matches in 'table' for
> .>> each element in 'x'.
> .>>
> .>> countMatches() is what you can use to tally/count/tabulate (choose your
> .>> preferred term) the unique elements in a GRanges object:
> .>>
> .>> library(GenomicRanges)
> .>> set.seed(33)
> .>> gr <- GRanges("chr1", IRanges(sample(15,20,replace=TRUE), width=5))
> .>>
> .>> Then:
> .>>
> .>> > gr_levels <- sort(unique(gr))
> .>> > countMatches(gr_levels, gr)
> .>> [1] 1 1 1 2 4 2 2 1 2 2 2
> .>>
> .>> Note that findMatches() and countMatches() also work on IRanges and
> .>> DNAStringSet objects, as well as on ordinary atomic vectors:
> .>>
> .>> library(hgu95av2probe)
> .>> library(Biostrings)
> .>> probes <- DNAStringSet(hgu95av2probe)
> .>> unique_probes <- unique(probes)
> .>> count <- countMatches(unique_probes, probes)
> .>> max(count) # 7
> .>>
> .>> I made other changes in IRanges/GenomicRanges so that the notion
> .>> of "match" between elements of a vector-like object now consistently
> .>> means "equality" instead of "overlap", even for range-based objects
> .>> like IRanges or GRanges objects. This notion of "equality" is the
> .>> same that is used by ==. The most visible consequence of those
> .>> changes is that using %in% between 2 IRanges or GRanges objects
> .>> 'query' and 'subject' in order to do overlaps was replaced by
> .>> overlapsAny(query, subject).
> .>>
> .>> overlapsAny(query, subject): Finds the ranges in 'query' that
> .>> overlap any of the ranges in 'subject'.
> .>>
> .>> There are warnings and deprecation messages in place to help smooth
> .>> the transition.
> .>>
> .>> Cheers,
> .>> H.
> .>>
> .>> --
> .>> Hervé Pagès
> .>>
> .>> Program in Computational Biology
> .>> Division of Public Health Sciences
> .>> Fred Hutchinson Cancer Research Center
> .>> 1100 Fairview Ave. N, M1-B514
> .>> P.O. Box 19024
> .>> Seattle, WA 98109-1024
> .>>
> .>> E-mail: hpages at fhcrc.org
> .>> Phone: (206) 667-5791
> .>> Fax: (206) 667-1319
> .>>
> .>
> .>
> [[alternative HTML version deleted]]
> .>
> .>
> .> _______________________________________________
> .> Bioconductor mailing list
> .> Bioconductor at r-project.org
> .> https://stat.ethz.ch/mailman/listinfo/bioconductor
> .> Search the archives:
> .> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
> ...
>
> [Message clipped]
>
>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
On 01/08/2013 02:59 PM, Cook, Malcolm wrote:
> If we're voting/brainstorming, I'd go for one operator for each value
> that the 'type' arg of overlap can take on
>
> Thus:
>
> %olStart%
>
> %olEnd%
>
> %olWithin%
>
> %olAny% (perhaps with alias of just '%ol%')
>
> %olEqual% (which should be same as %in%, right)
Except for zero-width ranges: they never overlap with anything, but
2 zero-width ranges with the same start are considered equal:
> ir <- IRanges(start=5:7, width=0:2)
> ir
IRanges of length 3
    start end width
[1]     5   4     0
[2]     6   6     1
[3]     7   8     2
> overlapsAny(ir, ir, type="equal")
[1] FALSE TRUE TRUE
> suppressWarnings(ir %in% ir)
[1] TRUE TRUE TRUE
Also I believe the new %in% should generally be faster than
overlapsAny( , type="equal"), and also perhaps more memory
efficient, but I didn't do enough testing to quantify this.
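
To make the zero-width case above concrete without needing IRanges installed, here is a toy sketch in base R. It is only a model of the two semantics, not how IRanges implements them: it assumes a range is fully described by its start and width, and that a type="equal" overlap hit additionally requires a non-empty range. The names `starts`, `widths`, `key`, `in_equal` and `ov_equal` are made up for illustration.

```r
# Toy model of the two notions of "match" for ranges (base R only).
# Assumption: each range is described by a (start, width) pair.
starts <- 5:7
widths <- 0:2

# "Equality" semantics, as used by the new match()/%in%:
# two ranges match when both start and width are identical.
key <- paste(starts, widths)
in_equal <- key %in% key

# "Overlap of type equal" semantics, as modeled here:
# identical ranges AND a non-empty range, since a zero-width
# range never overlaps with anything.
ov_equal <- in_equal & widths > 0

print(in_equal)
print(ov_equal)
```

Under those assumptions the first vector is all TRUE and the second is FALSE for the zero-width range only, which is the same pattern as the overlapsAny(ir, ir, type="equal") and ir %in% ir results shown above.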
H.
>
> Doh, I can't stay away from this issue for some reason..... Anyway, my 2
> cents
>
> ~Malcolm
>
> *From:*Tim Triche, Jr. [mailto:tim.triche at gmail.com]
> *Sent:* Tuesday, January 08, 2013 4:12 PM
> *To:* Michael Lawrence
> *Cc:* Hervé Pagès; Cook, Malcolm; Sean Davis; Vedran Franke;
> bioconductor at r-project.org
> *Subject:* Re: [BioC] countMatches() (was: table for GenomicRanges)
>
> Michael: your suggestion is both clearer and more concise than mine was.
> +1
>
> (I prefer x %i% y %i% z rather than intersect(x, intersect(y, z)) for
> the same reason)
>
> On Tue, Jan 8, 2013 at 2:03 PM, Michael Lawrence
> <lawrence.michael at gene.com> wrote:
>
> I would vote for %over% instead of %ov%. Just 2 more characters but way
> clearer, at least to me. The hardest thing to type are the %'s.
>
> Michael
>
> On Tue, Jan 8, 2013 at 11:09 AM, Hervé Pagès <hpages at fhcrc.org> wrote:
>
> Thanks Tim, Malcolm for the feedback.
>
> @Tim, I won't comment on the variants of %ov% you are proposing for
> doing "within" or "equal" instead of "any" (but if people want them,
> I'll add them too). For now I just want to focus on restoring the
> convenience of the old %in%, whose removal is understandably causing
> some frustration. And so we can move on.
>
> Cheers,
> H.
>
>
>
>
> On 01/08/2013 09:50 AM, Tim Triche, Jr. wrote:
>
> hell, I'll add the operators if there's support for them. obviously
> they're not a big deal and a patch would take 5 minutes flat.
>
> my hope was to be very explicit about what each type of operation meant,
> so that when a newcomer to the Ranges API sees
>
> peaks %overlapping% promoters(someGroupOfGenesWeCareAbout)
>
> it cannot be confused with
>
> peaks %within% rangesThatCorrespondToSomeChromatinState
>
> or
>
> peaks %equal% aBunchOfDNAseFootprints
>
> or
>
> DMRs %in% genes ## what the hell does this really mean, anyways?
> it's so bad on so many levels
>
> because whenever someone says "what is the advantage of Ranges-based
> analyses?", these are the archetypal sorts of queries that come to mind.
> Except that usually in my examples they are based on posterior
> probabilities, but perhaps that could stand to change.
>
> Anyways, that's just my bias, and you're doing the heavy lifting. But
> if people agree with the motivations I will write the patch today.
>
> Cheers,
>
> --t
>
>
>
>
> On Tue, Jan 8, 2013 at 9:20 AM, Hervé Pagès <hpages at fhcrc.org> wrote:
>
> Hi Tim,
>
> I could add the %ov% operator as a replacement for the old %in%. So you
> would write 'peaks %ov% genes' instead of 'peaks %in% genes'. Would just
> be a convenience wrapper for 'overlapsAny(peaks, genes)'.
>
> Cheers,
> H.
>
>
> On 01/07/2013 11:45 AM, Tim Triche, Jr. wrote:
>
> So why not leave %in% as it was and transition everything forward to
> explicitly using { `%within%`, `%overlaps%`|`%overlapping%`, `%equals%` }
> such that
>
> identical( x %within% table, countOverlaps(x, table, type='within') > 0 ) == TRUE
> identical( x %overlaps% table, countOverlaps(x, table, type='any') > 0 ) == TRUE
> identical( x %equals% table, countOverlaps(x, table, type='equal') > 0 ) == TRUE
>
> and for the time being,
>
> identical( x %overlaps% table, countOverlaps(x, table, type='any') > 0 ) == TRUE
> ## but with a noisy nastygram that will halt if options("warn"=2)
>
> No breakage for %in% methods until such time as a full deprecation cycle
> has passed, and if the maintainers can't be arsed to do anything at all
> about the warnings by the second full release, then perhaps they don't
> really care that much after all. Just a thought?
>
> From someone (me) who has their own issues with keeping everything up
> to date and should know better. If you want to use %in% for
>
> peaks %in% genes (why on earth would you do this rather than
> peaks %in% promoters(genes), anyways?)
>
> then a nastygram could be emitted "WARNING: YOUR SHORTHAND NOTATION IS
> DOOMED AFTER BIOC 2.13, YOU WILL BE ASSIMILATED" and everyone is (more
> or less) happy.
>
>
>
> On Mon, Jan 7, 2013 at 11:33 AM, Michael Lawrence
> <lawrence.michael at gene.com> wrote:
>
> On Mon, Jan 7, 2013 at 11:00 AM, Hervé Pagès <hpages at fhcrc.org> wrote:
>
>     Hi Michael,
>
>     I don't think "match" (the word) always has to mean "equality" either.
>     However having match() (the function) do "whole exact matching" (aka
>     "equality") for any kind of vector-like object has the advantage of:
>
>     (a) making it consistent with base::match() (?base::match is pretty
>         explicit about what the contract of match() is)
>
>
> (a) alone is obviously not enough. We have many methods, like the
> set operations, that treat ranges specially. Are we going to start
> moving everything toward the base behavior? And have rangeIntersect,
> rangeSetdiff, etc?
>
>     (b) preserving its relationship with ==, duplicated(), unique(),
>         etc...
>
>
> So it becomes consistent with duplicated/unique, but we lose
> consistency with the set operations.
>
>     (c) not frustrating the user who needs something to do exact
>         matching on ranges (as I mentioned previously, if you take
>         match() away from him/her, s/he'll be left with nothing).
>
>
> No one has ever asked for match() to behave this way. There was a
> request for a way to tabulate identical ranges. It was a nice idea
> to extract the general "outer equal" findMatches function. But the
> changes seem to be snow-balling. These types of changes mean a lot
> of maintenance work for the users. A deprecation cycle does not
> circumvent that.
>
>
>     IMO those advantages counterbalance *by far* the very little
>     convenience you get from having 'match(query, subject)' do
>     'findOverlaps(query, subject, select="first")' on
>     IRanges/GRanges objects. If you need to do that, just use the
>     latter, or, if you think that's still too much typing, define
>     a wrapper e.g. 'ovmatch(query, subject)'.
>
>     There are plenty of specialized tools around for doing
>     inexact/fuzzy/partial/overlap matching for many particular types
>     of vector-like objects: grep() and family, pmatch(), charmatch(),
>     agrep(), grepRaw(), matchPattern() and family, findOverlaps() and
>     family, findInterval(), etc... For the reasons I mentioned
>     above, none of them should hijack match() to make it do some
>     particular type of inexact matching on some particular type of
>     objects. Even if, for that particular type of objects, doing that
>     particular type of inexact matching is more common than doing
>     exact matching.
>
> H.
>
>
>
> On 01/06/2013 05:39 PM, Michael Lawrence wrote:
>
> I think having overlapsAny is a nice addition and helps make the API
> more complete and explicit. Are you sure we need to change the behavior
> of the match method for this relatively uncommon use case?
>
>
> Yes because otherwise users with a use case of doing match()
>
> even if it's uncommon,
>
>
> I don't think "match" always has to mean "equality". It is a more
> general concept in my mind. The most common use case for matching
> ranges is overlap.
>
>
> Of course "match" doesn't always have to mean equality. But of base
>
> Michael
>
>
> On Fri, Jan 4, 2013 at 8:34 PM, Hervé Pagès <hpages at fhcrc.org> wrote:
>
> Yes 'peaks %in% genes' is cute and was probably doing the right thing
> for most users (although not all). But 'exons %in% genes' is cute too
> and was probably doing the wrong thing for all users. Advanced users
> like you guys would have no problem switching to
>
> !is.na(findOverlaps(peaks, genes, type="within", select="any"))
>
> or
>
> !is.na(findOverlaps(peaks, genes, type="equal", select="any"))
>
> in case 'peaks %in% genes' was not doing exactly what you wanted,
> but most users would not find this particularly friendly. Even
> worse, some users probably didn't realize that 'peaks %in% genes'
> was not doing exactly what they thought it did because "peaks in
> genes" in English suggests that the peaks are within the genes,
> but it's not what 'peaks %in% genes' does.
>
> Having overlapsAny(), with exactly the same extra arguments as
> countOverlaps() and subsetByOverlaps() (i.e. 'maxgap', 'minoverlap',
> 'type', 'ignore.strand'), all of them documented (and with most
> users more or less familiar with them already) has the virtue to
> expose the user to all the options from the very start, and to
> help him/her make the right choice. Of course there will be users
> that don't want or don't have the time to read/think about all the
> options. Not a big deal: they'll just do 'overlapsAny(query, subject)',
> which is not a lot more typing than 'query %in% subject', especially
> if they use tab completion.
>
> It's true that it's more common to ask questions about overlap than
> about equality but there are some use cases for the latter (as the
> original thread shows). Until now, when you had such a use case, you
> could not use match() or %in%, which would have been the natural things
> to use, because they got hijacked to do something else, and you were
> left with nothing. Not a satisfying situation. So at a minimum, we
> needed to restore the true/real/original semantic of match() to do
> "equality" instead of "overlap". But it's hard to do this for match()
> and not do it for %in% too. For more than 99% of R users, %in% is
> just a simple wrapper for 'match(x, table, nomatch = 0) > 0' (this
> is how it has been documented and implemented in base R for many
> years). Not maintaining this relationship between %in% and match()
> would only cause grief and frustration to newcomers to Bioconductor.
>
> H.
>
>
>
> On 01/04/2013 03:32 PM, Cook, Malcolm wrote:
>
> Hiya again,
>
> I am definitely a late comer to BioC, so I definitely easily defer to
> the tide of history.
>
> But I do think you miss my point Michael about the proposed change
> making the relationship between %in% and match for {G,I}Ranges{List}
> mimic that between other vectors, and I do think that changing the API
> would make other late-comers take to BioC easier/faster.
>
> That said, I NEVER use %in% so I really have no stake in the matter, and
> I DEFINITELY appreciate the argument to not changing the API just for
> semantic sweetness.
>
> That that said, Herve is _/so good/_ about deprecations and warnings
> that make such changes fairly easily digestible.
>
> That that that.... enough.... I bow out of this one....!!!!
>
> Always learning and Happy New Year to all lurkers,
>
> ~Malcolm
>
> *From:*Michael Lawrence [mailto:lawrence.michael at gene.com]
> *Sent:* Friday, January 04, 2013 5:11 PM
> *To:* Cook, Malcolm
> *Cc:* Sean Davis; Michael Lawrence; Hervé Pagès (hpages at fhcrc.org);
> Tim Triche, Jr.; Vedran Franke; bioconductor at r-project.org
> *Subject:* Re: [BioC] countMatches() (was: table for GenomicRanges)
>
>
> On Fri, Jan 4, 2013 at 1:56 PM, Cook, Malcolm <mec at stowers.org> wrote:
>
> Hiya,
>
> For what it is worth...
>
> I think the change to %in% is warranted.
>
> If I understand correctly, this change restores the relationship between
> the semantics of `%in%` and the semantics of `match`.
>
> From the docs:
>
> '"%in%" <- function(x, table) match(x, table, nomatch = 0) > 0'
>
> Herve's change restores this relationship.
>
>
> match and %in% were initially consistent (both considering any overlap);
> Herve has changed both of them together. The whole idea behind IRanges
> is that ranges are special data types with special semantics. We have
> reimplemented much of the existing R vector API using those semantics;
> this extends beyond match/%in%. I am hesitant about making such sweeping
> changes to the API so late in the life-cycle of the package. There was a
> feature request for a way to count identical ranges in a set of ranges.
> Let's please not get carried away and start redesigning the API for this
> one, albeit useful, request. There are all sorts of inconsistencies in
> the API, and many of them were conscious decisions that considered
> practical use cases.
>
> Michael
>
>
> Herve, I suspect you were as a result able to completely drop all the
> `%in%,BiocClass1,BiocClass2` definitions and depend upon base::%in%
>
> Am I right?
>
> If so, may I suggest that Herve stay the course, with the addition of
>
> '"%ol%" <- function(a, b) findOverlaps(a, b, maxgap=0L,
> minoverlap=1L, type='any', select='all') > 0'
>
> This would provide a perspicacious idiom, thereby optimizing the API
> for Michael's observed common use case.
>
> Just sayin'
>
> ~Malcolm
>
>
> .-----Original Message-----
> .From: bioconductor-bounces at r-project.org
> [mailto:bioconductor-bounces at r-project.org] On Behalf Of Sean Davis
> .Sent: Friday, January 04, 2013 3:37 PM
> .To: Michael Lawrence
> .Cc: Tim Triche, Jr.; Vedran Franke; bioconductor at r-project.org
> .Subject: Re: [BioC] countMatches() (was: table for GenomicRanges)
> .
> .On Fri, Jan 4, 2013 at 4:32 PM, Michael Lawrence
> .<lawrence.michael at gene.com> wrote:
> .> The change to the behavior of %in% is a pretty big one. Are you thinking
> .> that all set-based operations should behave this way? For example, setdiff
> .> and intersect? I really liked the syntax of "peaks %in% genes". In my
> .> experience, it's way more common to ask questions about overlap than about
> .> equality, so I'd rather optimize the API for that use case. But again,
> .> that's just my personal bias.
> .
> .For what it is worth, I share Michael's personal bias here.
> .
> .Sean
> .
> .
> .> Michael
> .>
> .>
> .> On Fri, Jan 4, 2013 at 1:11 PM, Hervé Pagès <hpages at fhcrc.org> wrote:
> .>
> .>> Hi,
> .>>
> .>> I added findMatches() and countMatches() to the latest IRanges /
> .>> GenomicRanges packages (in BioC devel only).
> .>>
> .>> findMatches(x, table): An enhanced version of 'match' that
> .>> returns all the matches in a Hits object.
> .>>
> .>> countMatches(x, table): Returns an integer vector of the length
> .>> of 'x', containing the number of matches in 'table' for
> .>> each element in 'x'.
> .>>
> .>> countMatches() is what you can use to tally/count/tabulate (choose your
> .>> preferred term) the unique elements in a GRanges object:
> .>>
> .>> library(GenomicRanges)
> .>> set.seed(33)
> .>> gr <- GRanges("chr1", IRanges(sample(15,20,replace=TRUE), width=5))
> .>>
> .>> Then:
> .>>
> .>> > gr_levels <- sort(unique(gr))
> .>> > countMatches(gr_levels, gr)
> .>> [1] 1 1 1 2 4 2 2 1 2 2 2
> .>>
> .>> Note that findMatches() and countMatches() also work on IRanges and
> .>> DNAStringSet objects, as well as on ordinary atomic vectors:
> .>>
> .>> library(hgu95av2probe)
> .>> library(Biostrings)
> .>> probes <- DNAStringSet(hgu95av2probe)
> .>> unique_probes <- unique(probes)
> .>> count <- countMatches(unique_probes, probes)
> .>> max(count) # 7
> .>>
> .>> I made other changes in IRanges/GenomicRanges so that the notion
> .>> of "match" between elements of a vector-like object now consistently
> .>> means "equality" instead of "overlap", even for range-based objects
> .>> like IRanges or GRanges objects. This notion of "equality" is the
> .>> same that is used by ==. The most visible consequence of those
> .>> changes is that using %in% between 2 IRanges or GRanges objects
> .>> 'query' and 'subject' in order to do overlaps was replaced by
> .>> overlapsAny(query, subject).
> .>>
> .>> overlapsAny(query, subject): Finds the ranges in 'query' that
> .>> overlap any of the ranges in 'subject'.
> .>>
> .>> There are warnings and deprecation messages in place to help smooth
> .>> the transition.
> .>>
> .>> Cheers,
> .>> H.
> .>>
> <mailto:bioconductor at="" r-project.org=""> <mailto:bioconductor at="" r-project.org="">>>>>
>
>
>
.https://stat.ethz.ch/mailman/______listinfo/bioconductor
>
<https: stat.ethz.ch="" mailman="" ____listinfo="" bioconductor="">
>
>
<https: stat.ethz.ch="" mailman="" ____listinfo="" bioconductor="">
<https: stat.ethz.ch="" mailman="" __listinfo="" bioconductor="">>
>
>
>
>
<https: stat.ethz.ch="" mailman="" ____listinfo="" bioconductor="">
<https: stat.ethz.ch="" mailman="" __listinfo="" bioconductor="">
>
>
<https: stat.ethz.ch="" mailman="" __listinfo="" bioconductor="">
<https: stat.ethz.ch="" mailman="" listinfo="" bioconductor="">>>
> .Search the archives:
>
> http://news.gmane.org/gmane.______science.biology.informatic
s.______conductor
>
> <http: news.gmane.org="" gmane.____science.biology.informatics="" .____conductor="">
>
>
> <http: news.gmane.org="" gmane.____science.biology.informatics="" .____conductor="">
> <http: news.gmane.org="" gmane.__science.biology.informatics._="" _conductor="">>
>
>
>
>
> <http: news.gmane.org="" gmane.____science.biology.informatics="" .____conductor="">
> <http: news.gmane.org="" gmane.__science.biology.informatics._="" _conductor="">
>
>
>
<http: news.gmane.org="" gmane.__science.biology.informatics.__conductor="">
>
<http: news.gmane.org="" gmane.science.biology.informatics.conductor="">>>
>
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health
Sciences
> Fred Hutchinson Cancer Research
Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages at fhcrc.org
> <mailto:hpages at="" fhcrc.org="">
> <mailto:hpages at="" fhcrc.org="" <mailto:hpages="" at="" fhcrc.org="">>
> <mailto:hpages at="" fhcrc.org="" <mailto:hpages="" at="" fhcrc.org="">
> <mailto:hpages at="" fhcrc.org="" <mailto:hpages="" at="" fhcrc.org="">>>
>
> <mailto:hpages at="" fhcrc.org=""> <mailto:hpages at="" fhcrc.org=""> <mailto:hpages at="" fhcrc.org=""> <mailto:hpages at="" fhcrc.org="">>
> <mailto:hpages at="" fhcrc.org="" <mailto:hpages="" at="" fhcrc.org="">
> <mailto:hpages at="" fhcrc.org="" <mailto:hpages="" at="" fhcrc.org="">>>>
>
> Phone: (206) 667-5791
> <tel:%28206%29%20667-5791>
> <tel:%28206%29%20667-5791>
<tel:%28206%29%20667-5791>
> <tel:%28206%29%20667-5791>
>
> Fax: (206) 667-1319
> <tel:%28206%29%20667-1319> <tel:%28206%29%20667-1319>
> <tel:%28206%29%20667-1319>
> <tel:%28206%29%20667-1319>
>
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages at fhcrc.org
> <mailto:hpages at="" fhcrc.org=""> <mailto:hpages at="" fhcrc.org=""> <mailto:hpages at="" fhcrc.org="">>
> <mailto:hpages at="" fhcrc.org="" <mailto:hpages="" at="" fhcrc.org="">
> <mailto:hpages at="" fhcrc.org="" <mailto:hpages="" at="" fhcrc.org="">>>
> Phone: (206) 667-5791
> <tel:%28206%29%20667-5791> <tel:%28206%29%20667-5791>
> <tel:%28206%29%20667-5791>
> Fax: (206) 667-1319
<tel:%28206%29%20667-1319>
> <tel:%28206%29%20667-1319>
> <tel:%28206%29%20667-1319>
>
>
>
>
>
> --
>
> /A model is a lie that helps you see the truth./
> /
> /
> Howard Skipper
>
>
>
<http: cancerres.__aacrjournals.org="" content="" 31="" 9="" __1173.full.pdf="" <http:="" aacrjournals.org="" content="" 31="" 9="" __1173.full.pdf="">
>
<http: cancerres.aacrjournals.org="" content="" 31="" 9="" 1173.full.pdf="">>
>
>
>
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
>
> ...
>
> [Message clipped]
>
>
>
> --
> /A model is a lie that helps you see the truth./
>
> Howard Skipper
> <http: cancerres.aacrjournals.org="" content="" 31="" 9="" 1173.full.pdf="">
>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
.Hi Tim,
.
.I could add the %ov% operator as a replacement for the old %in%. So you
.would write 'peaks %ov% genes' instead of 'peaks %in% genes'. Would just
.be a convenience wrapper for 'overlapsAny(peaks, genes)'.
[cloak off]
Herve, I think this is the BEST course, and except for one letter, is
what I hoped I meant back when I wrote:
> If so, may I suggest that Herve stay the course, with the addition of
> '"%ol%" <- function(a, b) findOverlaps(a, b, maxgap=0L,
> minoverlap=1L, type='any', select='all') > 0'
Stay the course, captain.
[cloak on]
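For readers following along, here is a minimal sketch of what such a convenience operator might look like. It assumes the IRanges package (which provides overlapsAny()) is installed; the name %ov% is only the one floated above, and the toy ranges are made up for illustration.

```r
suppressPackageStartupMessages(library(IRanges))

# Hypothetical convenience operator: a thin wrapper around overlapsAny().
`%ov%` <- function(query, subject) overlapsAny(query, subject)

# Toy data, made up for illustration.
peaks <- IRanges(start = c(1, 20, 50), width = 5)   # [1,5] [20,24] [50,54]
genes <- IRanges(start = c(3, 100), width = 10)     # [3,12] [100,109]

# One logical per peak: TRUE when the peak overlaps at least one gene.
hits <- peaks %ov% genes
print(hits)
```

With the default arguments, 'countOverlaps(peaks, genes) > 0' should give the same per-query logical vector, which is the form used in the %within%/%overlaps%/%equals% proposal quoted below.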
.
.Cheers,
.H.
.
.On 01/07/2013 11:45 AM, Tim Triche, Jr. wrote:
.> So why not leave %in% as it was and transition everything forward to
.> explicitly using { `%within%`, `%overlaps%`|`%overlapping%`, `%equals%` }
.> such that
.>
.> identical( x %within% table, countOverlaps(x, table, type='within') > 0 ) == TRUE
.> identical( x %overlaps% table, countOverlaps(x, table, type='any') > 0 ) == TRUE
.> identical( x %equals% table, countOverlaps(x, table, type='equal') > 0 ) == TRUE
.>
.> and for the time being,
.>
.> identical( x %overlaps% table, countOverlaps(x, table, type='any') > 0 ) == TRUE
.> ## but with a noisy nastygram that will halt if options("warn"=2)
.>
.> No breakage for %in% methods until such time as a full deprecation cycle
.> has passed, and if the maintainers can't be arsed to do anything at all
.> about the warnings by the second full release, then perhaps they don't
.> really care that much after all. Just a thought?
.>
.> From someone (me) who has their own issues with keeping everything up
.> to date and should know better. If you want to use %in% for
.>
.> peaks %in% genes (why on earth would you do this rather than peaks
.> %in% promoters(genes), anyways?)
.>
.> then a nastygram could be emitted "WARNING: YOUR SHORTHAND NOTATION IS
.> DOOMED AFTER BIOC 2.13, YOU WILL BE ASSIMILATED" and everyone is (more
.> or less) happy.
.>
.>
.>
.> On Mon, Jan 7, 2013 at 11:33 AM, Michael Lawrence
.> <lawrence.michael at gene.com> wrote:
.>
.> On Mon, Jan 7, 2013 at 11:00 AM, Hervé Pagès <hpages at fhcrc.org> wrote:
.>
.> Hi Michael,
.>
.> I don't think "match" (the word) always has to mean "equality" either.
.> However having match() (the function) do "whole exact matching" (aka
.> "equality") for any kind of vector-like object has the advantage of:
.>
.> (a) making it consistent with base::match() (?base::match is pretty
.> explicit about what the contract of match() is)
.>
.>
.> (a) alone is obviously not enough. We have many methods, like the
.> set operations, that treat ranges specially. Are we going to start
.> moving everything toward the base behavior? And have rangeIntersect,
.> rangeSetdiff, etc?
.>
.> (b) preserving its relationship with ==, duplicated(), unique(),
.> etc...
.>
.>
.> So it becomes consistent with duplicated/unique, but we lose
.> consistency with the set operations.
.>
.> (c) not frustrating the user who needs something to do exact
.> matching on ranges (as I mentioned previously, if you take
.> match() away from him/her, s/he'll be left with nothing).
.>
.>
.> No one has ever asked for match() to behave this way. There was a
.> request for a way to tabulate identical ranges. It was a nice idea
.> to extract the general "outer equal" findMatches function. But the
.> changes seem to be snow-balling. These types of changes mean a lot
.> of maintenance work for the users. A deprecation cycle does not
.> circumvent that.
.>
.>
.> IMO those advantages counterbalance *by far* the very little
.> convenience you get from having 'match(query, subject)' do
.> 'findOverlaps(query, subject, select="first")' on
.> IRanges/GRanges objects. If you need to do that, just use the
.> latter, or, if you think that's still too much typing, define
.> a wrapper e.g. 'ovmatch(query, subject)'.
.>
.> There are plenty of specialized tools around for doing
.> inexact/fuzzy/partial/overlap matching for many particular types
.> of vector-like objects: grep() and family, pmatch(), charmatch(),
.> agrep(), grepRaw(), matchPattern() and family, findOverlaps() and
.> family, findIntervals(), etc... For the reasons I mentioned
.> above, none of them should hijack match() to make it do some
.> particular type of inexact matching on some particular type of
.> objects. Even if, for that particular type of objects, doing that
.> particular type of inexact matching is more common than doing
.> exact matching.
.>
.> H.
.>
.>
.>
.> On 01/06/2013 05:39 PM, Michael Lawrence wrote:
.>
.> I think having overlapsAny is a nice addition and helps make the API
.> more complete and explicit. Are you sure we need to change the behavior
.> of the match method for this relatively uncommon use case?
.>
.>
.> Yes because otherwise users with a use case of doing match()
.>
.> even if it's uncommon,
.>
.>
.> I don't think
.> "match" always has to mean "equality". It is a more general concept in
.> my mind. The most common use case for matching ranges is overlap.
.>
.>
.> Of course "match" doesn't always have to mean equality. But of base
.>
.>
.> Michael
.>
.>
.> On Fri, Jan 4, 2013 at 8:34 PM, Hervé Pagès <hpages at fhcrc.org> wrote:
.>
.> Yes 'peaks %in% genes' is cute and was probably doing the right thing
.> for most users (although not all). But 'exons %in% genes' is cute too
.> and was probably doing the wrong thing for all users. Advanced users
.> like you guys would have no problem switching to
.>
.>   !is.na(findOverlaps(peaks, genes, type="within", select="any"))
.>
.> or
.>
.>   !is.na(findOverlaps(peaks, genes, type="equal", select="any"))
.>
.> in case 'peaks %in% genes' was not doing exactly what you wanted,
.> but most users would not find this particularly friendly. Even
.> worse, some users probably didn't realize that 'peaks %in% genes'
.> was not doing exactly what they thought it did because "peaks in
.> genes" in English suggests that the peaks are within the genes,
.> but it's not what 'peaks %in% genes' does.
.>
.> Having overlapsAny(), with exactly the same extra arguments as
.> countOverlaps() and subsetByOverlaps() (i.e. 'maxgap', 'minoverlap',
.> 'type', 'ignore.strand'), all of them documented (and with most
.> users more or less familiar with them already) has the virtue to
.> expose the user to all the options from the very start, and to
.> help him/her make the right choice. Of course there will be users
.> that don't want or don't have the time to read/think about all the
.> options. Not a big deal: they'll just do 'overlapsAny(query, subject)',
.> which is not a lot more typing than 'query %in% subject', especially
.> if they use tab completion.
.>
.> It's true that it's more common to ask questions about overlap than
.> about equality but there are some use cases for the latter (as the
.> original thread shows). Until now, when you had such a use case, you
.> could not use match() or %in%, which would have been the natural things
.> to use, because they got hijacked to do something else, and you were
.> left with nothing. Not a satisfying situation. So at a minimum, we
.> needed to restore the true/real/original semantic of match() to do
.> "equality" instead of "overlap". But it's hard to do this for match()
.> and not do it for %in% too. For more than 99% of R users, %in% is
.> just a simple wrapper for 'match(x, table, nomatch = 0) > 0' (this
.> is how it has been documented and implemented in base R for many
.> years). Not maintaining this relationship between %in% and match()
.> would only cause grief and frustration to newcomers to Bioconductor.
.>
.> H.
.>
.>
.>
.> On 01/04/2013 03:32 PM, Cook, Malcolm wrote:
.>
.> Hiya again,
.>
.> I am definitely a late comer to BioC, so I definitely easily defer to
.> the tide of history.
.>
.> But I do think you miss my point Michael about the proposed change
.> making the relationship between %in% and match for {G,I}Ranges{List}
.> mimic that between other vectors, and I do think that changing the API
.> would make other late-comers take to BioC easier/faster.
.>
.> That said, I NEVER use %in% so I really have no stake in the matter, and
.> I DEFINITELY appreciate the argument to not changing the API just for
.> semantic sweetness.
.>
.> That that said, Herve is _/so good/_ about deprecations and warnings
.> that make such changes fairly easily digestible.
.>
.> That that that.... enough.... I bow out of this one....!!!!
.>
.> Always learning and Happy New Year to all lurkers,
.>
.> ~Malcolm
.>
.> *From:* Michael Lawrence [mailto:lawrence.michael at gene.com]
.> *Sent:* Friday, January 04, 2013 5:11 PM
.> *To:* Cook, Malcolm
.> *Cc:* Sean Davis; Michael Lawrence; Hervé Pagès (hpages at fhcrc.org);
.> Tim Triche, Jr.; Vedran Franke; bioconductor at r-project.org
.> *Subject:* Re: [BioC] countMatches() (was: table for GenomicRanges)
.>
.>
.> On Fri, Jan 4, 2013 at 1:56 PM, Cook, Malcolm <mec at stowers.org> wrote:
.>
.> Hiya,
.>
.> For what it is worth...
.>
.> I think the change to %in% is warranted.
.>
.> If I understand correctly, this change restores the relationship between
.> the semantics of `%in%` and the semantics of `match`.
.>
.> From the docs:
.>
.> '"%in%" <- function(x, table) match(x, table, nomatch = 0) > 0'
.>
.> Herve's change restores this relationship.
.>
.>
.> match and %in% were initially consistent (both considering any overlap);
.> Herve has changed both of them together. The whole idea behind IRanges
.> is that ranges are special data types with special semantics. We have
.> reimplemented much of the existing R vector API using those semantics;
.> this extends beyond match/%in%. I am hesitant about making such sweeping
.> changes to the API so late in the life-cycle of the package. There was a
.> feature request for a way to count identical ranges in a set of ranges.
.> Let's please not get carried away and start redesigning the API for this
.> one, albeit useful, request. There are all sorts of inconsistencies in
.> the API, and many of them were conscious decisions that considered
.> practical use cases.
.>
.> Michael
.>
.>
.> Herve, I suspect you were as a result able to completely drop
.> all the `%in%,BiocClass1,BiocClass2` definitions and depend upon
.> base::%in%
.>
.> Am I right?
.>
.> If so, may I suggest that Herve stay the course, with the addition of
.>
.> '"%ol%" <- function(a, b) findOverlaps(a, b, maxgap=0L,
.> minoverlap=1L, type='any', select='all') > 0'
.>
.> This would provide a perspicacious idiom, thereby optimizing the API
.> for Michael's observed common use case.
.>
.> Just sayin'
.>
.> ~Malcolm
.>
.>
.> .-----Original Message-----
.> .From: bioconductor-bounces at r-project.org
.> [mailto:bioconductor-bounces at r-project.org] On Behalf Of Sean Davis
.> .Sent: Friday, January 04, 2013 3:37 PM
.> .To: Michael Lawrence
.> .Cc: Tim Triche, Jr.; Vedran Franke; bioconductor at r-project.org
.> .Subject: Re: [BioC] countMatches() (was: table for GenomicRanges)
.> .
.> .On Fri, Jan 4, 2013 at 4:32 PM, Michael Lawrence
.> .<lawrence.michael at gene.com> wrote:
.> .> The change to the behavior of %in% is a pretty big one. Are you thinking
.> .> that all set-based operations should behave this way? For example, setdiff
.> .> and intersect? I really liked the syntax of "peaks %in% genes". In my
.> .> experience, it's way more common to ask questions about overlap than about
.> .> equality, so I'd rather optimize the API for that use case. But again,
.> .> that's just my personal bias.
.> .
.> .For what it is worth, I share Michael's personal bias here.
.> .
.> .Sean
.> .
.> .
.> .> Michael
.> .>
.> .>
On 01/07/2013 11:33 AM, Michael Lawrence wrote:
>
>
>
> On Mon, Jan 7, 2013 at 11:00 AM, Hervé Pagès <hpages at fhcrc.org> wrote:
>
> Hi Michael,
>
> I don't think "match" (the word) always has to mean "equality" either.
> However having match() (the function) do "whole exact matching" (aka
> "equality") for any kind of vector-like object has the advantage of:
>
> (a) making it consistent with base::match() (?base::match is pretty
> explicit about what the contract of match() is)
>
>
> (a) alone is obviously not enough. We have many methods, like the set
> operations, that treat ranges specially. Are we going to start moving
> everything toward the base behavior? And have rangeIntersect,
> rangeSetdiff, etc?
>
> (b) preserving its relationship with ==, duplicated(), unique(),
> etc...
>
>
> So it becomes consistent with duplicated/unique, but we lose consistency
> with the set operations.
Nope, we don't lose anything. Because match()/%in% were NOT consistent
with the set operations anyway, that is, 'intersect(x, y)' on
IRanges/GRanges objects was not doing 'x[x %in% y]' (%in% here being
the old %in%).
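To make that relationship concrete for ordinary vectors, here is a small base-R sketch (the integer vectors are made up for illustration). For atomic vectors, intersect(x, y) gives the same values as the unique elements of x[x %in% y]; this is the relationship that, per the paragraph above, did not hold for the old overlap-based %in% on ranges.

```r
x <- c(5L, 3L, 3L, 8L, 1L)
y <- c(3L, 1L, 9L)

# %in% is documented in base R as match(x, table, nomatch = 0) > 0
in_y <- match(x, y, nomatch = 0L) > 0L

# For atomic vectors the set operation and %in% agree:
via_intersect <- intersect(x, y)
via_in        <- unique(x[x %in% y])

print(in_y)
print(via_intersect)
print(via_in)
```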
>
> (c) not frustrating the user who needs something to do exact
> matching on ranges (as I mentioned previously, if you take
> match() away from him/her, s/he'll be left with nothing).
>
>
> No one has ever asked for match() to behave this way.
Here is my use case: internally findMatches()/countMatches() are
implemented on top of match(), the fixed match(). They work on any
object for which match() works. They would also work on objects for
which match() does the wrong thing but they would return something
wrong. They could be made ordinary functions, not generic (and they
will, but they temporarily need to be made generics with methods,
just to smooth the transition), because dispatch happens inside the
function when match() is called. In the man page for those functions
I can just say:
  findMatches(x, table): An enhanced version of 'match' that returns
      all the matches in a Hits object.
and I'm done. It's clear and concise.
The implementation/documentation of findMatches()/countMatches() is
the typical illustration of why having methods that respect the
contract of the generic is a must.
The idea is to build on top of some basic building-blocks for which
the behavior is well-defined, consistent, predictable. It's sooo much
easier, and it's very healthy.
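As an illustration of the "build on top of match()" idea, here is a rough base-R sketch for atomic vectors. The names count_matches() and find_matches() are hypothetical, this is not the IRanges implementation, and the result is a plain data frame rather than a Hits object. The point is only that both helpers call nothing but unique() and match(), so they inherit whatever notion of equality match() has.

```r
# Hypothetical base-R analogues of countMatches()/findMatches();
# NOT the actual IRanges code.
count_matches <- function(x, table) {
  ux  <- unique(x)
  idx <- match(table, ux)            # unique value each 'table' element equals
  idx <- idx[!is.na(idx)]            # drop 'table' elements absent from 'x'
  tabulate(idx, nbins = length(ux))[match(x, ux)]
}

find_matches <- function(x, table) {
  ux     <- unique(x)
  # Positions in 'table', grouped by the unique value of 'x' they equal.
  groups <- split(seq_along(table),
                  factor(match(table, ux), levels = seq_along(ux)))
  hits   <- unname(groups[match(x, ux)])
  data.frame(queryHits   = rep(seq_along(x), lengths(hits)),
             subjectHits = as.integer(unlist(hits)))
}

# Toy data, made up for illustration.
x     <- c("a", "b", "c", "a")
table <- c("a", "a", "c", "d", "a")

counts <- count_matches(x, table)    # one count per element of 'x'
pairs  <- find_matches(x, table)     # all (query, subject) index pairs
print(counts)
print(pairs)
```

If match() on some class meant "overlap" instead of "equality", the same two functions would still run on it but would silently count something else, which is the concern raised above.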
> There was a
> request for a way to tabulate identical ranges. It was a nice idea to
> extract the general "outer equal" findMatches function.
It's also a nice idea to have findMatches() and countMatches() aligned
with match().
> But the changes seem to be snow-balling.
No snow-balling. You cannot snow-ball too far anyway when you restore
consistency. But you can easily snow-ball very far when you go in the
other direction (there are no limits). Do I need to say that aiming for
consistency/predictability is a good goal in software design? It can
only make it *better* in all the meanings of the term: fewer bugs,
easier to maintain, easier to document, and easier to use in the long
run. Everybody wins. Even if you don't realize it now. Convenience is
also important, but less important than consistency/predictability.
As a matter of fact, an interesting and not immediately obvious side
effect of going consistent is that, in the long run (i.e. when the
software becomes bigger and more complex), it also gives you a form of
convenience for the end-user: documentation is simpler and easier to
read, and there are fewer special cases to remember.
> These types of changes mean a lot of
> maintenance work for the users. A deprecation cycle does not circumvent
> that.
I don't see why this change would be more work for the users than any
other change. Making RangedData fade away will certainly be a much
bigger one, will take much more time (maybe 2-3 years), and will
require a lot more maintenance work from us (mostly me) and from
the users.
FWIW, the change to match()/%in% probably means more work for me than
for the users. There is a *lot* of stuff I had to put in place in
IRanges/GenomicRanges to make this transition smooth. But I truly
believe it was worth it. I also fixed all the BioC packages I found
that were affected by those changes (surprisingly, there were very
few: only 5). I could have missed some. Please let me know if that
is the case and I'll fix them too.
Thanks,
H.
>
>
> IMO those advantages counterbalance *by far* the very little
> convenience you get from having 'match(query, subject)' do
> 'findOverlaps(query, subject, select="first")' on
> IRanges/GRanges objects. If you need to do that, just use the
> latter, or, if you think that's still too much typing, define
> a wrapper e.g. 'ovmatch(query, subject)'.
>
> There are plenty of specialized tools around for doing
> inexact/fuzzy/partial/overlap matching for many particular types
> of vector-like objects: grep() and family, pmatch(), charmatch(),
> agrep(), grepRaw(), matchPattern() and family, findOverlaps() and
> family, findInterval(), etc... For the reasons I mentioned
> above, none of them should hijack match() to make it do some
> particular type of inexact matching on some particular type of
> objects. Even if, for that particular type of objects, doing that
> particular type of inexact matching is more common than doing
> exact matching.
>
> H.
>
>
>
> On 01/06/2013 05:39 PM, Michael Lawrence wrote:
>
> I think having overlapsAny is a nice addition and helps make the API
> more complete and explicit. Are you sure we need to change the behavior
> of the match method for this relatively uncommon use case?
>
>
> Yes because otherwise users with a use case of doing match()
>
> even if it's uncommon,
>
>
> I don't think "match" always has to mean "equality". It is a more
> general concept in my mind. The most common use case for matching
> ranges is overlap.
>
>
> Of course "match" doesn't always have to mean equality. But of
base
>
>
> Michael
>
>
> On Fri, Jan 4, 2013 at 8:34 PM, Hervé Pagès <hpages at fhcrc.org> wrote:
>
> Yes 'peaks %in% genes' is cute and was probably doing the right thing
> for most users (although not all). But 'exons %in% genes' is cute too
> and was probably doing the wrong thing for all users. Advanced users
> like you guys would have no problem switching to
>
>     !is.na(findOverlaps(peaks, genes, type="within", select="any"))
>
> or
>
>     !is.na(findOverlaps(peaks, genes, type="equal", select="any"))
>
> in case 'peaks %in% genes' was not doing exactly what you wanted,
> but most users would not find this particularly friendly. Even
> worse, some users probably didn't realize that 'peaks %in% genes'
> was not doing exactly what they thought it did because "peaks in
> genes" in English suggests that the peaks are within the genes,
> but it's not what 'peaks %in% genes' does.
>
> Having overlapsAny(), with exactly the same extra arguments as
> countOverlaps() and subsetByOverlaps() (i.e. 'maxgap', 'minoverlap',
> 'type', 'ignore.strand'), all of them documented (and with most
> users more or less familiar with them already) has the virtue to
> expose the user to all the options from the very start, and to
> help him/her make the right choice. Of course there will be users
> that don't want or don't have the time to read/think about all the
> options. Not a big deal: they'll just do 'overlapsAny(query, subject)',
> which is not a lot more typing than 'query %in% subject', especially
> if they use tab completion.
>
> It's true that it's more common to ask questions about overlap than
> about equality but there are some use cases for the latter (as the
> original thread shows). Until now, when you had such a use case, you
> could not use match() or %in%, which would have been the natural things
> to use, because they got hijacked to do something else, and you were
> left with nothing. Not a satisfying situation. So at a minimum, we
> needed to restore the true/real/original semantic of match() to do
> "equality" instead of "overlap". But it's hard to do this for match()
> and not do it for %in% too. For more than 99% of R users, %in% is
> just a simple wrapper for 'match(x, table, nomatch = 0) > 0' (this
> is how it has been documented and implemented in base R for many
> years). Not maintaining this relationship between %in% and match()
> would only cause grief and frustration to newcomers to Bioconductor.
>
> H.
>
>
>
> On 01/04/2013 03:32 PM, Cook, Malcolm wrote:
>
> Hiya again,
>
> I am definitely a late comer to BioC, so I definitely easily defer to
> the tide of history.
>
> But I do think you miss my point Michael about the proposed change
> making the relationship between %in% and match for {G,I}Ranges{List}
> mimic that between other vectors, and I do think that changing the API
> would make other late-comers take to BioC easier/faster.
>
> That said, I NEVER use %in% so I really have no stake in the matter, and
> I DEFINITELY appreciate the argument to not changing the API just for
> semantic sweetness.
>
> That that said, Herve is _/so good/_ about deprecations and warnings
> that make such changes fairly easily digestible.
>
> That that that.... enough.... I bow out of this one....!!!!
>
> Always learning and Happy New Year to all lurkers,
>
> ~Malcolm
>
> *From:* Michael Lawrence [mailto:lawrence.michael at gene.com]
> *Sent:* Friday, January 04, 2013 5:11 PM
> *To:* Cook, Malcolm
> *Cc:* Sean Davis; Michael Lawrence; Hervé Pagès (hpages at fhcrc.org);
> Tim Triche, Jr.; Vedran Franke; bioconductor at r-project.org
> *Subject:* Re: [BioC] countMatches() (was: table for GenomicRanges)
>
>
> On Fri, Jan 4, 2013 at 1:56 PM, Cook, Malcolm <mec at stowers.org> wrote:
>
> Hiya,
>
> For what it is worth...
>
> I think the change to %in% is warranted.
>
> If I understand correctly, this change restores the relationship between
> the semantics of `%in%` and the semantics of `match`.
>
> From the docs:
>
> '"%in%" <- function(x, table) match(x, table, nomatch = 0) > 0'
>
> Herve's change restores this relationship.
>
>
> match and %in% were initially consistent (both considering any overlap);
> Herve has changed both of them together. The whole idea behind IRanges
> is that ranges are special data types with special semantics. We have
> reimplemented much of the existing R vector API using those semantics;
> this extends beyond match/%in%. I am hesitant about making such sweeping
> changes to the API so late in the life-cycle of the package. There was a
> feature request for a way to count identical ranges in a set of ranges.
> Let's please not get carried away and start redesigning the API for this
> one, albeit useful, request. There are all sorts of inconsistencies in
> the API, and many of them were conscious decisions that considered
> practical use cases.
>
> Michael
>
>
> Herve, I suspect you were as a result able to completely drop
> all the `%in%,BiocClass1,BiocClass2` definitions and depend upon
> base::%in%
>
> Am I right?
>
> If so, may I suggest that Herve stay the course, with the addition of
>
> '"%ol%" <- function(a, b) findOverlaps(a, b, maxgap=0L,
> minoverlap=1L, type='any', select='all') > 0'
>
> This would provide a perspicacious idiom, thereby optimizing the API
> for Michael's observed common use case.
>
> Just sayin'
>
> ~Malcolm
>
>
> .-----Original Message-----
> .From: bioconductor-bounces at r-project.org [mailto:bioconductor-bounces at r-project.org] On Behalf Of Sean Davis
> .Sent: Friday, January 04, 2013 3:37 PM
> .To: Michael Lawrence
> .Cc: Tim Triche, Jr.; Vedran Franke; bioconductor at r-project.org
> .Subject: Re: [BioC] countMatches() (was: table for GenomicRanges)
> .
> .On Fri, Jan 4, 2013 at 4:32 PM, Michael Lawrence
> .<lawrence.michael at gene.com> wrote:
> .> The change to the behavior of %in% is a pretty big one. Are you thinking
> .> that all set-based operations should behave this way? For example, setdiff
> .> and intersect? I really liked the syntax of "peaks %in% genes". In my
> .> experience, it's way more common to ask questions about overlap than about
> .> equality, so I'd rather optimize the API for that use case. But again,
> .> that's just my personal bias.
> .
> .For what it is worth, I share Michael's personal bias here.
> .
> .Sean
> .
> .
> .> Michael
> .>
> .>
> .> On Fri, Jan 4, 2013 at 1:11 PM, Hervé Pagès <hpages at fhcrc.org> wrote:
> .>
> .>> Hi,
> .>>
> .>> I added findMatches() and countMatches() to the latest IRanges /
> .>> GenomicRanges packages (in BioC devel only).
> .>>
> .>> findMatches(x, table): An enhanced version of 'match' that
> .>> returns all the matches in a Hits object.
> .>>
> .>> countMatches(x, table): Returns an integer vector of the length
> .>> of 'x', containing the number of matches in 'table' for
> .>> each element in 'x'.
> .>>
> .>> countMatches() is what you can use to tally/count/tabulate (choose your
> .>> preferred term) the unique elements in a GRanges object:
> .>>
> .>> library(GenomicRanges)
> .>> set.seed(33)
> .>> gr <- GRanges("chr1", IRanges(sample(15,20,replace=TRUE), width=5))
> .>>
> .>> Then:
> .>>
> .>> > gr_levels <- sort(unique(gr))
> .>> > countMatches(gr_levels, gr)
> .>> [1] 1 1 1 2 4 2 2 1 2 2 2
> .>>
> .>> Note that findMatches() and countMatches() also work on IRanges and
> .>> DNAStringSet objects, as well as on ordinary atomic vectors:
> .>>
> .>> library(hgu95av2probe)
> .>> library(Biostrings)
> .>> probes <- DNAStringSet(hgu95av2probe)
> .>> unique_probes <- unique(probes)
> .>> count <- countMatches(unique_probes, probes)
> .>> max(count) # 7
> .>>
> .>> I made other changes in IRanges/GenomicRanges so that the notion
> .>> of "match" between elements of a vector-like object now consistently
> .>> means "equality" instead of "overlap", even for range-based objects
> .>> like IRanges or GRanges objects. This notion of "equality" is the
> .>> same that is used by ==. The most visible consequence of those
> .>> changes is that using %in% between 2 IRanges or GRanges objects
> .>> 'query' and 'subject' in order to do overlaps was replaced by
> .>> overlapsAny(query, subject).
> .>>
> .>> overlapsAny(query, subject): Finds the ranges in 'query' that
> .>> overlap any of the ranges in 'subject'.
> .>>
> .>> There are warnings and deprecation messages in place to help smooth
> .>> the transition.
> .>>
> .>> Cheers,
> .>> H.
> .>>
> .>> --
> .>> Hervé Pagès
> .>>
> .>> Program in Computational Biology
> .>> Division of Public Health Sciences
> .>> Fred Hutchinson Cancer Research Center
> .>> 1100 Fairview Ave. N, M1-B514
> .>> P.O. Box 19024
> .>> Seattle, WA 98109-1024
> .>>
> .>> E-mail: hpages at fhcrc.org
> .>> Phone: (206) 667-5791
> .>> Fax: (206) 667-1319
>
> .>>
> .>
> .> [[alternative HTML version deleted]]
> .>
> .> _______________________________________________
> .> Bioconductor mailing list
> .> Bioconductor at r-project.org
> .> https://stat.ethz.ch/mailman/listinfo/bioconductor
> .> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
> .
>
>
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages at fhcrc.org
> Phone: (206) 667-5791
> Fax: (206) 667-1319
>
>
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages at fhcrc.org
> Phone: (206) 667-5791
> Fax: (206) 667-1319
>
>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
This is basically an argument against incorporating range-based semantics
into the R vector API. I always thought it was interesting/cool how IRanges
considered ranges to be a special data type, with special semantics. The
%in% operator in particular has many fans. But it's hard to argue against
consistency with the base R behavior. That point is not lost on me and it
drove the design of DataFrame, Rle, etc.
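For reference, the base R behavior in question can be checked on ordinary vectors. The snippet below is only an illustration with made-up values and uses nothing beyond base R; it does not touch IRanges at all.

```r
x <- c(3, 7, 9)
table <- c(9, 3, 3)

# match() gives, for each element of 'x', the position of the first
# element of 'table' that is equal to it (NA when there is none).
pos <- match(x, table)

# %in% is documented as match(x, table, nomatch = 0) > 0, so the two
# expressions below agree on atomic vectors.
in_via_match <- match(x, table, nomatch = 0) > 0
same <- identical(x %in% table, in_via_match)
```

This is the "equality" contract that the new match()/%in% methods for ranges follow.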
I'm still not sure we even need the findMatches function. There are very
few times I've used outer(x, y, "=="). The feature request (and it was a
good one) was for tabulating ranges. At some point after so many years one
has to acknowledge that the IRanges API has been empirically shown to be
reasonable, despite its theoretical inconsistencies. This is why I am
resistant to such changes. But maybe I'm just suffering from my own
personal biases.
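As an illustration of the outer(x, y, "==") idiom mentioned above, here is a small base R sketch on atomic vectors. The values and the column names x_index and y_index are made up for the example; the result is a plain matrix of index pairs, not a Hits object.

```r
x <- c(10, 20, 30)
y <- c(20, 10, 20, 40)

# Logical matrix with one row per element of 'x' and one column per
# element of 'y'; TRUE where the two elements are equal.
eq <- outer(x, y, "==")

# All (x index, y index) pairs that are equal, i.e. every match rather
# than only the first one that match() would report.
hits <- which(eq, arr.ind = TRUE)
colnames(hits) <- c("x_index", "y_index")

# Number of matches per element of 'x'.
n_matches <- rowSums(eq)
```

Note that the matrix has length(x) * length(y) cells, so this form is only practical for small inputs.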
One other point: most of the code using IRanges is in scripts outside of
the Bioc repository, so it is easy to underestimate the significance of
some changes.
Michael
On Mon, Jan 7, 2013 at 1:46 PM, Hervé Pagès <hpages@fhcrc.org> wrote:
> On 01/07/2013 11:33 AM, Michael Lawrence wrote:
>
>>
>>
>>
>> On Mon, Jan 7, 2013 at 11:00 AM, Hervé Pagès <hpages@fhcrc.org> wrote:
>>
>> Hi Michael,
>>
>> I don't think "match" (the word) always has to mean "equality" either.
>> However having match() (the function) do "whole exact matching" (aka
>> "equality") for any kind of vector-like object has the advantage of:
>>
>> (a) making it consistent with base::match() (?base::match is pretty
>> explicit about what the contract of match() is)
>>
>>
>> (a) alone is obviously not enough. We have many methods, like the set
>> operations, that treat ranges specially. Are we going to start moving
>> everything toward the base behavior? And have rangeIntersect,
>> rangeSetdiff, etc?
>>
>> (b) preserving its relationship with ==, duplicated(), unique(),
>> etc...
>>
>>
>> So it becomes consistent with duplicated/unique, but we lose consistency
>> with the set operations.
>>
>
> Nope, we don't lose anything. Because match()/%in% were NOT consistent
> with the set operations anyway, that is, 'intersect(x, y)' on
> IRanges/GRanges objects was not doing 'x[x %in% y]' (%in% here being
> the old %in%).
>
>
>
>> (c) not frustrating the user who needs something to do exact
>> matching on ranges (as I mentioned previously, if you take
>> match() away from him/her, s/he'll be left with nothing).
>>
>>
>> No one has ever asked for match() to behave this way.
>>
>
> Here is my use case: internally findMatches()/countMatches() are
> implemented on top of match(), the fixed match(). They work on any
> object for which match() works. They would also work on objects for
> which match() does the wrong thing but they would return something
> wrong. They could be made ordinary functions, not generic (and they
> will, but they temporarily need to be made generics with methods,
> just to smooth the transition), because dispatch happens inside the
> function when match() is called. In the man page for those functions
> I can just say:
>
> findMatches(x, table): An enhanced version of match that returns
>
> all the matches in a Hits object.
>
> and I'm done. It's clear and concise.
>
> The implementation/documentation of findMatches()/countMatches() is
> the typical illustration of why having methods that respect the
> contract of the generic is a must.
>
> The idea is to build on top of some basic building-blocks for which
> the behavior is well-defined, consistent, predictable. It's sooo much
> easier, and it's very healthy.
>
>
>> There was a request for a way to tabulate identical ranges. It was a
>> nice idea to extract the general "outer equal" findMatches function.
>>
>
> It's also a nice idea to have findMatches() and countMatches() aligned
> with match().
>
>
>> But the changes seem to be snow-balling.
>>
>
> No snow-balling. You cannot snow-ball too far anyway when you restore
> consistency. But you can easily snow-ball very far when you go in the
> other direction (there is no limit). Do I need to say that aiming for
> consistency/predictability is a good goal in software design? It can
> only make it *better* in all the meanings of the term: fewer bugs,
> easier to maintain, easier to document, and easier to use in the long
> run. Everybody wins. Even if you don't realize it now. Convenience is
> also important, but less important than consistency/predictability.
> As a matter of fact, an interesting and not immediately obvious side
> effect of going consistent is that, in the long run (i.e. when the
> software becomes bigger and more complex), it also gives you a form of
> convenience for the end-user: documentation is simpler and easier to
> read, and there are fewer special cases to remember.
>
>
>> These types of changes mean a lot of maintenance work for the users.
>> A deprecation cycle does not circumvent that.
>>
>
> I don't see why this change would be more work for the users than any
> other change. Making RangedData fade away will certainly be a much
> bigger one, will take much more time (maybe 2-3 years), and will
> require a lot more maintenance work from us (mostly me) and from
> the users.
>
> FWIW, the change to match()/%in% probably means more work for me than
> for the users. There is a *lot* of stuff I had to put in place in
> IRanges/GenomicRanges to make this transition smooth. But I truly
> believe it was worth it. I also fixed all the BioC packages I found
> that were affected by those changes (surprisingly, there were very
> few: only 5). I could have missed some. Please let me know if that
> is the case and I'll fix them too.
>
> Thanks,
> H.
>
>
>>
>> IMO those advantages counterbalance *by far* the very little
>> convenience you get from having 'match(query, subject)' do
>> 'findOverlaps(query, subject, select="first")' on
>> IRanges/GRanges objects. If you need to do that, just use the
>> latter, or, if you think that's still too much typing, define
>> a wrapper e.g. 'ovmatch(query, subject)'.
>>
>> There are plenty of specialized tools around for doing
>> inexact/fuzzy/partial/overlap matching for many particular types
>> of vector-like objects: grep() and family, pmatch(), charmatch(),
>> agrep(), grepRaw(), matchPattern() and family, findOverlaps() and
>> family, findInterval(), etc... For the reasons I mentioned
>> above, none of them should hijack match() to make it do some
>> particular type of inexact matching on some particular type of
>> objects. Even if, for that particular type of objects, doing that
>> particular type of inexact matching is more common than doing
>> exact matching.
>>
>> H.
>>
>>
>>
>> On 01/06/2013 05:39 PM, Michael Lawrence wrote:
>>
>> I think having overlapsAny is a nice addition and helps make the API
>> more complete and explicit. Are you sure we need to change the behavior
>> of the match method for this relatively uncommon use case?
>>
>>
>> Yes because otherwise users with a use case of doing match()
>>
>> even if it's uncommon,
>>
>>
>> I don't think "match" always has to mean "equality". It is a more
>> general concept in my mind. The most common use case for matching
>> ranges is overlap.
>>
>>
>> Of course "match" doesn't always have to mean equality. But of
base
>>
>>
>> Michael
>>
>>
>> On Fri, Jan 4, 2013 at 8:34 PM, Hervé Pagès <hpages@fhcrc.org> wrote:
>>
>> Yes 'peaks %in% genes' is cute and was probably doing the right thing
>> for most users (although not all). But 'exons %in% genes' is cute too
>> and was probably doing the wrong thing for all users. Advanced users
>> like you guys would have no problem switching to
>>
>>     !is.na(findOverlaps(peaks, genes, type="within", select="any"))
>>
>> or
>>
>>     !is.na(findOverlaps(peaks, genes, type="equal", select="any"))
>>
>> in case 'peaks %in% genes' was not doing exactly what you wanted,
>> but most users would not find this particularly friendly. Even
>> worse, some users probably didn't realize that 'peaks %in% genes'
>> was not doing exactly what they thought it did because "peaks in
>> genes" in English suggests that the peaks are within the genes,
>> but it's not what 'peaks %in% genes' does.
>>
>> Having overlapsAny(), with exactly the same extra arguments as
>> countOverlaps() and subsetByOverlaps() (i.e. 'maxgap', 'minoverlap',
>> 'type', 'ignore.strand'), all of them documented (and with most
>> users more or less familiar with them already) has the virtue to
>> expose the user to all the options from the very start, and to
>> help him/her make the right choice. Of course there will be users
>> that don't want or don't have the time to read/think about all the
>> options. Not a big deal: they'll just do 'overlapsAny(query, subject)',
>> which is not a lot more typing than 'query %in% subject', especially
>> if they use tab completion.
>>
>> It's true that it's more common to ask questions about overlap than
>> about equality but there are some use cases for the latter (as the
>> original thread shows). Until now, when you had such a use case, you
>> could not use match() or %in%, which would have been the natural things
>> to use, because they got hijacked to do something else, and you were
>> left with nothing. Not a satisfying situation. So at a minimum, we
>> needed to restore the true/real/original semantic of match() to do
>> "equality" instead of "overlap". But it's hard to do this for match()
>> and not do it for %in% too. For more than 99% of R users, %in% is
>> just a simple wrapper for 'match(x, table, nomatch = 0) > 0' (this
>> is how it has been documented and implemented in base R for many
>> years). Not maintaining this relationship between %in% and match()
>> would only cause grief and frustration to newcomers to Bioconductor.
>>
>> H.
>>
>>
>>
>> On 01/04/2013 03:32 PM, Cook, Malcolm wrote:
>>
>> Hiya again,
>>
>> I am definitely a late comer to BioC, so I definitely easily defer to
>> the tide of history.
>>
>> But I do think you miss my point Michael about the proposed change
>> making the relationship between %in% and match for {G,I}Ranges{List}
>> mimic that between other vectors, and I do think that changing the API
>> would make other late-comers take to BioC easier/faster.
>>
>> That said, I NEVER use %in% so I really have no stake in the matter, and
>> I DEFINITELY appreciate the argument to not changing the API just for
>> semantic sweetness.
>>
>> That that said, Herve is _/so good/_ about deprecations and warnings
>> that make such changes fairly easily digestible.
>>
>> That that that.... enough.... I bow out of this one....!!!!
>>
>> Always learning and Happy New Year to all lurkers,
>>
>> ~Malcolm
>>
>> *From:* Michael Lawrence [mailto:lawrence.michael@gene.com]
>> *Sent:* Friday, January 04, 2013 5:11 PM
>> *To:* Cook, Malcolm
>> *Cc:* Sean Davis; Michael Lawrence; Hervé Pagès (hpages@fhcrc.org);
>> Tim Triche, Jr.; Vedran Franke; bioconductor@r-project.org
>> *Subject:* Re: [BioC] countMatches() (was: table for GenomicRanges)
>>
>>
>> On Fri, Jan 4, 2013 at 1:56 PM, Cook, Malcolm <mec@stowers.org> wrote:
>>
>> Hiya,
>>
>> For what it is worth...
>>
>> I think the change to %in% is warranted.
>>
>> If I understand correctly, this change restores the relationship between
>> the semantics of `%in%` and the semantics of `match`.
>>
>> From the docs:
>>
>> '"%in%" <- function(x, table) match(x, table, nomatch = 0) > 0'
>>
>> Herve's change restores this relationship.
>>
>>
>> match and %in% were initially consistent (both considering any overlap);
>> Herve has changed both of them together. The whole idea behind IRanges
>> is that ranges are special data types with special semantics. We have
>> reimplemented much of the existing R vector API using those semantics;
>> this extends beyond match/%in%. I am hesitant about making such sweeping
>> changes to the API so late in the life-cycle of the package. There was a
>> feature request for a way to count identical ranges in a set of ranges.
>> Let's please not get carried away and start redesigning the API for this
>> one, albeit useful, request. There are all sorts of inconsistencies in
>> the API, and many of them were conscious decisions that considered
>> practical use cases.
>>
>> Michael
>>
>>
>> Herve, I suspect you were as a result able to completely drop
>> all the `%in%,BiocClass1,BiocClass2` definitions and depend upon
>> base::%in%
>>
>> Am I right?
>>
>> If so, may I suggest that Herve stay the course, with the addition of
>>
>> '"%ol%" <- function(a, b) findOverlaps(a, b, maxgap=0L,
>> minoverlap=1L, type='any', select='all') > 0'
>>
>> This would provide a perspicacious idiom, thereby optimizing the API
>> for Michael's observed common use case.
>>
>> Just sayin'
>>
>> ~Malcolm
>>
>>
>> .-----Original Message-----
>> .From: bioconductor-bounces@r-project.org [mailto:bioconductor-bounces@r-project.org] On Behalf Of Sean Davis
>> .Sent: Friday, January 04, 2013 3:37 PM
>> .To: Michael Lawrence
>> .Cc: Tim Triche, Jr.; Vedran Franke; bioconductor@r-project.org
>> .Subject: Re: [BioC] countMatches() (was: table for GenomicRanges)
>> .
>> .On Fri, Jan 4, 2013 at 4:32 PM, Michael
Lawrence
>> .<lawrence.michael@gene.com>> <mailto:lawrence.michael@gene.**com <lawrence.michael@gene.com="">>
>> <mailto:lawrence.michael@gene.**__com>> <mailto:lawrence.michael@gene.**com <lawrence.michael@gene.com="">>>
>> <mailto:lawrence.michael@gene.>> <mailto:lawrence.michael@gene.**>____com
>>
>>
>> <mailto:lawrence.michael@gene.**__com>> <mailto:lawrence.michael@gene.**com <lawrence.michael@gene.com="">>>>>
>> wrote:
>> .> The change to the behavior of %in% is a pretty big one. Are you thinking
>> .> that all set-based operations should behave this way? For example, setdiff
>> .> and intersect? I really liked the syntax of "peaks %in% genes". In my
>> .> experience, it's way more common to ask questions about overlap than about
>> .> equality, so I'd rather optimize the API for that use case. But again,
>> .> that's just my personal bias.
>> .
>> .For what it is worth, I share Michael's personal bias here.
>> .
>> .Sean
>> .
>> .
>> .> Michael
>> .>
>> .>
>> .> On Fri, Jan 4, 2013 at 1:11 PM, Hervé Pagès <hpages@fhcrc.org> wrote:
>> .>
>> .>> Hi,
>> .>>
>> .>> I added findMatches() and countMatches() to the latest IRanges /
>> .>> GenomicRanges packages (in BioC devel only).
>> .>>
>> .>> findMatches(x, table): An enhanced version of match that
>> .>> returns all the matches in a Hits object.
>> .>>
>> .>> countMatches(x, table): Returns an integer vector of the length
>> .>> of x, containing the number of matches in table for
>> .>> each element in x.
>> .>>
>> .>> countMatches() is what you can use to tally/count/tabulate (choose your
>> .>> preferred term) the unique elements in a GRanges object:
>> .>>
>> .>> library(GenomicRanges)
>> .>> set.seed(33)
>> .>> gr <- GRanges("chr1", IRanges(sample(15,20,replace=TRUE), width=5))
>> .>>
>> .>> Then:
>> .>>
>> .>> > gr_levels <- sort(unique(gr))
>> .>> > countMatches(gr_levels, gr)
>> .>> [1] 1 1 1 2 4 2 2 1 2 2 2
>> .>>
>> .>> Note that findMatches() and countMatches() also work on IRanges and
>> .>> DNAStringSet objects, as well as on ordinary atomic vectors:
>> .>>
>> .>> library(hgu95av2probe)
>> .>> library(Biostrings)
>> .>> probes <- DNAStringSet(hgu95av2probe)
>> .>> unique_probes <- unique(probes)
>> .>> count <- countMatches(unique_probes, probes)
>> .>> max(count) # 7
>> .>>
>> .>> I made other changes in IRanges/GenomicRanges so that the notion
>> .>> of "match" between elements of a vector-like object now consistently
>> .>> means "equality" instead of "overlap", even for range-based objects
>> .>> like IRanges or GRanges objects. This notion of "equality" is the
>> .>> same that is used by ==. The most visible consequence of those
>> .>> changes is that using %in% between 2 IRanges or GRanges objects
>> .>> 'query' and 'subject' in order to do overlaps was replaced by
>> .>> overlapsAny(query, subject).
>> .>>
>> .>> overlapsAny(query, subject): Finds the ranges in query that
>> .>> overlap any of the ranges in subject.
>> .>>
>> .>> There are warnings and deprecation messages in place to help smooth
>> .>> the transition.
>> .>>
>> .>> Cheers,
>> .>> H.
>> .>>
>> .>> --
>> .>> Hervé Pagès
>> .>>
>> .>> Program in Computational Biology
>> .>> Division of Public Health Sciences
>> .>> Fred Hutchinson Cancer Research Center
>> .>> 1100 Fairview Ave. N, M1-B514
>> .>> P.O. Box 19024
>> .>> Seattle, WA 98109-1024
>> .>>
>> .>> E-mail: hpages@fhcrc.org
>> .>> Phone: (206) 667-5791
>> .>> Fax: (206) 667-1319
>> .>>
>> .>
>> .> [[alternative HTML version deleted]]
>> .>
>> .> _______________________________________________
>> .> Bioconductor mailing list
>> .> Bioconductor@r-project.org
>> .> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> .> Search the archives:
>> .> http://news.gmane.org/gmane.science.biology.informatics.conductor
>> .
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages@fhcrc.org
> Phone: (206) 667-5791
> Fax: (206) 667-1319
>
[[alternative HTML version deleted]]
On 01/07/2013 04:20 PM, Michael Lawrence wrote:
> This is basically an argument against incorporating range-based
> semantics into the R vector API. I always thought it was
> interesting/cool how IRanges considered ranges to be a special data
> type, with special semantics. The %in% operator in particular has many
> fans. But it's hard to argue against consistency with the base R
> behavior. That point is not lost on me and it drove the design of
> DataFrame, Rle, etc.
>
> I'm still not sure we even need the findMatches function. There are very
> few times I've used outer(x, y, "=="). The feature request (and it was a
> good one) was for tabulating ranges.
which you can do with countMatches(). I've put findMatches() for
completeness (as the natural companion of countMatches()), and I'm not
charging extra money for this. So we have a nice parallel between
findMatches()/countMatches() on one side (for doing exact match), and
findOverlaps()/countOverlaps() on the other side (for doing overlaps).
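To make the exact-match side of that parallel concrete, here is a minimal
base R sketch of what finding and counting exact matches means on ordinary
atomic vectors. The helper names find_matches_sketch() and
count_matches_sketch() are made up for illustration; this is not the
IRanges implementation and it returns a plain data.frame rather than a
Hits object. The only assumption is that match() does exact matching.

```r
# Hypothetical helpers for illustration only (not the IRanges code).
# Both rely solely on base::match(), i.e. on exact matching.

find_matches_sketch <- function(x, table) {
  # For each element of 'table', index of the first element equal to it
  first_in_table <- match(table, table)
  # For each element of 'x', index of its first match in 'table' (NA if none)
  first_hit <- match(x, table)
  # All positions in 'table' that hold a value equal to x[i]
  hits <- lapply(first_hit, function(h) {
    if (is.na(h)) integer(0) else which(first_in_table == h)
  })
  data.frame(queryHits = rep(seq_along(x), lengths(hits)),
             subjectHits = as.integer(unlist(hits)))
}

count_matches_sketch <- function(x, table) {
  # Number of exact matches in 'table' for each element of 'x'
  tabulate(find_matches_sketch(x, table)$queryHits, nbins = length(x))
}

x <- c("b", "z", "a")
table <- c("a", "b", "a", "b", "b")
find_matches_sketch(x, table)       # (query, subject) pairs of equal elements
count_matches_sketch(x, table)      # one count per element of x
count_matches_sketch(x, table) > 0  # same answer as x %in% table
```

The last line is the base R relationship between %in% and match(): an
element is "in" the table exactly when it has at least one exact match.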
> At some point after so many years
> one has to acknowledge that the IRanges API has been empirically shown
> to be reasonable, despite its theoretical inconsistencies. This is why I
> am resistant to such changes. But maybe I'm just suffering from my own
> personal biases.
>
> One other point: most of the code using IRanges is in scripts outside of
> the Bioc repository, so it is easy to underestimate the significance of
> some changes.
or to overestimate it? Is it unreasonable to assume that the level of usage
in the Bioc repository reflects the amount of usage outside of it? And
I forgot to mention that, in addition to having only 5 packages to fix
in the repo, fixing them couldn't have been easier.
H.
>
> Michael
>
>
>
>
>
> On Mon, Jan 7, 2013 at 1:46 PM, Hervé Pagès <hpages at fhcrc.org> wrote:
>
> On 01/07/2013 11:33 AM, Michael Lawrence wrote:
>
>
>
>
> On Mon, Jan 7, 2013 at 11:00 AM, Hervé Pagès <hpages at fhcrc.org> wrote:
>
> Hi Michael,
>
> I don't think "match" (the word) always has to mean "equality" either.
> However having match() (the function) do "whole exact matching" (aka
> "equality") for any kind of vector-like object has the advantage of:
>
> (a) making it consistent with base::match() (?base::match is pretty
> explicit about what the contract of match() is)
>
>
> (a) alone is obviously not enough. We have many methods, like the set
> operations, that treat ranges specially. Are we going to start moving
> everything toward the base behavior? And have rangeIntersect,
> rangeSetdiff, etc?
>
> (b) preserving its relationship with ==, duplicated(), unique(),
> etc...
>
>
> So it becomes consistent with duplicated/unique, but we lose consistency
> with the set operations.
>
>
> Nope, we don't lose anything. Because match()/%in% were NOT consistent
> with the set operations anyway, that is, 'intersect(x, y)' on
> IRanges/GRanges objects was not doing 'x[x %in% y]' (%in% here being
> the old %in%).
>
>
>
> (c) not frustrating the user who needs something to do exact
> matching on ranges (as I mentioned previously, if you take
> match() away from him/her, s/he'll be left with nothing).
>
>
> No one has ever asked for match() to behave this way.
>
>
> Here is my use case: internally findMatches()/countMatches() are
> implemented on top of match(), the fixed match(). They work on any
> object for which match() works. They would also work on objects for
> which match() does the wrong thing but they would return something
> wrong. They could be made ordinary functions, not generic (and they
> will, but they temporarily need to be made generics with methods,
> just to smooth the transition), because dispatch happens inside the
> function when match() is called. In the man page for those functions
> I can just say:
>
> findMatches(x, table): An enhanced version of 'match' that returns
> all the matches in a Hits object.
>
> and I'm done. It's clear and concise.
>
> The implementation/documentation of findMatches()/countMatches() is
> the typical illustration of why having methods that respect the
> contract of the generic is a must.
>
> The idea is to build on top of some basic building-blocks for which
> the behavior is well-defined, consistent, predictable. It's sooo much
> easier, and it's very healthy.
>
>
> There was a request for a way to tabulate identical ranges. It was a nice
> idea to extract the general "outer equal" findMatches function.
>
>
> It's also a nice idea to have findMatches() and countMatches() aligned
> with match().
>
>
> But the changes seem to be snow-balling.
>
>
> No snow-balling. You cannot snow-ball too far anyway when you restore
> consistency. But you can easily snow-ball very far when you go in the
> other direction (there are no limits). Do I need to say that aiming for
> consistency/predictability is a good goal in software design? It can
> only make it *better* in all the meanings of the term: fewer bugs,
> easier to maintain, easier to document, and easier to use in the long
> run. Everybody wins. Even if you don't realize it now. Convenience is
> also important, but less important than consistency/predictability.
> As a matter of fact, an interesting and not immediately obvious side
> effect of going consistent is that, in the long run (i.e. when the
> software becomes bigger and more complex), it also gives you a form of
> convenience for the end-user: documentation is simpler and easier to
> read, and there are fewer special cases to remember.
>
>
> These types of changes mean a lot of maintenance work for the users.
> A deprecation cycle does not circumvent that.
>
>
> I don't see why this change would be more work for the users than any
> other change. Making RangedData fade away will certainly be a much
> bigger one, will take much more time (maybe 2-3 years), and will
> require a lot more maintenance work from us (mostly me) and from
> the users.
>
> FWIW, the change to match()/%in% probably means more work for me than
> for the users. There is a *lot* of stuff I had to put in place in
> IRanges/GenomicRanges to make this transition smooth. But I truly
> believe it was worth it. I also fixed all the BioC packages I found
> that were affected by those changes (surprisingly, there were very
> few: only 5). I could have missed some. Please let me know if that
> is the case and I'll fix them too.
>
> Thanks,
> H.
>
>
>
> IMO those advantages counterbalance *by far* the very little
> convenience you get from having 'match(query, subject)' do
> 'findOverlaps(query, subject, select="first")' on
> IRanges/GRanges objects. If you need to do that, just use the
> latter, or, if you think that's still too much typing, define
> a wrapper e.g. 'ovmatch(query, subject)'.
>
> There are plenty of specialized tools around for doing
> inexact/fuzzy/partial/overlap matching for many particular types
> of vector-like objects: grep() and family, pmatch(), charmatch(),
> agrep(), grepRaw(), matchPattern() and family, findOverlaps() and
> family, findInterval(), etc... For the reasons I mentioned
> above, none of them should hijack match() to make it do some
> particular type of inexact matching on some particular type of
> objects. Even if, for that particular type of objects, doing that
> particular type of inexact matching is more common than doing
> exact matching.
>
> H.
>
>
>
> On 01/06/2013 05:39 PM, Michael Lawrence wrote:
>
> I think having overlapsAny is a nice addition and helps make the API
> more complete and explicit. Are you sure we need to change the behavior
> of the match method for this relatively uncommon use case?
>
>
> Yes because otherwise users with a use case of doing
match()
>
> even if it's uncommon,
>
>
> I don't think "match" always has to mean "equality". It is a more general
> concept in my mind. The most common use case for matching ranges is
> overlap.
>
>
> Of course "match" doesn't always have to mean equality. But
>
>
> Michael
>
>
> On Fri, Jan 4, 2013 at 8:34 PM, Hervé Pagès <hpages at fhcrc.org> wrote:
>
> Yes 'peaks %in% genes' is cute and was probably doing the right thing
> for most users (although not all). But 'exons %in% genes' is cute too
> and was probably doing the wrong thing for all users. Advanced users
> like you guys would have no problem switching to
>
> !is.na(findOverlaps(peaks, genes, type="within", select="any"))
>
> or
>
> !is.na(findOverlaps(peaks, genes, type="equal", select="any"))
>
> in case 'peaks %in% genes' was not doing exactly what you wanted,
> but most users would not find this particularly friendly. Even
> worse, some users probably didn't realize that 'peaks %in% genes'
> was not doing exactly what they thought it did because "peaks in
> genes" in English suggests that the peaks are within the genes,
> but it's not what 'peaks %in% genes' does.
>
> Having overlapsAny(), with exactly the same extra arguments as
> countOverlaps() and subsetByOverlaps() (i.e. 'maxgap', 'minoverlap',
> 'type', 'ignore.strand'), all of them documented (and with most
> users more or less familiar with them already) has the virtue of
> exposing the user to all the options from the very start, and of
> helping him/her make the right choice. Of course there will be users
> that don't want or don't have the time to read/think about all the
> options. Not a big deal: they'll just do 'overlapsAny(query, subject)',
> which is not a lot more typing than 'query %in% subject', especially
> if they use tab completion.
>
> It's true that it's more common to ask questions about overlap than
> about equality but there are some use cases for the latter (as the
> original thread shows). Until now, when you had such a use case, you
> could not use match() or %in%, which would have been the natural things
> to use, because they got hijacked to do something else, and you were
> left with nothing. Not a satisfying situation. So at a minimum, we
> needed to restore the true/real/original semantic of match() to do
> "equality" instead of "overlap". But it's hard to do this for match()
> and not do it for %in% too. For more than 99% of R users, %in% is
> just a simple wrapper for 'match(x, table, nomatch = 0) > 0' (this
> is how it has been documented and implemented in base R for many
> years). Not maintaining this relationship between %in% and match()
> would only cause grief and frustration to newcomers to Bioconductor.
>
> H.
>
>
>
> On 01/04/2013 03:32 PM, Cook, Malcolm wrote:
>
> Hiya again,
>
> I am definitely a late comer to BioC, so I definitely easily defer to
> the tide of history.
>
> But I do think you miss my point Michael about the proposed change
> making the relationship between %in% and match for {G,I}Ranges{List}
> mimic that between other vectors, and I do think that changing the API
> would make other late-comers take to BioC easier/faster.
>
> That said, I NEVER use %in% so I really have no stake in the matter, and
> I DEFINITELY appreciate the argument for not changing the API just for
> semantic sweetness.
>
> That that said, Herve is _/so good/_ about deprecations and warnings
> that make such changes fairly easily digestible.
>
> That that that.... enough.... I bow out of this one....!!!!
>
> Always learning and Happy New Year to all lurkers,
>
> ~Malcolm
>
> *From:* Michael Lawrence [mailto:lawrence.michael at gene.com]
> *Sent:* Friday, January 04, 2013 5:11 PM
> *To:* Cook, Malcolm
> *Cc:* Sean Davis; Michael Lawrence; Hervé Pagès (hpages at fhcrc.org);
> Tim Triche, Jr.; Vedran Franke; bioconductor at r-project.org
> *Subject:* Re: [BioC] countMatches() (was: table for GenomicRanges)
>
>
> On Fri, Jan 4, 2013 at 1:56 PM, Cook, Malcolm <mec at stowers.org> wrote:
>
> Hiya,
>
> For what it is worth...
>
> I think the change to %in% is warranted.
>
> If I understand correctly, this change restores the relationship between
> the semantics of `%in%` and the semantics of `match`.
>
> From the docs:
>
> '"%in%" <- function(x, table) match(x, table, nomatch = 0) > 0'
>
> Herve's change restores this relationship.
>
>
> match and %in% were initially consistent (both considering any overlap);
> Herve has changed both of them together. The whole idea behind IRanges
> is that ranges are special data types with special semantics. We have
> reimplemented much of the existing R vector API using those semantics;
> this extends beyond match/%in%. I am hesitant about making such sweeping
> changes to the API so late in the life-cycle of the package. There was a
> feature request for a way to count identical ranges in a set of ranges.
> Let's please not get carried away and start redesigning the API for this
> one, albeit useful, request. There are all sorts of inconsistencies in
> the API, and many of them were conscious decisions that considered
> practical use cases.
>
> Michael
>
>
> Herve, I suspect you were, as a result, able to completely drop
> all the `%in%,BiocClass1,BiocClass2` definitions and depend upon
> base::%in%
>
> Am I right?
>
> If so, may I suggest that Herve stay the course, with the addition of
> '"%ol%" <- function(a, b) findOverlaps(a, b, maxgap=0L,
> minoverlap=1L, type='any', select='all') > 0'
>
> This would provide a perspicacious idiom, thereby optimizing the API
> for Michael's observed common use case.
>
> Just sayin'
>
> ~Malcolm
>
>
> .-----Original Message-----
> .From: bioconductor-bounces at r-project.org
> [mailto:bioconductor-bounces at r-project.org] On Behalf Of Sean Davis
> .Sent: Friday, January 04, 2013 3:37 PM
> .To: Michael Lawrence
> .Cc: Tim Triche, Jr.; Vedran Franke; bioconductor at r-project.org
> .Subject: Re: [BioC] countMatches() (was: table for GenomicRanges)
> .
> .On Fri, Jan 4, 2013 at 4:32 PM, Michael Lawrence
> .<lawrence.michael at gene.com> wrote:
> .> The change to the behavior of %in% is a pretty big one. Are you thinking
> .> that all set-based operations should behave this way? For example, setdiff
> .> and intersect? I really liked the syntax of "peaks %in% genes". In my
> .> experience, it's way more common to ask questions about overlap than about
> .> equality, so I'd rather optimize the API for that use case. But again,
> .> that's just my personal bias.
> .
> .For what it is worth, I share Michael's personal bias here.
> .
> .Sean
> .
> .
> .> Michael
> .>
> .>
> .> On Fri, Jan 4, 2013 at 1:11 PM, Hervé Pagès <hpages at fhcrc.org> wrote:
> .>
> .>> Hi,
> .>>
> .>> I added findMatches() and countMatches() to the latest IRanges /
> .>> GenomicRanges packages (in BioC devel only).
> .>>
> .>> findMatches(x, table): An enhanced version of 'match' that
> .>> returns all the matches in a Hits object.
> .>>
> .>> countMatches(x, table): Returns an integer vector of the length
> .>> of 'x', containing the number of matches in 'table' for
> .>> each element in 'x'.
> .>>
> .>> countMatches() is what you can use to tally/count/tabulate (choose your
> .>> preferred term) the unique elements in a GRanges object:
> .>>
> .>> library(GenomicRanges)
> .>> set.seed(33)
> .>> gr <- GRanges("chr1", IRanges(sample(15,20,replace=TRUE), width=5))
> .>>
> .>> Then:
> .>>
> .>> > gr_levels <- sort(unique(gr))
> .>> > countMatches(gr_levels, gr)
> .>> [1] 1 1 1 2 4 2 2 1 2 2 2
> .>>
> .>> Note that findMatches() and countMatches() also work on IRanges and
> .>> DNAStringSet objects, as well as on ordinary atomic vectors:
> .>>
> .>> library(hgu95av2probe)
> .>> library(Biostrings)
> .>> probes <- DNAStringSet(hgu95av2probe)
> .>> unique_probes <- unique(probes)
> .>> count <- countMatches(unique_probes, probes)
> .>> max(count) # 7
> .>>
> .>> I made other changes in IRanges/GenomicRanges so that the notion
> .>> of "match" between elements of a vector-like object now consistently
> .>> means "equality" instead of "overlap", even for range-based objects
> .>> like IRanges or GRanges objects. This notion of "equality" is the
> .>> same that is used by ==. The most visible consequence of those
> .>> changes is that using %in% between 2 IRanges or GRanges objects
> .>> 'query' and 'subject' in order to do overlaps was replaced by
> .>> overlapsAny(query, subject).
> .>>
> .>> overlapsAny(query, subject): Finds the ranges in 'query' that
> .>> overlap any of the ranges in 'subject'.
> .>>
> .>> There are warnings and deprecation messages in place to help smooth
> .>> the transition.
> .>>
> .>> Cheers,
> .>> H.
> .>>
> .>> --
> .>> Hervé Pagès
> .>>
> .>> Program in Computational
Biology
> .>> Division of Public Health
Sciences
> .>> Fred Hutchinson Cancer Research
Center
> .>> 1100 Fairview Ave. N, M1-B514
> .>> P.O. Box 19024
> .>> Seattle, WA 98109-1024
> .>>
> .>> E-mail: hpages at fhcrc.org
> <mailto:hpages at="" fhcrc.org="">
> <mailto:hpages at="" fhcrc.org="" <mailto:hpages="" at="" fhcrc.org="">>
> <mailto:hpages at="" fhcrc.org="" <mailto:hpages="" at="" fhcrc.org="">
> <mailto:hpages at="" fhcrc.org="" <mailto:hpages="" at="" fhcrc.org="">>>
> <mailto:hpages at="" fhcrc.org=""> <mailto:hpages at="" fhcrc.org=""> <mailto:hpages at="" fhcrc.org=""> <mailto:hpages at="" fhcrc.org="">>
>
> <mailto:hpages at="" fhcrc.org="" <mailto:hpages="" at="" fhcrc.org="">
> <mailto:hpages at="" fhcrc.org="" <mailto:hpages="" at="" fhcrc.org="">>>>
>
> .>> Phone: (206) 667-5791
> <tel:%28206%29%20667-5791>
> <tel:%28206%29%20667-5791>
<tel:%28206%29%20667-5791>
> <tel:%28206%29%20667-5791>
> .>> Fax: (206) 667-1319
> <tel:%28206%29%20667-1319>
> <tel:%28206%29%20667-1319>
<tel:%28206%29%20667-1319>
> <tel:%28206%29%20667-1319>
>
> .>>
> .>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319