IRanges/List oddity: do.call of c on a list of IRangesList returns "list" only when the list is named
Hi Malcolm,

The problem you are describing can be reproduced by calling c()
directly on S4 objects.

* With unnamed arguments:

  > c(IRanges(), IRanges())
  IRanges of length 0

  > c(Rle(), Rle())
  logical-Rle of length 0 with 0 runs
    Lengths:
    Values :

* With named arguments:

  > c(a=IRanges(),b=IRanges())
  $a
  IRanges of length 0

  $b
  IRanges of length 0

  > c(a=Rle(), b=Rle())
  $a
  logical-Rle of length 0 with 0 runs
    Lengths:
    Values :

  $b
  logical-Rle of length 0 with 0 runs
    Lengths:
    Values :

This statement (found in man page for base::c()) is showing what the
root of the problem is:

  S4 methods:

     This function is S4 generic, but with argument list '(x, ...,
     recursive = FALSE)'.

Note that, to make things a little bit more confusing, it's not totally
accurate that c() is an S4 generic, at least not on a fresh session:

  > isGeneric("c")
  [1] FALSE

So my understanding of the above statement is that c() will
automatically be turned into an S4 generic at the moment you try
to define an S4 method for it, and, for obscure reasons that I'm not
sure I understand, the argument list used in the definition of this
S4 method must start with 'x'. The consequence of all this is that
dispatch will happen on 'x' so if named arguments are passed with
a name that is not 'x', dispatch will fail and the default method
(which is base::c()) will be called :-b

This explains why things work as expected in the following situations:

  > c(IRanges(), b=IRanges())
  IRanges of length 0

  > c(a=IRanges(), IRanges())
  IRanges of length 0

  > c(a=IRanges(), x=IRanges())
  IRanges of length 0

But when all the arguments are named with names != 'x', then nothing
is passed to 'x' and dispatch fails.

I didn't have much luck so far with my attempts to work around this:

1. Trying to change the signature of the c() generic:

  > setGeneric("c", signature="...")
  Error in setGeneric("c", signature = "...") :
    'c' is a primitive function; methods can be defined, but
    the generic function is implicit, and cannot be changed.

2. Trying to dispatch on "missing" or "ANY":

  > setMethod("c", "missing", function(x, ..., recursive=FALSE) "YES!")
  Error in setMethod("c", "missing", function(x, ..., recursive = FALSE) "YES!") :
    the method for function 'c' and signature x="missing" is sealed and
    cannot be re-defined

  > setMethod("c", "ANY", function(x, ..., recursive=FALSE) "YES!")
  Error in setMethod("c", "ANY", function(x, ..., recursive = FALSE) "YES!") :
    the method for function 'c' and signature x="ANY" is sealed and
    cannot be re-defined

With old versions of R dispatch on ... was not possible i.e. ... was not
allowed to be in the signature of the generic. This was changed in
recent versions of R and we're already using this new feature for a
few S4 generics defined in BiocGenerics e.g. for cbind() and rbind():

  > library(BiocGenerics)
  > rbind
  standardGeneric for "rbind" defined from package "BiocGenerics"

  function (..., deparse.level = 1)
  standardGeneric("rbind")
  <environment: 0x29b96b0>
  Methods may be defined for arguments: ...
  Use showMethods("rbind") for currently available ones.

And dispatch works as expected, with or without named arguments:

  > rbind(a=DataFrame(X=1:3, Y=11:13), b=DataFrame(X=1:3, Y=21:23))
  DataFrame with 6 rows and 2 columns
            X         Y
    <integer> <integer>
  1         1        11
  2         2        12
  3         3        13
  4         1        21
  5         2        22
  6         3        23

  > rbind(DataFrame(X=1:3, Y=11:13), DataFrame(X=1:3, Y=21:23))
  DataFrame with 6 rows and 2 columns
            X         Y
    <integer> <integer>
  1         1        11
  2         2        12
  3         3        13
  4         1        21
  5         2        22
  6         3        23

So I wonder if the weird behavior of c() is still justified.

Comments/suggestions to address this are welcome.

Thanks,
H.

On 11/30/2012 11:56 AM, Cook, Malcolm wrote:
> Hi,
>
> The following shows that do.call of c on a list of IRangesList returns "list" only when the list is named.
>
>> library(IRanges)
>> example(IRangesList)
>> class(x)
> [1] "CompressedIRangesList"
> attr(,"package")
> [1] "IRanges"
>> class(do.call(c,list(x1=x,x2=x)))
> [1] "list"
>
> I am confused by this.
>
> I would not expect the fact that the list is named to have any impact on the result.
>
> But look: when the list names are omitted, the class is now an IRangesList
>
>> class(do.call(c,list(x,x)))
> [1] "CompressedIRangesList"
> attr(,"package")
> [1] "IRanges"
>
>> class(c(x,x))
> [1] "CompressedIRangesList"
> attr(,"package")
> [1] "IRanges"
>
> A 'workaround' is to unname the list, as demonstrated:
>
>> class(do.call(c,unname(list(x1=x,x2=x))))
> [1] "CompressedIRangesList"
> attr(,"package")
> [1] "IRanges"
>
> But why does having a 'names' attribute affect the behavior of do.calling c so much as to change the class returned?
>
>
> Thanks for your help/education.....
>
> Malcolm Cook
> Computational Biology - Stowers Institute for Medical Research
>
>> sessionInfo()
> R version 2.15.1 (2012-06-22)
> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>
> locale:
> [1] C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] IRanges_1.16.4 BiocGenerics_0.4.0
>
> loaded via a namespace (and not attached):
> [1] AnnotationDbi_1.20.3 BSgenome_1.26.1 Biobase_2.18.0 Biostrings_2.26.2 DBI_0.2-5 GenomicFeatures_1.10.1 GenomicRanges_1.10.5 RCurl_1.95-3 RSQLite_0.11.2 Rsamtools_1.10.2 XML_3.95-0.1 biomaRt_2.14.0 bitops_1.0-4.2 colorspace_1.2-0 data.table_1.8.6 functional_0.1 graph_1.36.1 gtools_2.7.0 parallel_2.15.1 rtracklayer_1.18.1 stats4_2.15.1 tools_2.15.1 zlibbioc_1.4.0
>>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
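The dispatch-on-'x' behaviour Hervé describes can be reproduced without IRanges. What follows is only a minimal sketch under stated assumptions: the class Toy and its slot v are made up for this illustration, and the comments restate what the thread reports for R 2.15, which may differ in other R versions.

```r
library(methods)

# Hypothetical toy class standing in for IRanges/Rle (not from any package).
setClass("Toy", representation(v = "integer"))

# S4 method for the primitive c(): the only argument it can dispatch on is 'x'.
setMethod("c", "Toy", function(x, ..., recursive = FALSE) {
    args <- unname(list(x, ...))
    new("Toy", v = unlist(lapply(args, slot, "v"), use.names = FALSE))
})

t1 <- new("Toy", v = 1:2)
t2 <- new("Toy", v = 3:4)

# Unnamed: t1 is matched to 'x', so the Toy method is selected.
unnamed <- c(t1, t2)
is(unnamed, "Toy")

# All arguments named, none of them 'x': nothing is matched to 'x'.
# According to the explanation above, this is the case that falls
# through to base::c().
named <- c(a = t1, b = t2)
class(named)
```

Comparing class(named) with class(unnamed) in your own session is a quick way to check whether your R version still behaves as described in this thread.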
This doesn't address the issue, but I've long since gotten into the
habit of defensively wrapping the list argument of my
do.call(c, xlist)s with unname just to sidestep this, e.g.:

  R> do.call(c, unname(xlist))

Until a "fix" lands for this, perhaps this will help others avoid that
tripwire.

HTH,
-Steve

On Friday, November 30, 2012, Hervé Pagès wrote:
> Hi Malcolm,
>
> The problem you are describing can be reproduced by calling c()
> directly on S4 objects.
> [...]

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
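Steve's defensive habit can be wrapped in a small helper. This is a sketch only: combine_unnamed is a name invented here (it is not part of IRanges or base R), and the Toy class exists just to make the example runnable without Bioconductor.

```r
library(methods)

# Hypothetical helper (name invented for this sketch): drop the list names
# before calling c(), so the first element is matched positionally to 'x'.
combine_unnamed <- function(xlist) do.call(c, unname(xlist))

# Toy S4 class used only to make the sketch self-contained.
setClass("Toy", representation(v = "integer"))
setMethod("c", "Toy", function(x, ..., recursive = FALSE) {
    new("Toy", v = unlist(lapply(list(x, ...), slot, "v"), use.names = FALSE))
})

xlist <- list(x1 = new("Toy", v = 1:2), x2 = new("Toy", v = 3:4))
res <- combine_unnamed(xlist)
is(res, "Toy")
res@v
```

Note that the list names (x1, x2) are discarded by unname(), so any information they carry has to be reattached by the caller if it matters.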
On Fri, Nov 30, 2012 at 3:28 PM, Hervé Pagès <hpages@fhcrc.org> wrote:
> [...]
> So I wonder if the weird behavior of c() is still justified.
>
> Comments/suggestions to address this are welcome.

The issue is that (unlike 'rbind') 'c' is a primitive and dispatch for
primitives is hard-coded in C. C-level dispatch is a simplified variant
of the R implementation, so I'm guessing it does not work with "...".

Btw, you can get a peek at the 'c' generic with:

  > getGeneric("c")
  standardGeneric for "c" defined from package "base"

  function (x, ..., recursive = FALSE)
  standardGeneric("c", .Primitive("c"))
  <bytecode: 0x382af20>
  <environment: 0x34d6878>
  Methods may be defined for arguments: x, recursive
  Use showMethods("c") for currently available ones.

Michael
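The primitive/closure distinction Michael points to can be checked directly from a plain R session. This is a small sketch using base R only; if BiocGenerics is attached, rbind is its S4 generic rather than base::rbind, but it is still a closure.

```r
# c() is a primitive: it has no R-level formals, and its dispatch is
# handled in C code.
is.primitive(c)
formals(c)             # NULL for primitives

# rbind() is an ordinary closure with R-level formals, which is why a
# generic with signature "..." can be defined on top of it.
is.primitive(rbind)
names(formals(rbind))
```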
Steve, Michael, Herve, all

As always, "illuminating".

And, as often, frustrating.

I am clear how unname serves as a workaround for my current purpose.
So, I can proceed.

But I remain unclear whether this (to me, odd) behavior of base::c is
desirable or justifiable in any sense of the word. Is this informed by
a rational language design or, as Mike suggests, the result of layering
an OO design onto a functional base?

In your opinion, should this issue be raised on R-devel? Or is it a
"waste of time"?

Thanks for your thoughts/help.

~Malcolm

From: Michael Lawrence [mailto:lawrence.michael@gene.com]
Sent: Monday, December 03, 2012 11:31 AM
To: Hervé Pagès
Cc: Cook, Malcolm; bioconductor@r-project.org
Subject: Re: [BioC] IRanges/List oddity: do.call of c on a list of IRangesList returns "list" only when the list is named

[...]
Hi Malcolm,

I'm not sure what the reasons are for the current behaviour of the
c() generic, if they're just historical, or if there is something
deeper, or...

My view on the "primitive" status of a function is that it should be an
implementation detail, maybe an important one, but a detail anyway in
the sense that being implemented as a .Primitive or an .Internal or
just in plain R should not affect the semantics of a function.
Interestingly there is a short comment in ?.Primitive suggesting that
people's code should not depend on knowing which functions are
primitive because this does change as R evolves.

Unfortunately the reality is very different: there are situations where
you definitely need to know that something is a primitive, just because
argument passing (and consequently method dispatch) works differently.

On a more positive note, I found a hack that allows c() to dispatch
on ...:

  setGeneric("c", signature="...",
      function(..., recursive=FALSE) standardGeneric("c"),
      useAsDefault=function(..., recursive=FALSE)
                       base::c(..., recursive=recursive)
  )

Then:

  setClass("A", representation(aa="integer"))

  setMethod("c", "A",
      function(..., recursive=FALSE)
      {
          args <- list(...)
          ans_aa <- unlist(lapply(args, slot, "aa"), use.names=FALSE)
          new("A", aa=ans_aa)
      }
  )

  > a1 <- new("A", aa=1:3)
  > a2 <- new("A", aa=22:25)
  > c(a1, a2)
  An object of class "A"
  Slot "aa":
  [1]  1  2  3 22 23 24 25

  > c(a1, x=a2)
  An object of class "A"
  Slot "aa":
  [1]  1  2  3 22 23 24 25

  > c(A=a1, B=a2)
  An object of class "A"
  Slot "aa":
  [1]  1  2  3 22 23 24 25

Overriding base::c() with our own c() is pretty invasive though and
I didn't test it enough to guarantee that it doesn't break things or
slow them down.

Also one important thing to note is that this signature doesn't allow
specific methods to implement extra arguments (like the "c" method for
GenomicRanges does), which kind of makes sense because the generic
function is putting named args that are not named 'recursive' in ...,
and dispatches on them. The same restriction applies to the cbind()
and rbind() generics:

  > setMethod("cbind", "A",
        function(..., deparse.level=1, my.toggle=FALSE) NULL)
  Creating a generic function for 'cbind' from package 'base' in the global environment
  in method for 'cbind' with signature '"A"': no definition for class 'A'
  Error in rematchDefinition(definition, fdef, mnames, fnames, signature) :
    arguments (deparse.level) after '...' in the generic must appear in the method, in the same place at the end of the argument list

So some of the "c" methods would need to be revisited.

Anyway, this would need serious testing before adding this generic to
BiocGenerics. Is it worth it?

Cheers,
H.

On 12/03/2012 12:11 PM, Cook, Malcolm wrote:
> [...]
> > Malcolm Cook > Computational Biology - Stowers Institute for Medical Research > > sessionInfo() > > R version 2.15.1 (2012-06-22) > Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) > > locale: > [1] C > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] IRanges_1.16.4 BiocGenerics_0.4.0 > > loaded via a namespace (and not attached): > [1] AnnotationDbi_1.20.3 BSgenome_1.26.1 Biobase_2.18.0 > Biostrings_2.26.2 DBI_0.2-5 > GenomicFeatures_1.10.1 GenomicRanges_1.10.5 RCurl_1.95-3 > RSQLite_0.11.2 Rsamtools_1.10.2 XML_3.95-0.1 > biomaRt_2.14.0 bitops_1.0-4.2 colorspace_1.2-0 > data.table_1.8.6 functional_0.1 graph_1.36.1 > gtools_2.7.0 parallel_2.15.1 > rtracklayer_1.18.1 stats4_2.15.1 tools_2.15.1 > zlibbioc_1.4.0 > > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org <mailto:bioconductor at="" r-project.org=""> > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > > -- > Hervé Pagès > > Program in Computational Biology > Division of Public Health Sciences > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N, M1-B514 > P.O. -- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319
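
To make the restriction concrete without touching base::c(), here is a minimal sketch using a made-up generic. The names combine2 and my.toggle are hypothetical and exist only for this illustration; the class "A" mirrors the toy class used earlier in the thread. The first setMethod() call is wrapped in try() because, going by the error shown above, it is expected to be rejected, though the exact behaviour may depend on the R version.

```r
library(methods)

# Toy class, as in the example earlier in the thread.
setClass("A", representation(aa = "integer"))

# Hypothetical generic (not part of any package) that dispatches on '...'.
setGeneric("combine2", signature = "...",
    function(..., recursive = FALSE) standardGeneric("combine2"))

# A method that adds an argument after the generic's trailing 'recursive'.
# Expected to be rejected, so wrap it in try() and only record the outcome.
attempt <- try(
    setMethod("combine2", "A",
        function(..., recursive = FALSE, my.toggle = FALSE) NULL),
    silent = TRUE)
rejected <- inherits(attempt, "try-error")

# A method whose argument list matches the generic exactly is accepted.
setMethod("combine2", "A",
    function(..., recursive = FALSE) {
        args <- list(...)
        new("A", aa = unlist(lapply(args, slot, "aa"), use.names = FALSE))
    })

a1 <- new("A", aa = 1:3)
a2 <- new("A", aa = 22:25)

# Argument names other than 'recursive' end up in '...', so dispatch
# on '...' is not affected by them.
res <- combine2(A = a1, B = a2)
print(rejected)
print(res@aa)
```

The design point is that a generic dispatching on '...' has no spare slot for method-specific arguments, so any per-method option would have to be handled some other way.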
Probably better to bring this issue to the attention of John Chambers. Since he's invited us to start hacking on the methods package, this might be a good opportunity to smooth out some of these rough edges.

Michael
Thanks for digging into this, Herve, Michael.

Herve, I really appreciate your following up on R-devel, as you recently did when you got mapply 'fixed' to work natively with Bioc's List and friends (cf. http://developer.r-project.org/blosxom.cgi/R-devel/NEWS).

I don't think re-defining c as a generic in Bioconductor is a good workaround, for the reasons you mentioned, Herve. The issue will just crop up again with someone else's non-BioC S4 class structure.

It really is not a Bioconductor issue at all.

If this can also be kicked upstream, that would serve others as well.

Thoughts?

~Malcolm
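
The point that this is not specific to Bioconductor can be sketched with a toy S4 class that has nothing to do with IRanges. The class "A" and its c() method below are made up for illustration. Only the unnamed calls are relied on here; what the all-named call returns depends on the R version (in the R 2.15.1 session reported at the start of this thread it would fall through to base::c() and give a plain list), so it is printed rather than assumed.

```r
library(methods)

# Toy class unrelated to any Bioconductor package.
setClass("A", representation(aa = "integer"))

# Conventional c() method: the first argument must be called 'x'.
setMethod("c", "A",
    function(x, ..., recursive = FALSE) {
        args <- list(x, ...)
        new("A", aa = unlist(lapply(args, slot, "aa"), use.names = FALSE))
    })

a1 <- new("A", aa = 1:3)
a2 <- new("A", aa = 22:25)
named <- list(x1 = a1, x2 = a2)

# Unnamed arguments: the first one is matched to 'x', so the method is used.
direct  <- c(a1, a2)

# The unname() workaround from the start of the thread.
unnamed <- do.call(c, unname(named))

# All arguments named, none of them 'x': outcome is version dependent,
# so it is wrapped in try() and only inspected.
with_names <- try(do.call(c, named), silent = TRUE)
print(class(with_names))
```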
On 12/13/2012 12:24 PM, Cook, Malcolm wrote:
> Thanks for digging into this, Herve, Michael.
>
> Herve, I really appreciate your following up on R-devel, such as you
> recently did that got mapply 'fixed' to work natively with Bioc's List
> and friends (c.f. http://developer.r-project.org/blosxom.cgi/R-devel/NEWS)
>
> I don't think re-defining c as a generic in BioConductor is a good
> workaround, for the reasons you mentioned Herve. The issue will just
> crop up again with someone else's non BioC S4 class structure.
>
> It really is not a BioConductor issue at all.
>
> If this can also be kicked upstream, that would serve others as well.

Glups, I wrote a long answer about the pros and cons of putting stuff in BiocGenerics vs trying to push it into mainstream R. I was about to press the Send button but, before doing so, decided to have a quick look at the source of the methods package (following Michael's suggestion) just to confirm my feeling that this would be a tough one, so tough that my previous workaround would suddenly sound much more appealing.

I was psychologically and emotionally prepared to have a rough time, but, surprisingly, I didn't. Here is the patch:

hpages@thinkpad:~/biocprojects/c_implicit_generic/R-devel$ svn diff
Index: src/library/methods/R/BasicFunsList.R
===================================================================
--- src/library/methods/R/BasicFunsList.R	(revision 61310)
+++ src/library/methods/R/BasicFunsList.R	(working copy)
@@ -46,7 +46,7 @@
 , "%*%" = function(x, y) standardGeneric("%*%")
 , "xtfrm" = function(x) standardGeneric("xtfrm")
 ### these have a different arglist from the primitives
-, "c" = function(x, ..., recursive = FALSE) standardGeneric("c")
+, "c" = function(..., recursive = FALSE) standardGeneric("c")
 , "all" = function(x, ..., na.rm = FALSE) standardGeneric("all")
 , "any" = function(x, ..., na.rm = FALSE) standardGeneric("any")
 , "sum" = function(x, ..., na.rm = FALSE) standardGeneric("sum")

Yes, a 1-liner!
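
A quick way to see which argument list the implicit generic for c() carries in a given R build, and hence whether a change like this one is in effect, is to inspect it with getGeneric(). This is only a sketch: the output differs between R versions, so nothing specific is assumed beyond '...' being among the formals.

```r
library(methods)

# Fetch the (implicit) S4 generic associated with the primitive c().
g <- getGeneric("c")

# Argument names of the generic. In the unpatched R 2.15 session shown
# earlier in the thread these were x, ..., recursive; with the change
# above the list would no longer start with x.
arg_names <- names(formals(g))
print(arg_names)

# Arguments that methods can dispatch on in this build.
print(g@signature)
```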
I did very little testing but it seems to work fine :-) I'll do more testing before I send this to R-devel. Thanks for the encouragements. H. > > Thoughts? > > ~Malcolm > > *From:*Michael Lawrence [mailto:lawrence.michael at gene.com] > *Sent:* Thursday, December 13, 2012 11:13 AM > *To:* Hervé Pagès > *Cc:* Cook, Malcolm; Michael Lawrence; bioconductor at r-project.org > *Subject:* Re: [BioC] IRanges/List oddity: do.call of c on a list of > IRangesList returns "list" only when the list is named > > Probably better to bring this issue to the attention of John Chambers. > Since he's invited us to start hacking on the methods package, this > might be a good opportunity smooth out some of these rough edges. > > > Michael > > On Wed, Dec 12, 2012 at 6:46 PM, Hervé Pagès <hpages at="" fhcrc.org=""> <mailto:hpages at="" fhcrc.org="">> wrote: > > Hi Malcolm, > > I'm not sure what the reasons are for the current behaviour > of the c() generic, if they're just historical, or if there > is something deeper, or... > > My view on the "primitive" status of a function is that it should > be an implementation detail, maybe an important one, but a > detail anyway in the sense that being implemented as a .Primitive > or an .Internal or just in plain R should not affect the semantic > of a function. Interestingly there is a short comment in ?.Primitive > suggesting that people's code should not depend on knowing which > functions are primitive because this does change as R evolves. > Unfortunately the reality is very different: there are situations > where you definitely need to know that something is a primitive, > just because argument passing (and consequently method dispatch) > works differently. 
>
> On a more positive note, I found a hack that allows c() to dispatch
> on ...:
>
>   setGeneric("c", signature="...",
>       function(..., recursive=FALSE)
>           standardGeneric("c"),
>       useAsDefault=function(..., recursive=FALSE)
>           base::c(..., recursive=recursive)
>   )
>
> Then:
>
>   setClass("A", representation(aa="integer"))
>
>   setMethod("c", "A",
>       function(..., recursive=FALSE)
>       {
>           args <- list(...)
>           ans_aa <- unlist(lapply(args, slot, "aa"), use.names=FALSE)
>           new("A", aa=ans_aa)
>       }
>   )
>
>   > a1 <- new("A", aa=1:3)
>   > a2 <- new("A", aa=22:25)
>
>   > c(a1, a2)
>   An object of class "A"
>   Slot "aa":
>   [1]  1  2  3 22 23 24 25
>
>   > c(a1, x=a2)
>   An object of class "A"
>   Slot "aa":
>   [1]  1  2  3 22 23 24 25
>
>   > c(A=a1, B=a2)
>   An object of class "A"
>   Slot "aa":
>   [1]  1  2  3 22 23 24 25
>
> Overriding base::c() with our own c() is pretty invasive though and
> I didn't test it enough to guarantee that it doesn't break or slowdown
> things.
>
> Also one important thing to note is that this signature doesn't
> allow specific methods to implement extra arguments (like the "c"
> method for GenomicRanges does), which kind of makes sense because
> the generic function is putting named args that are not named
> 'recursive' in ..., and dispatches on them. The same restriction
> applies to the cbind() and rbind() generics:
>
>   > setMethod("cbind", "A", function(..., deparse.level=1,
>   my.toggle=FALSE) NULL)
>   Creating a generic function for 'cbind' from package 'base' in the
>   global environment
>   in method for 'cbind' with signature '"A"': no definition for class "A"
>   Error in rematchDefinition(definition, fdef, mnames, fnames, signature) :
>     arguments (deparse.level) after '...' in the generic must appear in
>     the method, in the same place at the end of the argument list
>
> So some of the "c" methods would need to be revisited.
>
> Anyway, would need serious testing before adding this generic to
> BiocGenerics. Is it worth it?
>
> Cheers,
> H.
>
>
> On 12/03/2012 12:11 PM, Cook, Malcolm wrote:
>
> Steve, Michael, Herve, all
>
> As always, "illuminating".
>
> And, as often, frustrating.
>
> I am clear how unname serves as a workaround for my current purpose.
> So, I can proceed.
>
> But, I remain unclear if this (to me, odd) behavior of base::c is
> desirable or justifiable in any sense of the word. Is this informed by
> a rational language design, or, as Mike suggests, the result of layering
> on of OO design onto a functional base.
>
> In your opinion, do you/we think this issue should be raised
> on R-devel? Or is it a "waste of time"?
>
> Thanks for your thoughts/help.
>
> ~Malcolm
>
> *From:* Michael Lawrence [mailto:lawrence.michael at gene.com]
> *Sent:* Monday, December 03, 2012 11:31 AM
> *To:* Hervé Pagès
> *Cc:* Cook, Malcolm; bioconductor at r-project.org
> *Subject:* Re: [BioC] IRanges/List oddity: do.call of c on a list of
> IRangesList returns "list" only when the list is named
>
> On Fri, Nov 30, 2012 at 3:28 PM, Hervé Pagès <hpages at fhcrc.org> wrote:
>
> Hi Malcolm,
>
> The problem you are describing can be reproduced by calling c()
> directly on S4 objects.
>
> * With unnamed arguments:
>
>   > c(IRanges(), IRanges())
>   IRanges of length 0
>
>   > c(Rle(), Rle())
>   logical-Rle of length 0 with 0 runs
>   Lengths:
>   Values :
>
> * With named arguments:
>
>   > c(a=IRanges(),b=IRanges())
>   $a
>   IRanges of length 0
>
>   $b
>   IRanges of length 0
>
>   > c(a=Rle(), b=Rle())
>   $a
>   logical-Rle of length 0 with 0 runs
>   Lengths:
>   Values :
>
>   $b
>   logical-Rle of length 0 with 0 runs
>   Lengths:
>   Values :
>
> This statement (found in man page for base::c()) is showing what the
> root of the problem is:
>
>   S4 methods:
>
>   This function is S4 generic, but with argument list '(x, ...,
>   recursive = FALSE)'.
>
> Note that, to make things a little bit more confusing, it's not totally
> accurate that c() is an S4 generic, at least not on a fresh session:
>
>   > isGeneric("c")
>   [1] FALSE
>
> So my understanding of the above statement is that c() will
> automatically be turned into an S4 generic at the moment you try
> to define an S4 method for it, and, for obscure reasons that I'm not
> sure I understand, the argument list used in the definition of this
> S4 method must start with 'x'. The consequence of all this is that
> dispatch will happen on 'x' so if named arguments are passed with
> a name that is not 'x', dispatch will fail and the default method
> (which is base::c()) will be called :-b
>
> This explains why things work as expected in the following situations:
>
>   > c(IRanges(), b=IRanges())
>   IRanges of length 0
>
>   > c(a=IRanges(), IRanges())
>   IRanges of length 0
>
>   > c(a=IRanges(), x=IRanges())
>   IRanges of length 0
>
> But when all the arguments are named with names != 'x', then nothing
> is passed to 'x' and dispatch fails.
>
> I didn't have much luck so far with my attempts to work around this:
>
> 1. Trying to change the signature of the c() generic:
>
>   > setGeneric("c", signature="...")
>   Error in setGeneric("c", signature = "...") :
>     'c' is a primitive function; methods can be defined, but
>     the generic function is implicit, and cannot be changed.
>
> 2. Trying to dispatch on "missing" or "ANY":
>
>   > setMethod("c", "missing", function(x, ..., recursive=FALSE)
>   "YES!")
>   Error in setMethod("c", "missing", function(x, ..., recursive =
>   FALSE) "YES!") :
>     the method for function 'c' and signature x="missing" is sealed
>     and cannot be re-defined
>
>   > setMethod("c", "ANY", function(x, ..., recursive=FALSE) "YES!")
>   Error in setMethod("c", "ANY", function(x, ..., recursive = FALSE)
>   "YES!") :
>     the method for function 'c' and signature x="ANY" is sealed and
>     cannot be re-defined
>
> With old versions of R dispatch on ... was not possible i.e. ... was not
> allowed to be in the signature of the generic. This was changed in
> recent versions of R and we're already using this new feature for a
> few S4 generics defined in BiocGenerics e.g. for cbind() and rbind():
>
>   > library(BiocGenerics)
>   > rbind
>   standardGeneric for "rbind" defined from package "BiocGenerics"
>
>   function (..., deparse.level = 1)
>   standardGeneric("rbind")
>   <environment: 0x29b96b0>
>   Methods may be defined for arguments: ...
>   Use showMethods("rbind") for currently available ones.
>
> And dispatch works as expected, with or without named arguments:
>
>   > rbind(a=DataFrame(X=1:3, Y=11:13), b=DataFrame(X=1:3, Y=21:23))
>   DataFrame with 6 rows and 2 columns
>             X         Y
>     <integer> <integer>
>   1         1        11
>   2         2        12
>   3         3        13
>   4         1        21
>   5         2        22
>   6         3        23
>
>   > rbind(DataFrame(X=1:3, Y=11:13), DataFrame(X=1:3, Y=21:23))
>   DataFrame with 6 rows and 2 columns
>             X         Y
>     <integer> <integer>
>   1         1        11
>   2         2        12
>   3         3        13
>   4         1        21
>   5         2        22
>   6         3        23
>
> So I wonder if the weird behavior of c() is still justified.
>
> Comments/suggestions to address this are welcome.
>
>
> The issue is that (unlike 'rbind') 'c' is a primitive and dispatch for
> primitives is hard-coded in C. C-level dispatch is a simplified variant
> of the R implementation, so I'm guessing it does not work with "...".
>
> Btw, you can get a peek at the 'c' generic with:
>
>   > getGeneric("c")
>   standardGeneric for "c" defined from package "base"
>
>   function (x, ..., recursive = FALSE)
>   standardGeneric("c", .Primitive("c"))
>   <bytecode: 0x382af20>
>   <environment: 0x34d6878>
>   Methods may be defined for arguments: x, recursive
>   Use showMethods("c") for currently available ones.
>
> Michael
>
> Thanks,
> H.
>
>
> On 11/30/2012 11:56 AM, Cook, Malcolm wrote:
>
> Hi,
>
> The following shows that do.call of c on a list of IRangesList
> returns "list" only when the list is named.
>
>   library(IRanges)
>   example(IRangesList)
>   class(x)
>
>   [1] "CompressedIRangesList"
>   attr(,"package")
>   [1] "IRanges"
>
>   class(do.call(c,list(x1=x,x2=x)))
>
>   [1] "list"
>
> I am confused by this.
>
> I would not expect the fact that the list is named to have any
> impact on the result.
>
> But, look, omitting the list names the class is now an IRangesList
>
>   class(do.call(c,list(x,x)))
>
>   [1] "CompressedIRangesList"
>   attr(,"package")
>   [1] "IRanges"
>
>   class(c(x,x))
>
>   [1] "CompressedIRangesList"
>   attr(,"package")
>   [1] "IRanges"
>
> A 'workaround' is to unname the list, as demonstrated:
>
>   class(do.call(c,unname(list(x1=x,x2=x))))
>
>   [1] "CompressedIRangesList"
>   attr(,"package")
>   [1] "IRanges"
>
> But, why does having a 'names' attribute affect the behavior of
> do.calling c so much as to change the class returned?
>
>
> Thanks for your help/education.....
>
> Malcolm Cook
> Computational Biology - Stowers Institute for Medical Research
>
> sessionInfo()
>
> R version 2.15.1 (2012-06-22)
> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>
> locale:
> [1] C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] IRanges_1.16.4 BiocGenerics_0.4.0
>
> loaded via a namespace (and not attached):
> [1] AnnotationDbi_1.20.3 BSgenome_1.26.1 Biobase_2.18.0
> Biostrings_2.26.2 DBI_0.2-5
> GenomicFeatures_1.10.1 GenomicRanges_1.10.5 RCurl_1.95-3
> RSQLite_0.11.2 Rsamtools_1.10.2 XML_3.95-0.1
> biomaRt_2.14.0 bitops_1.0-4.2 colorspace_1.2-0
> data.table_1.8.6 functional_0.1 graph_1.36.1
> gtools_2.7.0 parallel_2.15.1
> rtracklayer_1.18.1 stats4_2.15.1 tools_2.15.1
> zlibbioc_1.4.0
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor

--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319
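
For readers who want to try dispatch on ... without patching R, the idea behind the one-line patch can be sketched with an ordinary user-defined generic. This is only an illustration under stated assumptions: combineAll is a hypothetical name that exists in neither R nor Bioconductor, the toy class "A" is borrowed from the hack quoted earlier in this thread, and nothing here touches c() itself.

```r
library(methods)

# Toy class from earlier in the thread: a thin wrapper around an integer vector.
setClass("A", representation(aa = "integer"))

# Hypothetical generic (an assumption for illustration only). Its argument
# list mirrors the patched c(): no leading 'x', and dispatch happens on '...'.
setGeneric("combineAll", signature = "...",
    function(..., recursive = FALSE) standardGeneric("combineAll"))

# Method selected when all arguments passed through '...' are "A" objects.
setMethod("combineAll", "A",
    function(..., recursive = FALSE) {
        args <- list(...)
        new("A", aa = unlist(lapply(args, slot, "aa"), use.names = FALSE))
    })

a1 <- new("A", aa = 1:3)
a2 <- new("A", aa = 22:25)

# Argument names end up in '...' and do not interfere with method selection.
unnamed <- combineAll(a1, a2)
named   <- combineAll(first = a1, second = a2)
```

Because the signature is "...", there is no 'x' formal for a named argument to miss, which is the property the patch aims to give c().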

Beaucoups woohoos and mucho kudos to you, ;)

~Malcolm
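
The unname() workaround from the original question in this thread can also be tried without Bioconductor. The following is a minimal sketch, assuming a toy class "A" (a stand-in for IRangesList, not the real class) and a c() method written against the generic's 'x'-first argument list. How the same call behaves when the list keeps its names may depend on the R version; this thread reports a plain list under R 2.15.1.

```r
library(methods)

# Toy stand-in for an S4 container class (an assumption, not IRangesList).
setClass("A", representation(aa = "integer"))

# c() method matching the generic's argument list, which starts with 'x'.
setMethod("c", "A",
    function(x, ..., recursive = FALSE) {
        args <- list(x, ...)
        new("A", aa = unlist(lapply(args, slot, "aa"), use.names = FALSE))
    })

objs <- list(x1 = new("A", aa = 1:3), x2 = new("A", aa = 22:25))

# Dropping the names lets the first object match 'x' positionally,
# so the "A" method is selected.
combined <- do.call(c, unname(objs))
class(combined)
```

Stripping the names before do.call() keeps the result in the S4 class regardless of how a given R version treats named arguments to c().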