Did the behavior of as.vector(Rle(some.factor)) change on purpose?
@steve-lianoglou-2771
Hi all,

It looks as if the as.vector call to a run length encoded factor turns it to a vector of characters.

Did this happen on accident, or was it a deliberate design decision?

Previously:

R-2.12, IRanges_1.7.19, GenomicRanges_1.1.20
(A factor of length one is returned):

R> a <- Rle(strand(c('+', '-', '+', '+', '-')))
R> as.vector(a[1])
[1] +
Levels: + - *

=============================

Now:

R-2.12, IRanges_1.7.31, GenomicRanges_1.1.20
(The factor is converted to a character):

R> a <- Rle(strand(c('+', '-', '+', '+', '-')))
R> as.vector(a[1])
[1] "+"

It seems like it would do what is expected (by me :-) if the `getMethod('as.vector', c("Rle", "missing"))` was changed from:

function (x, mode = "any")
rep.int(as.vector(runValue(x)), runLength(x))

To:

function (x, mode = "any")
rep.int(runValue(x), runLength(x))

but, upon further inspection, it seems like this was how it was defined previously anyway, so ... I guess something motivated this change?

The complete sessionInfo for my last (buggy(?)) case is:

R version 2.12.0 Under development (unstable) (2010-07-07 r52477)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=C
 [6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] GenomicRanges_1.1.20 IRanges_1.7.31

loaded via a namespace (and not attached):
[1] tools_2.12.0

Thanks,
-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
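The difference between the two method bodies quoted above can be reproduced with base R alone. This is a minimal sketch, assuming only that runValue(x) behaves like a factor of run values and runLength(x) like an integer vector of run lengths; `run_values` and `run_lengths` are hypothetical stand-ins for them, computed with base rle() on the integer codes because rle() does not accept factors. Base rep.int() keeps the factor class and levels, so only the body without as.vector() returns a factor.

```r
f <- factor(c("+", "-", "+", "+", "-"), levels = c("+", "-", "*"))

# rle() rejects factors, so find the runs on the integer codes
r <- rle(as.integer(f))

# Hypothetical stand-ins for runValue(x) and runLength(x)
run_values  <- factor(levels(f)[r$values], levels = levels(f))
run_lengths <- r$lengths

# Body without as.vector(): rep.int() keeps the factor class and levels
as_factor_result <- rep.int(run_values, run_lengths)

# Body with as.vector(): run values become character before expansion
as_char_result <- rep.int(as.vector(run_values), run_lengths)
```

The only change between the two results is the as.vector() call on the run values, which is the same one-call difference between the two method definitions.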
@martin-morgan-1513
On 08/31/2010 07:15 AM, Steve Lianoglou wrote:
> Hi all,
>
> It looks as if the as.vector call to a run length encoded factor turns
> it to a vector of characters.
>
> Did this happen on accident, or was it a deliberate design decision?

Bug fix

> x = factor(letters)
> as.vector(x)
 [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
[20] "t" "u" "v" "w" "x" "y" "z"
> as.factor(x)
 [1] a b c d e f g h i j k l m n o p q r s t u v w x y z
Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z

So

> Rle(factor(letters))
'factor' Rle of length 26 with 26 runs
  Lengths: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
  Values : a b c d e f g h i j k l m n o p q r s t u v w x y z
Levels(26): a b c d e f g h i j k l m n o p q r s t u v w x y z
> as.vector(Rle(factor(letters)))
 [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
[20] "t" "u" "v" "w" "x" "y" "z"
> as.factor(Rle(factor(letters)))
 [1] a b c d e f g h i j k l m n o p q r s t u v w x y z
Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z

There might be edge cases where our own code has not caught up with the fix; please let us know...
> packageDescription('IRanges')$Version
[1] "1.7.32"

Martin

-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793
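The "normal R" convention behind the fix is base R's: as.vector() on a factor returns its labels as a character vector, while as.factor() returns a factor unchanged. The minimal base-R sketch below (the strand-like levels are only an illustrative assumption) also shows why the character result is lossy for a round trip: a level that never occurs in the data, such as "*" here, is dropped when a factor is rebuilt from the characters.

```r
# A factor with an unused level, loosely modelled on strand values
s <- factor(c("+", "-", "+"), levels = c("+", "-", "*"))

chr <- as.vector(s)   # plain character vector; levels are discarded
fac <- as.factor(s)   # already a factor, so returned unchanged

# Rebuilding a factor from the characters only recovers observed levels
rebuilt <- as.factor(chr)

is.character(chr)         # TRUE
identical(fac, s)         # TRUE
"*" %in% levels(rebuilt)  # FALSE: the unused level is lost
```

This is why, after the fix, as.factor() on an Rle is the call to reach for when the factor and its full set of levels must be kept.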
Hi,

On Tue, Aug 31, 2010 at 11:37 AM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
> On 08/31/2010 07:15 AM, Steve Lianoglou wrote:
>> Did this happen on accident, or was it a deliberate design decision?
>
> Bug fix
>
> [...]
>
> There might be edge cases where our own code has not caught up with the
> fix; please let us know...

Interesting. Although I guess it is now following the "normal R" convention(?), in this case it seems more surprising than natural.

Is there some other function, then, that is more like an exact inverse of Rle? I'd like to un-encode a vector and get back a vector of the type that was encoded, w/o having to keep track of the details ... sorry, maybe I'm lazy :-)

Honestly, though, IMHO there would be some design/use advantage to having such a function ... and to having that function named "as.vector" ... but I digress :-)

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
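To make the "exact inverse" idea concrete, here is a minimal base-R sketch of an encode/decode pair whose decoder returns the same type that was encoded, factors included. `encode_runs()` and `decode_runs()` are hypothetical helpers, not part of IRanges; they assume the input is a plain atomic vector or a factor and do not handle names or other attributes.

```r
# Hypothetical helper (not an IRanges function): run-length encode an
# atomic vector or a factor, remembering the factor's levels
encode_runs <- function(x) {
  # base rle() rejects factors, so compute runs on the integer codes
  codes <- if (is.factor(x)) as.integer(x) else x
  r <- rle(codes)
  values <- if (is.factor(x)) {
    # map codes back to labels, keeping every level (used or not)
    factor(levels(x)[r$values], levels = levels(x), ordered = is.ordered(x))
  } else {
    r$values
  }
  list(values = values, lengths = r$lengths)
}

# Hypothetical helper: expand the runs without any as.vector() call;
# rep() dispatches to rep.factor() for factors, keeping class and levels
decode_runs <- function(runs) {
  rep(runs$values, times = runs$lengths)
}
```

Usage: `decode_runs(encode_runs(x))` should give back `x` for numeric or character vectors, and a factor with its original levels (including unused ones) for factor input. The design choice is simply to keep the factor metadata alongside the runs and never coerce the run values before expanding them.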