append on DNAStringSet produces an empty DNAString as last element
@philip-kensche-4208
Dear Martin,

> On 08/10/2010 03:01 AM, Philip Kensche wrote:
> > Hi,
> >
> > I noticed that following:
> >
> >> append(DNAStringSet(), list(DNAString("aaaa"), DNAString("catc")))
> >
> > [[1]]
> > 4-letter "DNAString" instance
> > seq: AAAA
> >
> > [[3A2]]
> > 4-letter "DNAString" instance
> > seq: CATC
> >
> > [[3]]
> > A DNAStringSet instance of length 0
> >
> > I guess, the last element shouldn't be there -- or not?
>
> this has to do with what base::append does when the first argument is
> zero length,
>
> > base::append
> function (x, values, after = length(x))
> {
>     lengx <- length(x)
>     if (!after)
>         c(values, x)
>     else if (after >= lengx)
>         c(x, values)
>     else c(x[1L:after], values, x[(after + 1L):lengx])
> }
> <environment: namespace:base>
>
> which leads to some inconsistent behavior, e.g., dropping zero-length
> atomic vectors but not other data structures
>
> > append(numeric(), list(1))
> [[1]]
> [1] 1
>
> > append(new.env(), list(1))
> [[1]]
> [1] 1
>
> [[2]]
> <environment: 0x461a508>
>
> I'm not sure what the reason for this behavior is; I might have expected
> list(numeric(), 1) in the first case, list(new.env(), 1) in the second.

If I see that right, it is a problem of the append function from package base, i.e. of an R core package.

Actually, I noticed that function base::append called on c("DNAStringSet", "list") returns a list. I would expect it to return an extended DNAStringSet.

Thanks, Martin!

Philip

P.S.:

> is that '[[3A2]]' in your output correct? It suggests some kind of
> memory corruption (in R?) but I can't reproduce it.

It's not because of R. It must have happened in the editor -- so nothing to worry about :-)

> Martin
>
> > Regards,
> >
> > Philip
> >
> > P.S.:
> >
> >> sessionInfo()
> > R version 2.11.1 (2010-05-31)
> > x86_64-pc-linux-gnu
> >
> > locale:
> > [1] LC_CTYPE=de_DE.UTF-8 LC_NUMERIC=C
> > [3] LC_TIME=de_DE.UTF-8 LC_COLLATE=de_DE.UTF-8
> > [5] LC_MONETARY=C LC_MESSAGES=de_DE.UTF-8
> > [7] LC_PAPER=de_DE.UTF-8 LC_NAME=C
> > [9] LC_ADDRESS=C LC_TELEPHONE=C
> > [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C
> >
> > attached base packages:
> > [1] stats graphics grDevices utils datasets methods base
> >
> > other attached packages:
> > [1] GenomicRanges_1.0.7 Biostrings_2.16.9 IRanges_1.6.6
> >
> > loaded via a namespace (and not attached):
> > [1] Biobase_2.8.0 BSgenome_1.16.2
>
> --
> Martin Morgan
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M1 B861
> Phone: (206) 667-2793

--
| Philip Kensche <pkensche at cmbi.ru.nl> | http://www.cmbi.ru.nl/~pkensche |
| Center for Molecular and Biomolecular Informatics | http://www2.cmbi.ru.nl |
| phone +31 (0)24 36 19693 | fax +31 (0)24 36 19395
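A small base R sketch of the behavior discussed in this post, in case it helps. When `x` has length 0, `after` defaults to 0, so the `if (!after)` branch of base::append runs and the result is `c(values, x)`: the list comes first and `x` lands at the end, if `c()` keeps it at all. That would explain why the empty DNAStringSet shows up as the last element. The names `values`, `e`, `r1` and `r2` are only for illustration.

```r
# Base R only; Biostrings is not needed for this illustration.
values <- list(1)

# Zero-length atomic vector: length(x) is 0, so append() evaluates
# c(values, x), and c() contributes no elements for numeric().
r1 <- append(numeric(), values)
length(r1)             # 1

# Empty environment: length() is also 0, so the same branch runs,
# but c() keeps the environment as a list element, placed after `values`.
e <- new.env()
r2 <- append(e, values)
length(r2)             # 2
identical(r2[[2]], e)  # TRUE
```

The two calls differ only in how `c()` treats a zero-length atomic vector versus an object it cannot flatten into the list.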
@herve-pages-1542
Hi Philip,

On 08/11/2010 04:25 AM, Philip Kensche wrote:
> If I see that right, it is a problem of the append function from package base, i.e. of an R core package.
>
> Actually, I noticed that function base::append called on c("DNAStringSet", "list") returns a list. I would expect it to return an extended DNAStringSet.

Combining objects of mixed types will most of the time lead to surprises. Things are much more predictable when the objects to combine have the same type. For example, with 2 DNAStringSet objects:

> append(DNAStringSet(), DNAStringSet(c("AA", "TGGG")))
  A DNAStringSet instance of length 2
    width seq
[1]     2 AA
[2]     4 TGGG

> append(DNAStringSet(c("AA", "TGGG")), DNAStringSet())
  A DNAStringSet instance of length 2
    width seq
[1]     2 AA
[2]     4 TGGG

But there is no obvious/natural thing to do when combining a DNAStringSet object with a list. I would argue that in that case append() should raise an error but that's not how R tends to handle things in general.

Cheers,
H.

--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
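For anyone who wants the "raise an error on mixed types" behavior argued for in this reply, here is a minimal sketch of a wrapper. `strict_append` is a hypothetical helper, not part of base R or Biostrings, and it is shown on base types only; whether an exact class match is the right test for your objects is an assumption you should check.

```r
# Hypothetical helper (not in base R or Biostrings): refuse to combine
# objects whose classes differ, instead of silently returning a list.
strict_append <- function(x, values, after = length(x)) {
  if (!identical(class(x), class(values))) {
    stop("cannot append '", class(values)[1L], "' to '", class(x)[1L], "'")
  }
  append(x, values, after = after)
}

# Same class on both sides: behaves like append().
ok <- strict_append(c(1, 2), c(3, 4))

# Mixed classes: the error is caught here only to show its message.
res <- tryCatch(strict_append(numeric(), list(1)),
                error = function(e) conditionMessage(e))
res
```

With a list of DNAString objects, one would presumably convert the list to a DNAStringSet first and then call append() on two DNAStringSet objects, as in the examples in the reply.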