Biostrings bug?
@arnemuellernovartiscom-2205
Last seen 8.5 years ago
Switzerland
Dear All,

I came across the following error in DNAStringSet from the Biostrings package:

> myseq = "CTATGTGTGAGGGCAGCAACCAGAACTGTCTGCCCTGACTTCGCTCAGGATGCTGTGAACATGTGGCTC
AGATGGTGCTAGGCATTTTCCTCTAGAGTCAGAAACGTGGACAGAGAGTCATCTCCTCTGGCTTCCCAGG
CATGTCTGCCACTCTGAAGGTCTGAAGGTCTGGGTCTCCCTCCCATGGGATTTGAGTGCAGAGAGCTGTG
TGACTGGGTCCCTTCAGATCCAGGTGGTGTCTGGACTGTAGCGTTGAGTGCCCTATCTTCCTGGTCTCAG
AGCACCTATACAGTTTCCTCTTGGGCCAGGGATGTGGGCAGTGGTGGGCTGTACTGGAAGTCTCTCCTGT
CCTGCAGTCTCAGGAGTGGCCACCTGTCTGGGTGGTGAGCTCTCTCTCCCATGGGGTTAGGGAGCAGGGA
GGTTTTGCAAGATTCAGATTTAAGGTCACATTTTATCATCATAATGGAGGACATTAGGAAGGTCAGAAAT
AACTCCCTTAAGGAAATACTTGACAACACAAGCAAACTAGTAGAAATCTTTTTAAAAGGAAACACAAAAG
TATTTTAAAGAATTACAGCAAACCACAACCAAATAGGAGAGGAAATTGAACAAAATCATCCAGGAGTTAA
ATATGGAAATAGAAACAATGAAGAGAGCACACAGCGAGACAACCCTGGAGATAGAAAATCTAAGGAAGAG
ATCAGGAGTCATAGATGCAAGCATCACTGACGGACTACATGAGATAGAAGAGAGAATTTTGGGAGCAGAA
GATATCATAGAAAACATTGACACAACCTTCAAAGAGAACGTAAATAGGAAAAAGCTCCTAGCCCTAAACA
TGCAGGAAATCAGGAAACAAATCAAAGATCAAACCTAAATATATCAGGTATAGAAGAGAGTGAAGACTCC
CAACATAAAGGGATGGTAAATATCTTCAACAATATAAACAATATAAAGGAAAACATCCCTAACCAAAAGA
AATAAATGT!
CCATAAATAGACATGAAGCCTGCAGAATTCCAAATAGAATGGACCAGAAAATAAATTCCTCCTGTCACA
TAATAGTCAAAACACCAAATGCACAAAACAAAGAATGAATATTAAAAGCATTAACGGATAAAGGTCAAGT
ACATTTAAAGGCAGACATGTCAGAATTACACCAGAATTCTTACCATGGACTATGAAAGCCAGAAGACAGA
TGT"
> mysDNA = DNAStringSet(myseq) # ok!
> myseq = rep(myseq, 2000000)
> myseq.bs = DNAStringSet(myseq)
Error in .Call("new_SharedRaw_from_STRSXP", x, start(solved_SEW), width(solved_SEW), :
  negative length vectors are not allowed

Enter a frame number, or 0 to exit

1: DNAStringSet(myseq)
2: XStringSet("DNA", x, start = start, end = end, width = width, use.names = u
3: XStringSet("DNA", x, start = start, end = end, width = width, use.names = u
4: .charToXStringSet(basetype, x, start, end, width, use.names)
5: .charToXString(basetype, x, solved_SEW)

Selection: 0
>

Strangely the following works ...:

myseq.bs = c(DNAStringSet(myseq[1:1000000]), DNAStringSet(myseq[1000001:2000000]))

Somehow there must be an overflow ... .

Here's some more info on my system:

> sessionInfo()
R version 2.11.1 Patched (2010-06-20 r52342)
x86_64-unknown-linux-gnu

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices datasets utils methods base

other attached packages:
[1] BSgenome.Rnorvegicus.UCSC.rn4_1.3.16 BSgenome_1.16.4
[3] Biostrings_2.16.5                    GenomicRanges_1.0.3
[5] IRanges_1.6.11

loaded via a namespace (and not attached):
[1] Biobase_2.8.0 tools_2.11.1

Linux version 2.6.18-92.el5 (brewbuilder@ls20-bc2-13.build.redhat.com) (gcc version 4.1.2 20071124 (Red Hat 4.1.2-41)) #1 SMP Tue Apr 29 13:16:15 EDT 2008

64 GB memory

thanks for your help
+kind regards,

Arne
BSgenome • 1.1k views
@arnemuellernovartiscom-2205
Last seen 8.5 years ago
Switzerland
Hi,

sorry, the sequence in my original posting got messed up during copy/paste. This is the "real" sequence:

CTATGTGTGAGGGCAGCAACCAGAACTGTCTGCCCTGACTTCGCTCAGGATGCTGTGAACATGTGGCTCAGATGGTGCTA
GGCATTTTCCTCTAGAGTCAGAAACGTGGACAGAGAGTCATCTCCTCTGGCTTCCCAGGCATGTCTGCCACTCTGAAGGT
CTGAAGGTCTGGGTCTCCCTCCCATGGGATTTGAGTGCAGAGAGCTGTGTGACTGGGTCCCTTCAGATCCAGGTGGTGTC
TGGACTGTAGCGTTGAGTGCCCTATCTTCCTGGTCTCAGAGCACCTATACAGTTTCCTCTTGGGCCAGGGATGTGGGCAG
TGGTGGGCTGTACTGGAAGTCTCTCCTGTCCTGCAGTCTCAGGAGTGGCCACCTGTCTGGGTGGTGAGCTCTCTCTCCCA
TGGGGTTAGGGAGCAGGGAGGTTTTGCAAGATTCAGATTTAAGGTCACATTTTATCATCATAATGGAGGACATTAGGAAG
GTCAGAAATAACTCCCTTAAGGAAATACTTGACAACACAAGCAAACTAGTAGAAATCTTTTTAAAAGGAAACACAAAAGT
ATTTTAAAGAATTACAGCAAACCACAACCAAATAGGAGAGGAAATTGAACAAAATCATCCAGGAGTTAAATATGGAAATA
GAAACAATGAAGAGAGCACACAGCGAGACAACCCTGGAGATAGAAAATCTAAGGAAGAGATCAGGAGTCATAGATGCAAG
CATCACTGACGGACTACATGAGATAGAAGAGAGAATTTTGGGAGCAGAAGATATCATAGAAAACATTGACACAACCTTCA
AAGAGAACGTAAATAGGAAAAAGCTCCTAGCCCTAAACATGCAGGAAATCAGGAAACAAATCAAAGATCAAACCTAAATA
TATCAGGTATAGAAGAGAGTGAAGACTCCCAACATAAAGGGATGGTAAATATCTTCAACAATATAAACAATATAAAGGAA
AACATCCCTAACCAAAAGAAATAAATGTCCATAAATAGACATGAAGCCTGCAGAATTCCAAATAGAATGGACCAGAAAAT
AAATTCCTCCTGTCACATAATAGTCAAAACACCAAATGCACAAAACAAAGAATGAATATTAAAAGCATTAACGGATAAAG
GTCAAGTACATTTAAAGGCAGACATGTCAGAATTACACCAGAATTCTTACCATGGACTATGAAAGCCAGAAGACAGATGT

It doesn't matter which sequence one uses to get the DNAStringSet error; it just has to be long and there have to be many of them. Here's a more generic example:

> myseq.bs = DNAStringSet(rep(paste(rep("A", 100), collapse=""), 2000))
> myseq.bs = DNAStringSet(rep(paste(rep("A", 100), collapse=""), 2000000))
> myseq.bs = DNAStringSet(rep(paste(rep("A", 1200), collapse=""), 2000))
> myseq.bs = DNAStringSet(rep(paste(rep("A", 1200), collapse=""), 2000000))
Error in .Call("new_SharedRaw_from_STRSXP", x, start(solved_SEW), width(solved_SEW), :
  negative length vectors are not allowed

Arne
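The four generic calls differ only in the total number of letters they ask for, which hints at where the overflow happens. Here is a minimal sketch in base R, assuming (as the maintainer's later reply in this thread states) that Biostrings 2.16 needed the cumulated length of all sequences to stay at or below 2^31-1. The helper name `total_letters` is made up for illustration and is not part of Biostrings.

```r
# Hypothetical helper: total number of letters in n sequences of a given width.
# Coerces to double on purpose, because the product can exceed the integer range.
total_letters <- function(width, n) as.numeric(width) * as.numeric(n)

int_max <- .Machine$integer.max   # 2^31 - 1

# The four cases from the generic example in this post.
cases <- data.frame(width = c(100, 100, 1200, 1200),
                    n     = c(2000, 2000000, 2000, 2000000))
cases$total    <- total_letters(cases$width, cases$n)
cases$overflow <- cases$total > int_max
print(cases)
```

Only the last case (1200 letters times 2,000,000 sequences, 2.4e9 letters in total) exceeds 2^31-1, and it is also the only one that fails.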
Hi Arne,

Thanks for catching this. I'm working on a fix. A temporary workaround for now is to generate smaller DNAStringSet objects and combine them together with c():

> myseq.bs1 <- DNAStringSet(rep(paste(rep("A", 1200), collapse=""), 1000000))
> myseq.bs2 <- DNAStringSet(rep(paste(rep("A", 1200), collapse=""), 1000000))
> myseq.bs <- c(myseq.bs1, myseq.bs2)
> myseq.bs
  A DNAStringSet instance of length 2000000
          width seq
      [1]  1200 AAAAAAAAAAAAAAAAAAAAA...AAAAAAAAAAAAAAAAAAAA
      [2]  1200 AAAAAAAAAAAAAAAAAAAAA...AAAAAAAAAAAAAAAAAAAA
      [3]  1200 AAAAAAAAAAAAAAAAAAAAA...AAAAAAAAAAAAAAAAAAAA
      [4]  1200 AAAAAAAAAAAAAAAAAAAAA...AAAAAAAAAAAAAAAAAAAA
      [5]  1200 AAAAAAAAAAAAAAAAAAAAA...AAAAAAAAAAAAAAAAAAAA
      [6]  1200 AAAAAAAAAAAAAAAAAAAAA...AAAAAAAAAAAAAAAAAAAA
      [7]  1200 AAAAAAAAAAAAAAAAAAAAA...AAAAAAAAAAAAAAAAAAAA
      [8]  1200 AAAAAAAAAAAAAAAAAAAAA...AAAAAAAAAAAAAAAAAAAA
      [9]  1200 AAAAAAAAAAAAAAAAAAAAA...AAAAAAAAAAAAAAAAAAAA
      ...   ... ...
[1999992]  1200 AAAAAAAAAAAAAAAAAAAAA...AAAAAAAAAAAAAAAAAAAA
[1999993]  1200 AAAAAAAAAAAAAAAAAAAAA...AAAAAAAAAAAAAAAAAAAA
[1999994]  1200 AAAAAAAAAAAAAAAAAAAAA...AAAAAAAAAAAAAAAAAAAA
[1999995]  1200 AAAAAAAAAAAAAAAAAAAAA...AAAAAAAAAAAAAAAAAAAA
[1999996]  1200 AAAAAAAAAAAAAAAAAAAAA...AAAAAAAAAAAAAAAAAAAA
[1999997]  1200 AAAAAAAAAAAAAAAAAAAAA...AAAAAAAAAAAAAAAAAAAA
[1999998]  1200 AAAAAAAAAAAAAAAAAAAAA...AAAAAAAAAAAAAAAAAAAA
[1999999]  1200 AAAAAAAAAAAAAAAAAAAAA...AAAAAAAAAAAAAAAAAAAA
[2000000]  1200 AAAAAAAAAAAAAAAAAAAAA...AAAAAAAAAAAAAAAAAAAA

I'll post here again when I've solved the problem.

Cheers,
H.

--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
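The split-and-combine workaround can be generalized to inputs of arbitrary size. The following is only a sketch, not code from the Biostrings maintainers: `chunk_ids` is a hypothetical base-R helper that assigns each string to a chunk so that the cumulated length within a chunk stays at or below a limit, which is assumed here to be 2^31-1.

```r
# Hypothetical helper: greedy assignment of strings to chunks so that the
# cumulated length within each chunk never exceeds `limit`.
chunk_ids <- function(widths, limit = .Machine$integer.max) {
  stopifnot(all(widths <= limit))
  ids <- integer(length(widths))
  current <- 1L
  running <- 0              # double, so the running sum itself cannot overflow
  for (i in seq_along(widths)) {
    if (running + widths[i] > limit) {
      current <- current + 1L   # start a new chunk
      running <- 0
    }
    running <- running + widths[i]
    ids[i] <- current
  }
  ids
}

# Small demonstration with an artificially low limit.
print(chunk_ids(c(4, 4, 4, 4, 4), limit = 10))

# With Biostrings loaded, the chunks could then be built and combined along
# these lines (untested assumption, shown as a comment only):
#   parts    <- lapply(split(x, chunk_ids(nchar(x))), DNAStringSet)
#   myseq.bs <- do.call(c, unname(parts))
```

The plain loop keeps the logic easy to follow. For two million strings it takes a moment, but it runs only once before the DNAStringSet objects are built.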
Arne,

I completely forgot to check this, but I just realized that this has already been addressed in the devel version of Biostrings (which will soon become the new release version).

Starting with Biostrings 2.17, a DNAStringSet object can be much bigger: up to 2^31-1 sequences per object, and each sequence can itself be up to 2^31-1 letters long (before that, the cumulated length of the sequences needed to be <= 2^31-1).

So as long as your machine has enough memory (and your OS knows how to make use of that memory), you should be able to create big DNAStringSet objects like this:

> myseq.bs = DNAStringSet(rep(paste(rep("A", 1200), collapse=""), 2000000))
> myseq.bs
  A DNAStringSet instance of length 2000000
          width seq
      [1]  1200 AAAAAAAAAAAAAAAAAAAAA...AAAAAAAAAAAAAAAAAAAA
      [2]  1200 AAAAAAAAAAAAAAAAAAAAA...AAAAAAAAAAAAAAAAAAAA
      [3]  1200 AAAAAAAAAAAAAAAAAAAAA...AAAAAAAAAAAAAAAAAAAA
      [4]  1200 AAAAAAAAAAAAAAAAAAAAA...AAAAAAAAAAAAAAAAAAAA
      [5]  1200 AAAAAAAAAAAAAAAAAAAAA...AAAAAAAAAAAAAAAAAAAA
      [6]  1200 AAAAAAAAAAAAAAAAAAAAA...AAAAAAAAAAAAAAAAAAAA
      [7]  1200 AAAAAAAAAAAAAAAAAAAAA...AAAAAAAAAAAAAAAAAAAA
      [8]  1200 AAAAAAAAAAAAAAAAAAAAA...AAAAAAAAAAAAAAAAAAAA
      [9]  1200 AAAAAAAAAAAAAAAAAAAAA...AAAAAAAAAAAAAAAAAAAA
      ...   ... ...
[1999992]  1200 AAAAAAAAAAAAAAAAAAAAA...AAAAAAAAAAAAAAAAAAAA
[1999993]  1200 AAAAAAAAAAAAAAAAAAAAA...AAAAAAAAAAAAAAAAAAAA
[1999994]  1200 AAAAAAAAAAAAAAAAAAAAA...AAAAAAAAAAAAAAAAAAAA
[1999995]  1200 AAAAAAAAAAAAAAAAAAAAA...AAAAAAAAAAAAAAAAAAAA
[1999996]  1200 AAAAAAAAAAAAAAAAAAAAA...AAAAAAAAAAAAAAAAAAAA
[1999997]  1200 AAAAAAAAAAAAAAAAAAAAA...AAAAAAAAAAAAAAAAAAAA
[1999998]  1200 AAAAAAAAAAAAAAAAAAAAA...AAAAAAAAAAAAAAAAAAAA
[1999999]  1200 AAAAAAAAAAAAAAAAAAAAA...AAAAAAAAAAAAAAAAAAAA
[2000000]  1200 AAAAAAAAAAAAAAAAAAAAA...AAAAAAAAAAAAAAAAAAAA

> sessionInfo()
R version 2.12.0 alpha (2010-09-27 r53048)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C
 [3] LC_TIME=en_US.utf8        LC_COLLATE=en_US.utf8
 [5] LC_MONETARY=C             LC_MESSAGES=en_US.utf8
 [7] LC_PAPER=en_US.utf8       LC_NAME=C
 [9] LC_ADDRESS=C              LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets
[6] methods base

other attached packages:
[1] Biostrings_2.17.47 IRanges_1.7.39

loaded via a namespace (and not attached):
[1] Biobase_2.9.2

Note that Biostrings_2.17 and IRanges_1.7 belong to BioC 2.7, the current development version of Bioconductor (which is about to be released). You need R 2.12 (which is about to be released too) if you want to use BioC 2.7. Just use biocLite() from within R-2.12 to install packages.

Cheers,
H.

--
Hervé Pagès
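To decide whether the split-and-combine workaround is still needed, one can compare the installed version against 2.17. The sketch below uses only base R. The helper name `lifts_cumulated_limit` is hypothetical, and the 2.17.0 threshold is an assumption taken from the statement in this reply that the change starts with Biostrings 2.17.

```r
# Hypothetical helper: TRUE if a Biostrings version string is 2.17 or later,
# the series in which (per this reply) the cumulated-length limit was lifted.
lifts_cumulated_limit <- function(version) {
  package_version(version) >= package_version("2.17.0")
}

print(lifts_cumulated_limit("2.16.5"))    # version from the original report
print(lifts_cumulated_limit("2.17.47"))   # version from the sessionInfo() in this reply

# Check the installed package, but only if Biostrings is actually available.
if (requireNamespace("Biostrings", quietly = TRUE)) {
  print(lifts_cumulated_limit(as.character(packageVersion("Biostrings"))))
}
```

Comparing `package_version` objects avoids the pitfalls of comparing version strings as text, where "2.9" would sort after "2.17".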