Biostrings readDNAMultipleAlignment broken for fasta input


Janet Young ▴ 740

@janet-young-2360

Last seen 4.5 years ago

Fred Hutchinson Cancer Research Center,…

Hi there,

I found a broken function in Biostrings (I think) - readDNAMultipleAlignment doesn't work to read in fasta input files (my preferred sequence format for a lot of stuff outside of R). There's an easy workaround I can use, but thought maybe you'd want to know anyway. The code below should show you what I mean.

Thanks!

Janet

----------------------------

library(Biostrings)

## make a test fasta-format alignment file
mySeqs <- DNAStringSet( c("AGTGAGGTGATCGGTAGCTGATGCTAGTT",
                          "AGTGA-GTGATCGGTAG-TGATGGTAGTT",
                          "AGTGAGGTGATCGGTAGCTGATGCTAGTT",
                          "---GAGGAGATCGGTAGCTGTTGCTAGTT") )
names(mySeqs) <- c("seq1","seq2","seq3","seq4")
writeXStringSet( mySeqs, filepath="temp.fa")

### try reading it using readDNAMultipleAlignment
myAln <- readDNAMultipleAlignment("temp.fa", format="fasta")
# Error in XStringSet("DNA", x, start = start, end = end, width = width, :
#   error in evaluating the argument 'x' in selecting a method for function 'XStringSet': Error in isTRUEorFALSE(seek.first.rec) :
#   argument "seek.first.rec" is missing, with no default

### workaround:
myAln2 <- readDNAStringSet("temp.fa", format="fasta")
myAln2 <- DNAMultipleAlignment(myAln2)

sessionInfo()

R version 3.1.0 Patched (2014-05-26 r65771)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] Biostrings_2.33.10  XVector_0.5.6       IRanges_1.99.16
[4] S4Vectors_0.0.9     BiocGenerics_0.11.2

loaded via a namespace (and not attached):
[1] stats4_3.1.0    zlibbioc_1.11.1
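For anyone who has to run the same script on both affected and fixed installations, the workaround in the post can be wrapped in a small fallback helper. This is only a sketch: `read_fasta_alignment` is a hypothetical name (not a Biostrings function), and it assumes that any error from the direct reader is a reason to fall back to the two-step route.

```r
library(Biostrings)

# Hypothetical helper (not part of Biostrings): try the direct reader first,
# and fall back to the two-step workaround from the post if it errors.
read_fasta_alignment <- function(path) {
  tryCatch(
    readDNAMultipleAlignment(path, format = "fasta"),
    error = function(e) {
      message("readDNAMultipleAlignment() failed: ", conditionMessage(e))
      # Workaround: read as a DNAStringSet, then convert to an alignment.
      DNAMultipleAlignment(readDNAStringSet(path, format = "fasta"))
    }
  )
}

# Same test alignment as in the post, written to a temporary file.
mySeqs <- DNAStringSet(c(seq1 = "AGTGAGGTGATCGGTAGCTGATGCTAGTT",
                         seq2 = "AGTGA-GTGATCGGTAG-TGATGGTAGTT",
                         seq3 = "AGTGAGGTGATCGGTAGCTGATGCTAGTT",
                         seq4 = "---GAGGAGATCGGTAGCTGTTGCTAGTT"))
fa_path <- tempfile(fileext = ".fa")
writeXStringSet(mySeqs, filepath = fa_path)

aln <- read_fasta_alignment(fa_path)
```

On a fixed Biostrings the first branch should simply succeed, and on an affected one the message shows which error triggered the fallback, so the helper can be dropped once every machine is updated.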

Alignment Biostrings • 1.6k views

Updated 9.8 years ago by Hervé Pagès 16k • written 9.8 years ago by Janet Young ▴ 740


Hervé Pagès 16k

@herve-pages-1542

Last seen 4 days ago

Seattle, WA, United States

Hi Janet,

It's funny that I received a bug report for this same issue off list from someone else just a few minutes before your post. Sounds like you guys are collaborating on the same project and running into the same bugs ;-)

This is fixed in Biostrings 2.32.1 (release) and 2.33.12 (devel). Neither will become available through biocLite() before Saturday morning, but you can get them now from svn.

Cheers,
H.

On 07/03/2014 05:26 PM, Janet Young wrote:
> Hi there,
>
> I found a broken function in Biostrings (I think) - readDNAMultipleAlignment doesn't work to read in fasta input files (my preferred sequence format for a lot of stuff outside of R). There's an easy workaround I can use, but thought maybe you'd want to know anyway. The code below should show you what I mean.
>
> Thanks!
>
> Janet

--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
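To tell whether a given installation already carries the fix, the two version numbers from the reply (2.32.1 for release, 2.33.12 for devel) can be compared against a version string. A minimal sketch in base R follows; `includes_fasta_fix` is a hypothetical helper, and it assumes that every version after 2.33.12 keeps the fix and that versions before the 2.32 series are simply not covered by the reply (hence `NA`).

```r
# Hypothetical helper: does this Biostrings version include the fix named in
# the reply? Assumes later series (2.34 and up) keep the fix; returns NA for
# versions older than 2.32.0, which the reply does not cover.
includes_fasta_fix <- function(v) {
  v <- package_version(v)
  if (v >= "2.33.0") return(v >= "2.33.12")  # devel series and anything later
  if (v >= "2.32.0") return(v >= "2.32.1")   # release series
  NA
}

includes_fasta_fix("2.33.10")  # the devel version from the sessionInfo() above
includes_fasta_fix("2.32.1")   # the fixed release version
```

With Biostrings installed, the same check can be applied to the live package via `includes_fasta_fix(packageVersion("Biostrings"))`.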

Written 9.8 years ago by Hervé Pagès 16k
