Biostring: print sequence alignment to file
3
0
Entering edit mode
@martin-preusse-5224
Last seen 7.1 years ago
I do a simple pairwise DNA sequence alignment with pairwiseAlignment from the Biostrings package in Bioconductor: library('Biostrings') seq1 = 'ATGCTA' seq2 = 'ATGTA' pairwiseAlignment(pattern = seq1, subject = seq2) The output looks as follows: Global PairwiseAlignedFixedSubject (1 of 1) pattern: [1] ATGCTA subject: [1] ATG-TA score: -4.091219 For very long sequences, the output is truncated and only one line is shown: Global PairwiseAlignedFixedSubject (1 of 1) pattern: [1] AT-----------------------------...---------------- TGTCTTCCAKATCTGGCGCGCCTGGGTTGATATC subject: [1] ATTGGCGGCCGCGCCACCATGCCAGAGCCAG...GAAGGCTGTATGCTGTTGTCTTC AAGATCTGGTACCGCTGGGTTGATATC score: -29418.8 How can I output the complete alignment to a text file? Cheers, Martin
Alignment Biostrings Alignment Biostrings • 3.8k views
0
Entering edit mode
Chu, Charles ▴ 20
@chu-charles-5227
Last seen 7.1 years ago
Martin: New to R. This is one way we have been doing it: One could save the alignment and convert to a XStringSet. So taking the example from the help(pairwiseAlignment): s1 <- DNAString("ACTTCACCAGCTCCCTGGCGGTAAGTTGATCAAAGGAAACGCAAAGTTTTCAAG") s2 <- DNAString("GTTTCACTACTTCCTTTCGGGTAAGTAAATATATAAATATATAAAAATATAAT TTTCATC") mat <- nucleotideSubstitutionMatrix(match = 1, mismatch = -3, baseOnly = TRUE) globalAlign <- pairwiseAlignment(s1, s2, substitutionMatrix = mat, gapOpening = -5, gapExtension = -2) Next, collect the DNAStrings in the pairwise alignment: r = BStringSet( c( toString( subject(globalAlign) ), toString( pattern(globalAlign)))) Then, write the alignment to a text file: write.XStringSet(r,"out1.txt") Alternatively, if the sequence is not too long, one can increase the size of the window, and it may show all of the sequence that you need. A colleague wrote a script to do this on a large file of sequences where we need to do pairwise alignment on a short stretch of DNA sequence repetitively. So he really deserves the credit for this. (Note: the alignment score is not collected). It is a fasta file, that can also be read by fasta readers. Hope this helps, Cheers, Charles -- Charles C. Chu, PhD Associate Investigator The Feinstein Institute for Medical Research North Shore-Long Island Jewish Health System 350 Community Drive Manhasset, NY 11030 Telephone: (516) 562-1207 FAX: (516) 562-1322 Email: cchu@nshs.edu On 4/13/12 11:24 AM, "Martin Preusse" <martin.preusse@googlemail.com> wrote: I do a simple pairwise DNA sequence alignment with pairwiseAlignment from the Biostrings package in Bioconductor: library('Biostrings') seq1 = 'ATGCTA' seq2 = 'ATGTA' pairwiseAlignment(pattern = seq1, subject = seq2) The output looks as follows: Global PairwiseAlignedFixedSubject (1 of 1) pattern: [1] ATGCTA subject: [1] ATG-TA score: -4.091219 For very long sequences, the output is truncated and only one line is shown: Global PairwiseAlignedFixedSubject (1 of 1) pattern: [1] AT-----------------------------...---------------- TGTCTTCCAKATCTGGCGCGCCTGGGTTGATATC subject: [1] ATTGGCGGCCGCGCCACCATGCCAGAGCCAG...GAAGGCTGTATGCTGTTGTCTTC AAGATCTGGTACCGCTGGGTTGATATC score: -29418.8 How can I output the complete alignment to a text file? Cheers, Martin _______________________________________________ Bioconductor mailing list Bioconductor@r-project.org https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor The information contained in this electronic e-mail transmission and any attachments are intended only for the use of the individual or entity to whom or to which it is addressed, and may contain information that is privileged, confidential and exempt from disclosure under applicable law. If the reader of this communication is not the intended recipient, or the employee or agent responsible for delivering this communication to the intended recipient, you are hereby notified that any dissemination, distribution, copying or disclosure of this communication and any attachment is strictly prohibited. If you have received this transmission in error, please notify the sender immediately by telephone and electronic mail, and delete the original communication and any attachment from any computer, server or other electronic recording or storage device or medium. Receipt by anyone other than the intended recipient is not a waiver of any attorney-client, physician-patient or other privilege. [[alternative HTML version deleted]]
0
Entering edit mode
Hi Charles, thanks! Your solution allows to print the two alignment strings separately. I was thinking of an output as generated by alignment tools: AGT-TCTAT | | | | | | | | | AGTATCTAT For this I would have to write a function to output the strings in blocks of e.g. 60 nucleotides, right? Cheers Martin Am Freitag, 13. April 2012 um 19:21 schrieb Chu, Charles: > write.XStringSet
0
Entering edit mode
Martin: OK. I'm not as familiar with the alignment tools output in R. Your idea sounds good to me. I would be interested in what you come up with. Currently, we read the text file of paired alignments as a fasta file in Jalview or MacVector. In principle we could use these programs to generate a pretty output for print, but we have not done this. We have a large set of paired alignments (>10,000), so it makes sense to do the alignments in R first. That's it for now, Best, Charles On 4/16/12 7:06 AM, "Martin Preusse" <martin.preusse@googlemail.com> wrote: Hi Charles, thanks! Your solution allows to print the two alignment strings separately. I was thinking of an output as generated by alignment tools: AGT-TCTAT | | | | | | | | | AGTATCTAT For this I would have to write a function to output the strings in blocks of e.g. 60 nucleotides, right? Cheers Martin Am Freitag, 13. April 2012 um 19:21 schrieb Chu, Charles: > write.XStringSet The information contained in this electronic e-mail transmission and any attachments are intended only for the use of the individual or entity to whom or to which it is addressed, and may contain information that is privileged, confidential and exempt from disclosure under applicable law. If the reader of this communication is not the intended recipient, or the employee or agent responsible for delivering this communication to the intended recipient, you are hereby notified that any dissemination, distribution, copying or disclosure of this communication and any attachment is strictly prohibited. If you have received this transmission in error, please notify the sender immediately by telephone and electronic mail, and delete the original communication and any attachment from any computer, server or other electronic recording or storage device or medium. Receipt by anyone other than the intended recipient is not a waiver of any attorney-client, physician-patient or other privilege. [[alternative HTML version deleted]]
0
Entering edit mode
Hi Martin, On 04/16/2012 04:06 AM, Martin Preusse wrote: > Hi Charles, > > thanks! Your solution allows to print the two alignment strings separately. > > I was thinking of an output as generated by alignment tools: > > AGT-TCTAT > | | | | | | | | | > AGTATCTAT This looks like BLAST output. Is this what you have in mind? Note that there are many alignment tools and many ways to output the result to a file. I'm not really familiar with the BLAST output format. Is it specified somewhere? Would that make sense to add something like a write.PairwiseAlignedXStringSet() function to Biostrings for writing the result of pairwiseAlignment() to a file? We could do this and support the BLAST format if that's a commonly used format. Thanks, H. > > For this I would have to write a function to output the strings in blocks of e.g. 60 nucleotides, right? > > Cheers > Martin > > > > Am Freitag, 13. April 2012 um 19:21 schrieb Chu, Charles: > >> write.XStringSet > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319
0
Entering edit mode
Thomas Girke ★ 1.7k
@thomas-girke-993
Last seen 8 months ago
United States
What about providing an option in pairwiseAlignment to output to the MultipleAlignment class in Biostrings and then write the latter to different alignment formats? Thomas On Tue, Apr 17, 2012 at 05:59:24PM +0000, Hervé Pagès wrote: > Hi Martin, > > On 04/16/2012 04:06 AM, Martin Preusse wrote: > > Hi Charles, > > > > thanks! Your solution allows to print the two alignment strings separately. > > > > I was thinking of an output as generated by alignment tools: > > > > AGT-TCTAT > > | | | | | | | | | > > AGTATCTAT > > This looks like BLAST output. Is this what you have in mind? Note that > there are many alignment tools and many ways to output the result to a > file. I'm not really familiar with the BLAST output format. Is it > specified somewhere? Would that make sense to add something like a > write.PairwiseAlignedXStringSet() function to Biostrings for writing > the result of pairwiseAlignment() to a file? We could do this and > support the BLAST format if that's a commonly used format. > > Thanks, > H. > > > > > For this I would have to write a function to output the strings in blocks of e.g. 60 nucleotides, right? > > > > Cheers > > Martin > > > > > > > > Am Freitag, 13. April 2012 um 19:21 schrieb Chu, Charles: > > > >> write.XStringSet > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor at r-project.org > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > > > -- > Hervé Pagès > > Program in Computational Biology > Division of Public Health Sciences > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N, M1-B514 > P.O. Box 19024 > Seattle, WA 98109-1024 > > E-mail: hpages at fhcrc.org > Phone: (206) 667-5791 > Fax: (206) 667-1319 > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
0
Entering edit mode
Hi Thomas, On 04/17/2012 11:49 AM, Thomas Girke wrote: > What about providing an option in pairwiseAlignment to output to the > MultipleAlignment class in Biostrings and then write the latter to > different alignment formats? Or we could provide coercion methods to switch between PairwiseAlignedXStringSet and MultipleAlignment. Anyway that kind of moves Martin's problem from having a write.PairwiseAlignedXStringSet() function that produces BLAST output to having a write.MultipleAlignment() function that produces BLAST output. For the specific case of BLAST output, would it make sense to support it for MultipleAlignment? Can someone point me to an example of such output? Or even better, to the specs of such format? Note that right now there is the write.phylip() function in Biostrings for writing a MultipleAlignment object to a file but the Phylip format looks very different from the BLAST output: hpages at latitude:~head -n 20 phylip_test.txt 9 2343 Mask 0000000000 0000000000 0000000000 0000000000 0000000000 Human -----TCCCG TCTCCGCAGC AAAAAAGTTT GAGTCGCCGC TGCCGGGTTG Chimp ---------- ---------- ---------- ---------- ---------- Cow ---------- ---------- ---------- ---------- ---------- Mouse ---------- ---------- --AAAAGTTG GAGTCTTCGC TTGAGAGTTG Rat ---------- ---------- ---------- ---------- ---------- Dog ---------- ---------- ---------- ---------- ---------- Chicken ---------- ----CGGCTC CGCAGCGCCT CACTCGCGCA GTCCCCGCGC Salmon GGGGGAGACT TCAGAAGTTG TTGTCCTCTC CGCTGATAAC AGTTGAGATG 0000000000 0000000000 0000000000 0001111111 1111111111 CCAGCGGAGT CGCGCGTCGG GAGCTACGTA GGGCAGAGAA GTCA-TGGCT ---------- ---------- ---------- ---------- ---A-TGGCT ---------- ---------- ---------- ---GAGAGAA GTCA-TGGCT CCAGCGGAGT CGCGCGCCGA CAGCTACGCG GCGCAGA-AA GTCA-TGGCT ---------- ---------- ---------- ---------- ---A-TGGCT ---------- ---------- ---------- ---------- ---A-TGGCT AGGGCCGGGC AGAGGCGCAC GCAGCTCCCC GGGCGGCCCC GCTC-CAGCC CGCATATTAT TATTACCTTT AGGACAAGTT GAATGTGTTC GTCAACATCT Thanks! H. > > Thomas > > On Tue, Apr 17, 2012 at 05:59:24PM +0000, Hervé Pagès wrote: >> Hi Martin, >> >> On 04/16/2012 04:06 AM, Martin Preusse wrote: >>> Hi Charles, >>> >>> thanks! Your solution allows to print the two alignment strings separately. >>> >>> I was thinking of an output as generated by alignment tools: >>> >>> AGT-TCTAT >>> | | | | | | | | | >>> AGTATCTAT >> >> This looks like BLAST output. Is this what you have in mind? Note that >> there are many alignment tools and many ways to output the result to a >> file. I'm not really familiar with the BLAST output format. Is it >> specified somewhere? Would that make sense to add something like a >> write.PairwiseAlignedXStringSet() function to Biostrings for writing >> the result of pairwiseAlignment() to a file? We could do this and >> support the BLAST format if that's a commonly used format. >> >> Thanks, >> H. >> >>> >>> For this I would have to write a function to output the strings in blocks of e.g. 60 nucleotides, right? >>> >>> Cheers >>> Martin >>> >>> >>> >>> Am Freitag, 13. April 2012 um 19:21 schrieb Chu, Charles: >>> >>>> write.XStringSet >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> >> -- >> Hervé Pagès >> >> Program in Computational Biology >> Division of Public Health Sciences >> Fred Hutchinson Cancer Research Center >> 1100 Fairview Ave. N, M1-B514 >> P.O. Box 19024 >> Seattle, WA 98109-1024 >> >> E-mail: hpages at fhcrc.org >> Phone: (206) 667-5791 >> Fax: (206) 667-1319 >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319 ADD REPLY 0 Entering edit mode Thomas Girke ★ 1.7k @thomas-girke-993 Last seen 8 months ago United States Hi Herv?, To me, the most basic and versatile MSA or pairwise alignment format to output to would be FASTA since it is compatible with almost any other alignment editing software. For text-based viewing purposes my preference would be to also output to a format similar to the one shown in the following example. When there are only two sequences then one could show instead of a consensus line the pipe characters between the two sequences to indicate identical residues which mimics the blast output. A more standardized version of this pairwise alignment format can be found here: http://emboss.sourceforge.net/apps/cvs/emboss/apps/needle.html library(Biostrings) p450 <- read.AAStringSet("http://faculty.ucr.edu/~tgirke/Documents/R_B ioCond/Samples/p450.mul", "fasta") StringSet2html <- function(msa=p450, file="p450.html", start=1, end=length(p450[[1]]), counter=20, browser=TRUE, ...) { if(class(msa)=="AAStringSet") msa <- AAStringSet(msa, start=start, end=end) if(class(msa)=="DNAStringSet") msa <- DNAStringSet(msa, start=start, end=end) msavec <- sapply(msa, toString) offset <- (counter-1)-nchar(nchar(msavec[1])) legend <- paste(paste(paste(paste(rep(" ", offset), collapse=""), format(seq(0, nchar(msavec[1]), by=counter)[-1])), collapse=""), collapse="") consensus <- consensusString(msavec, ambiguityMap=".", ...) msavec <- paste(msavec, rowSums(as.matrix(msa) != "-"), sep=" ") msavec <- paste(format(c("", names(msa), "Consensus"), justify="left"), c(legend, msavec, consensus), sep=" ") msavec <- c("<html> ", msavec, " </html>") writeLines(msavec, file) if(browser==TRUE) { browseURL(file) } } StringSet2html(msa=p450, file="p450.html", start=1, end=length(p450[[1]]), counter=20, browser=T, threshold=1.0) StringSet2html(msa=p450, file="p450.html", start=450, end=470, counter=20, browser=T, threshold=1.0) Thomas On Tue, Apr 17, 2012 at 07:43:30PM +0000, Hervé Pagès wrote: > Hi Thomas, > > On 04/17/2012 11:49 AM, Thomas Girke wrote: > > What about providing an option in pairwiseAlignment to output to the > > MultipleAlignment class in Biostrings and then write the latter to > > different alignment formats? > > Or we could provide coercion methods to switch between > PairwiseAlignedXStringSet and MultipleAlignment. > > Anyway that kind of moves Martin's problem from having a > write.PairwiseAlignedXStringSet() function that produces BLAST output > to having a write.MultipleAlignment() function that produces BLAST > output. For the specific case of BLAST output, would it make sense > to support it for MultipleAlignment? Can someone point me to an example > of such output? Or even better, to the specs of such format? > > Note that right now there is the write.phylip() function in Biostrings > for writing a MultipleAlignment object to a file but the Phylip format > looks very different from the BLAST output: > > hpages at latitude:~ head -n 20 phylip_test.txt > 9 2343 > Mask 0000000000 0000000000 0000000000 0000000000 0000000000 > Human -----TCCCG TCTCCGCAGC AAAAAAGTTT GAGTCGCCGC TGCCGGGTTG > Chimp ---------- ---------- ---------- ---------- ---------- > Cow ---------- ---------- ---------- ---------- ---------- > Mouse ---------- ---------- --AAAAGTTG GAGTCTTCGC TTGAGAGTTG > Rat ---------- ---------- ---------- ---------- ---------- > Dog ---------- ---------- ---------- ---------- ---------- > Chicken ---------- ----CGGCTC CGCAGCGCCT CACTCGCGCA GTCCCCGCGC > Salmon GGGGGAGACT TCAGAAGTTG TTGTCCTCTC CGCTGATAAC AGTTGAGATG > > 0000000000 0000000000 0000000000 0001111111 1111111111 > CCAGCGGAGT CGCGCGTCGG GAGCTACGTA GGGCAGAGAA GTCA-TGGCT > ---------- ---------- ---------- ---------- ---A-TGGCT > ---------- ---------- ---------- ---GAGAGAA GTCA-TGGCT > CCAGCGGAGT CGCGCGCCGA CAGCTACGCG GCGCAGA-AA GTCA-TGGCT > ---------- ---------- ---------- ---------- ---A-TGGCT > ---------- ---------- ---------- ---------- ---A-TGGCT > AGGGCCGGGC AGAGGCGCAC GCAGCTCCCC GGGCGGCCCC GCTC-CAGCC > CGCATATTAT TATTACCTTT AGGACAAGTT GAATGTGTTC GTCAACATCT > > Thanks! > H. > > > > > Thomas > > > > On Tue, Apr 17, 2012 at 05:59:24PM +0000, Hervé Pagès wrote: > >> Hi Martin, > >> > >> On 04/16/2012 04:06 AM, Martin Preusse wrote: > >>> Hi Charles, > >>> > >>> thanks! Your solution allows to print the two alignment strings separately. > >>> > >>> I was thinking of an output as generated by alignment tools: > >>> > >>> AGT-TCTAT > >>> | | | | | | | | | > >>> AGTATCTAT > >> > >> This looks like BLAST output. Is this what you have in mind? Note that > >> there are many alignment tools and many ways to output the result to a > >> file. I'm not really familiar with the BLAST output format. Is it > >> specified somewhere? Would that make sense to add something like a > >> write.PairwiseAlignedXStringSet() function to Biostrings for writing > >> the result of pairwiseAlignment() to a file? We could do this and > >> support the BLAST format if that's a commonly used format. > >> > >> Thanks, > >> H. > >> > >>> > >>> For this I would have to write a function to output the strings in blocks of e.g. 60 nucleotides, right? > >>> > >>> Cheers > >>> Martin > >>> > >>> > >>> > >>> Am Freitag, 13. April 2012 um 19:21 schrieb Chu, Charles: > >>> > >>>> write.XStringSet > >>> > >>> _______________________________________________ > >>> Bioconductor mailing list > >>> Bioconductor at r-project.org > >>> https://stat.ethz.ch/mailman/listinfo/bioconductor > >>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > >> > >> > >> -- > >> Hervé Pagès > >> > >> Program in Computational Biology > >> Division of Public Health Sciences > >> Fred Hutchinson Cancer Research Center > >> 1100 Fairview Ave. N, M1-B514 > >> P.O. Box 19024 > >> Seattle, WA 98109-1024 > >> > >> E-mail: hpages at fhcrc.org > >> Phone: (206) 667-5791 > >> Fax: (206) 667-1319 > >> > >> _______________________________________________ > >> Bioconductor mailing list > >> Bioconductor at r-project.org > >> https://stat.ethz.ch/mailman/listinfo/bioconductor > >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > > > -- > Hervé Pagès > > Program in Computational Biology > Division of Public Health Sciences > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N, M1-B514 > P.O. Box 19024 > Seattle, WA 98109-1024 > > E-mail: hpages at fhcrc.org > Phone: (206) 667-5791 > Fax: (206) 667-1319
0
Entering edit mode
Hi everybody, I think the output format depends on the purpose of the alignment. A pairwise sequence alignment is usually done to compare two sequences base by base. In my case, I compare sequencing results of cloned expression constructs with the desired sequence. Thus, the best output format would be "BLAST like". seq1: 1 ATCTGC 7 | | | . . | seq2: 1 ATCAAC 7 When doing MSA, most people might rather be interested in the consensus sequence. E.g. in the context of conservation between species. So write.PairwiseAlignedXStringSet() and write.MultipleAlignment() are quite different and BLAST doesn't make much sense for multiple alignments. This means it would be best to put the output in the PairwiseAlignment/MultipleAlignment and not to the XStringSet, right? This is an overview of sequence alignment formats used by EMBOSS: http://emboss.sourceforge.net/docs/themes/AlignFormats.html 'pair' or 'markx0' would be perfectly fine. Cheers Martin Am Dienstag, 17. April 2012 um 22:13 schrieb Thomas Girke: > Hi Herv?, > > To me, the most basic and versatile MSA or pairwise alignment format to output > to would be FASTA since it is compatible with almost any other alignment > editing software. For text-based viewing purposes my preference would be > to also output to a format similar to the one shown in the following > example. When there are only two sequences then one could show instead > of a consensus line the pipe characters between the two sequences to > indicate identical residues which mimics the blast output. A more > standardized version of this pairwise alignment format can be found > here: > http://emboss.sourceforge.net/apps/cvs/emboss/apps/needle.html > > library(Biostrings) > p450 <- read.AAStringSet("http://faculty.ucr.edu/~tgirke/Documents/R _BioCond/Samples/p450.mul", "fasta") > > StringSet2html <- function(msa=p450, file="p450.html", start=1, end=length(p450[[1]]), counter=20, browser=TRUE, ...) { > if(class(msa)=="AAStringSet") msa <- AAStringSet(msa, start=start, end=end) > if(class(msa)=="DNAStringSet") msa <- DNAStringSet(msa, start=start, end=end) > msavec <- sapply(msa, toString) > offset <- (counter-1)-nchar(nchar(msavec[1])) > legend <- paste(paste(paste(paste(rep(" ", offset), collapse=""), format(seq(0, > nchar(msavec[1]), by=counter)[-1])), collapse=""), collapse="") > consensus <- consensusString(msavec, ambiguityMap=".", ...) > msavec <- paste(msavec, rowSums(as.matrix(msa) != "-"), sep=" ") > msavec <- paste(format(c("", names(msa), "Consensus"), justify="left"), c(legend, msavec, > consensus), sep=" ") > msavec <- c("<html>
", msavec, "
</html>") > writeLines(msavec, file) > if(browser==TRUE) { browseURL(file) } > } > StringSet2html(msa=p450, file="p450.html", start=1, end=length(p450[[1]]), counter=20, browser=T, threshold=1.0) > StringSet2html(msa=p450, file="p450.html", start=450, end=470, counter=20, browser=T, threshold=1.0) > > > Thomas > > On Tue, Apr 17, 2012 at 07:43:30PM +0000, Hervé Pagès wrote: > > Hi Thomas, > > > > On 04/17/2012 11:49 AM, Thomas Girke wrote: > > > What about providing an option in pairwiseAlignment to output to the > > > MultipleAlignment class in Biostrings and then write the latter to > > > different alignment formats? > > > > > > > > > > Or we could provide coercion methods to switch between > > PairwiseAlignedXStringSet and MultipleAlignment. > > > > Anyway that kind of moves Martin's problem from having a > > write.PairwiseAlignedXStringSet() function that produces BLAST output > > to having a write.MultipleAlignment() function that produces BLAST > > output. For the specific case of BLAST output, would it make sense > > to support it for MultipleAlignment? Can someone point me to an example > > of such output? Or even better, to the specs of such format? > > > > Note that right now there is the write.phylip() function in Biostrings > > for writing a MultipleAlignment object to a file but the Phylip format > > looks very different from the BLAST output: > > > > hpages at latitude:~head -n 20 phylip_test.txt > > 9 2343 > > Mask 0000000000 0000000000 0000000000 0000000000 0000000000 > > Human -----TCCCG TCTCCGCAGC AAAAAAGTTT GAGTCGCCGC TGCCGGGTTG > > Chimp ---------- ---------- ---------- ---------- ---------- > > Cow ---------- ---------- ---------- ---------- ---------- > > Mouse ---------- ---------- --AAAAGTTG GAGTCTTCGC TTGAGAGTTG > > Rat ---------- ---------- ---------- ---------- ---------- > > Dog ---------- ---------- ---------- ---------- ---------- > > Chicken ---------- ----CGGCTC CGCAGCGCCT CACTCGCGCA GTCCCCGCGC > > Salmon GGGGGAGACT TCAGAAGTTG TTGTCCTCTC CGCTGATAAC AGTTGAGATG > > > > 0000000000 0000000000 0000000000 0001111111 1111111111 > > CCAGCGGAGT CGCGCGTCGG GAGCTACGTA GGGCAGAGAA GTCA-TGGCT > > ---------- ---------- ---------- ---------- ---A-TGGCT > > ---------- ---------- ---------- ---GAGAGAA GTCA-TGGCT > > CCAGCGGAGT CGCGCGCCGA CAGCTACGCG GCGCAGA-AA GTCA-TGGCT > > ---------- ---------- ---------- ---------- ---A-TGGCT > > ---------- ---------- ---------- ---------- ---A-TGGCT > > AGGGCCGGGC AGAGGCGCAC GCAGCTCCCC GGGCGGCCCC GCTC-CAGCC > > CGCATATTAT TATTACCTTT AGGACAAGTT GAATGTGTTC GTCAACATCT > > > > Thanks! > > H. > > > > > > > > Thomas > > > > > > On Tue, Apr 17, 2012 at 05:59:24PM +0000, Hervé Pagès wrote: > > > > Hi Martin, > > > > > > > > On 04/16/2012 04:06 AM, Martin Preusse wrote: > > > > > Hi Charles, > > > > > > > > > > thanks! Your solution allows to print the two alignment strings separately. > > > > > > > > > > I was thinking of an output as generated by alignment tools: > > > > > > > > > > AGT-TCTAT > > > > > | | | | | | | | | > > > > > AGTATCTAT > > > > > > > > > > > > > > > > > > > > This looks like BLAST output. Is this what you have in mind? Note that > > > > there are many alignment tools and many ways to output the result to a > > > > file. I'm not really familiar with the BLAST output format. Is it > > > > specified somewhere? Would that make sense to add something like a > > > > write.PairwiseAlignedXStringSet() function to Biostrings for writing > > > > the result of pairwiseAlignment() to a file? We could do this and > > > > support the BLAST format if that's a commonly used format. > > > > > > > > Thanks, > > > > H. > > > > > > > > > > > > > > For this I would have to write a function to output the strings in blocks of e.g. 60 nucleotides, right? > > > > > > > > > > Cheers > > > > > Martin > > > > > > > > > > > > > > > > > > > > Am Freitag, 13. April 2012 um 19:21 schrieb Chu, Charles: > > > > > > > > > > > write.XStringSet > > > > > > > > > > _______________________________________________ > > > > > Bioconductor mailing list > > > > > Bioconductor at r-project.org (mailto:Bioconductor at r-project.org) > > > > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > > > > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > Hervé Pagès > > > > > > > > Program in Computational Biology > > > > Division of Public Health Sciences > > > > Fred Hutchinson Cancer Research Center > > > > 1100 Fairview Ave. N, M1-B514 > > > > P.O. Box 19024 > > > > Seattle, WA 98109-1024 > > > > > > > > E-mail: hpages at fhcrc.org (mailto:hpages at fhcrc.org) > > > > Phone: (206) 667-5791 > > > > Fax: (206) 667-1319 > > > > > > > > _______________________________________________ > > > > Bioconductor mailing list > > > > Bioconductor at r-project.org (mailto:Bioconductor at r-project.org) > > > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > > > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > > > > > > > > > > > > > > > -- > > Hervé Pagès > > > > Program in Computational Biology > > Division of Public Health Sciences > > Fred Hutchinson Cancer Research Center > > 1100 Fairview Ave. N, M1-B514 > > P.O. Box 19024 > > Seattle, WA 98109-1024 > > > > E-mail: hpages at fhcrc.org (mailto:hpages at fhcrc.org) > > Phone: (206) 667-5791 > > Fax: (206) 667-1319 > ADD REPLY 0 Entering edit mode Hi, I just found this function to print a pairwise alignments in blocks. Doesn't add the match/mismatch indicators between sequences, but might be a starting point: http://a-little-book-of-r-for- bioinformatics.readthedocs.org/en/latest/src/chapter4.html#viewing-a -long-pairwise-alignment Cheers Martin Am Mittwoch, 18. April 2012 um 12:16 schrieb Martin Preusse: > Hi everybody, > > I think the output format depends on the purpose of the alignment. > > A pairwise sequence alignment is usually done to compare two sequences base by base. In my case, I compare sequencing results of cloned expression constructs with the desired sequence. Thus, the best output format would be "BLAST like". > > seq1: 1 ATCTGC 7 > | | | . . | > seq2: 1 ATCAAC 7 > > When doing MSA, most people might rather be interested in the consensus sequence. E.g. in the context of conservation between species. > > So write.PairwiseAlignedXStringSet() and write.MultipleAlignment() are quite different and BLAST doesn't make much sense for multiple alignments. This means it would be best to put the output in the PairwiseAlignment/MultipleAlignment and not to the XStringSet, right? > > This is an overview of sequence alignment formats used by EMBOSS: > http://emboss.sourceforge.net/docs/themes/AlignFormats.html > > 'pair' or 'markx0' would be perfectly fine. > > > Cheers > Martin > > > > Am Dienstag, 17. April 2012 um 22:13 schrieb Thomas Girke: > > > Hi Herv?, > > > > To me, the most basic and versatile MSA or pairwise alignment format to output > > to would be FASTA since it is compatible with almost any other alignment > > editing software. For text-based viewing purposes my preference would be > > to also output to a format similar to the one shown in the following > > example. When there are only two sequences then one could show instead > > of a consensus line the pipe characters between the two sequences to > > indicate identical residues which mimics the blast output. A more > > standardized version of this pairwise alignment format can be found > > here: > > http://emboss.sourceforge.net/apps/cvs/emboss/apps/needle.html > > > > library(Biostrings) > > p450 <- read.AAStringSet("http://faculty.ucr.edu/~tgirke/Documents /R_BioCond/Samples/p450.mul", "fasta") > > > > StringSet2html <- function(msa=p450, file="p450.html", start=1, end=length(p450[[1]]), counter=20, browser=TRUE, ...) { > > if(class(msa)=="AAStringSet") msa <- AAStringSet(msa, start=start, end=end) > > if(class(msa)=="DNAStringSet") msa <- DNAStringSet(msa, start=start, end=end) > > msavec <- sapply(msa, toString) > > offset <- (counter-1)-nchar(nchar(msavec[1])) > > legend <- paste(paste(paste(paste(rep(" ", offset), collapse=""), format(seq(0, > > nchar(msavec[1]), by=counter)[-1])), collapse=""), collapse="") > > consensus <- consensusString(msavec, ambiguityMap=".", ...) > > msavec <- paste(msavec, rowSums(as.matrix(msa) != "-"), sep=" ") > > msavec <- paste(format(c("", names(msa), "Consensus"), justify="left"), c(legend, msavec, > > consensus), sep=" ") > > msavec <- c("<html> ", msavec, " </html>") > > writeLines(msavec, file) > > if(browser==TRUE) { browseURL(file) } > > } > > StringSet2html(msa=p450, file="p450.html", start=1, end=length(p450[[1]]), counter=20, browser=T, threshold=1.0) > > StringSet2html(msa=p450, file="p450.html", start=450, end=470, counter=20, browser=T, threshold=1.0) > > > > > > Thomas > > > > On Tue, Apr 17, 2012 at 07:43:30PM +0000, Hervé Pagès wrote: > > > Hi Thomas, > > > > > > On 04/17/2012 11:49 AM, Thomas Girke wrote: > > > > What about providing an option in pairwiseAlignment to output to the > > > > MultipleAlignment class in Biostrings and then write the latter to > > > > different alignment formats? > > > > > > > > > > > > > > > > > > > > > Or we could provide coercion methods to switch between > > > PairwiseAlignedXStringSet and MultipleAlignment. > > > > > > Anyway that kind of moves Martin's problem from having a > > > write.PairwiseAlignedXStringSet() function that produces BLAST output > > > to having a write.MultipleAlignment() function that produces BLAST > > > output. For the specific case of BLAST output, would it make sense > > > to support it for MultipleAlignment? Can someone point me to an example > > > of such output? Or even better, to the specs of such format? > > > > > > Note that right now there is the write.phylip() function in Biostrings > > > for writing a MultipleAlignment object to a file but the Phylip format > > > looks very different from the BLAST output: > > > > > > hpages at latitude:~ head -n 20 phylip_test.txt > > > 9 2343 > > > Mask 0000000000 0000000000 0000000000 0000000000 0000000000 > > > Human -----TCCCG TCTCCGCAGC AAAAAAGTTT GAGTCGCCGC TGCCGGGTTG > > > Chimp ---------- ---------- ---------- ---------- ---------- > > > Cow ---------- ---------- ---------- ---------- ---------- > > > Mouse ---------- ---------- --AAAAGTTG GAGTCTTCGC TTGAGAGTTG > > > Rat ---------- ---------- ---------- ---------- ---------- > > > Dog ---------- ---------- ---------- ---------- ---------- > > > Chicken ---------- ----CGGCTC CGCAGCGCCT CACTCGCGCA GTCCCCGCGC > > > Salmon GGGGGAGACT TCAGAAGTTG TTGTCCTCTC CGCTGATAAC AGTTGAGATG > > > > > > 0000000000 0000000000 0000000000 0001111111 1111111111 > > > CCAGCGGAGT CGCGCGTCGG GAGCTACGTA GGGCAGAGAA GTCA-TGGCT > > > ---------- ---------- ---------- ---------- ---A-TGGCT > > > ---------- ---------- ---------- ---GAGAGAA GTCA-TGGCT > > > CCAGCGGAGT CGCGCGCCGA CAGCTACGCG GCGCAGA-AA GTCA-TGGCT > > > ---------- ---------- ---------- ---------- ---A-TGGCT > > > ---------- ---------- ---------- ---------- ---A-TGGCT > > > AGGGCCGGGC AGAGGCGCAC GCAGCTCCCC GGGCGGCCCC GCTC-CAGCC > > > CGCATATTAT TATTACCTTT AGGACAAGTT GAATGTGTTC GTCAACATCT > > > > > > Thanks! > > > H. > > > > > > > > > > > Thomas > > > > > > > > On Tue, Apr 17, 2012 at 05:59:24PM +0000, Hervé Pagès wrote: > > > > > Hi Martin, > > > > > > > > > > On 04/16/2012 04:06 AM, Martin Preusse wrote: > > > > > > Hi Charles, > > > > > > > > > > > > thanks! Your solution allows to print the two alignment strings separately. > > > > > > > > > > > > I was thinking of an output as generated by alignment tools: > > > > > > > > > > > > AGT-TCTAT > > > > > > | | | | | | | | | > > > > > > AGTATCTAT > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > This looks like BLAST output. Is this what you have in mind? Note that > > > > > there are many alignment tools and many ways to output the result to a > > > > > file. I'm not really familiar with the BLAST output format. Is it > > > > > specified somewhere? Would that make sense to add something like a > > > > > write.PairwiseAlignedXStringSet() function to Biostrings for writing > > > > > the result of pairwiseAlignment() to a file? We could do this and > > > > > support the BLAST format if that's a commonly used format. > > > > > > > > > > Thanks, > > > > > H. > > > > > > > > > > > > > > > > > For this I would have to write a function to output the strings in blocks of e.g. 60 nucleotides, right? > > > > > > > > > > > > Cheers > > > > > > Martin > > > > > > > > > > > > > > > > > > > > > > > > Am Freitag, 13. April 2012 um 19:21 schrieb Chu, Charles: > > > > > > > > > > > > > write.XStringSet > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > > > Bioconductor mailing list > > > > > > Bioconductor at r-project.org (mailto:Bioconductor at r-project.org) > > > > > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > > > > > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > Hervé Pagès > > > > > > > > > > Program in Computational Biology > > > > > Division of Public Health Sciences > > > > > Fred Hutchinson Cancer Research Center > > > > > 1100 Fairview Ave. N, M1-B514 > > > > > P.O. Box 19024 > > > > > Seattle, WA 98109-1024 > > > > > > > > > > E-mail: hpages at fhcrc.org (mailto:hpages at fhcrc.org) > > > > > Phone: (206) 667-5791 > > > > > Fax: (206) 667-1319 > > > > > > > > > > _______________________________________________ > > > > > Bioconductor mailing list > > > > > Bioconductor at r-project.org (mailto:Bioconductor at r-project.org) > > > > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > > > > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > Hervé Pagès > > > > > > Program in Computational Biology > > > Division of Public Health Sciences > > > Fred Hutchinson Cancer Research Center > > > 1100 Fairview Ave. N, M1-B514 > > > P.O. Box 19024 > > > Seattle, WA 98109-1024 > > > > > > E-mail: hpages at fhcrc.org (mailto:hpages at fhcrc.org) > > > Phone: (206) 667-5791 > > > Fax: (206) 667-1319 > > >
0
Entering edit mode
Thanks Martin and Thomas for the useful feedback. The 'pair' and 'markx0' formats supported by Emboss seem indeed appropriate for printing the output of pairwiseAlignment() to a file. I'll add support for those 2 formats in Biostrings. Won't be before 1 week or 2 though... Cheers, H. On 04/18/2012 03:20 AM, Martin Preusse wrote: > Hi, > > I just found this function to print a pairwise alignments in blocks. Doesn't add the match/mismatch indicators between sequences, but might be a starting point: > > http://a-little-book-of-r-for- bioinformatics.readthedocs.org/en/latest/src/chapter4.html#viewing-a -long-pairwise-alignment > > > Cheers > Martin > > > > Am Mittwoch, 18. April 2012 um 12:16 schrieb Martin Preusse: > >> Hi everybody, >> >> I think the output format depends on the purpose of the alignment. >> >> A pairwise sequence alignment is usually done to compare two sequences base by base. In my case, I compare sequencing results of cloned expression constructs with the desired sequence. Thus, the best output format would be "BLAST like". >> >> seq1: 1 ATCTGC 7 >> | | | . . | >> seq2: 1 ATCAAC 7 >> >> When doing MSA, most people might rather be interested in the consensus sequence. E.g. in the context of conservation between species. >> >> So write.PairwiseAlignedXStringSet() and write.MultipleAlignment() are quite different and BLAST doesn't make much sense for multiple alignments. This means it would be best to put the output in the PairwiseAlignment/MultipleAlignment and not to the XStringSet, right? >> >> This is an overview of sequence alignment formats used by EMBOSS: >> http://emboss.sourceforge.net/docs/themes/AlignFormats.html >> >> 'pair' or 'markx0' would be perfectly fine. >> >> >> Cheers >> Martin >> >> >> >> Am Dienstag, 17. April 2012 um 22:13 schrieb Thomas Girke: >> >>> Hi Herv?, >>> >>> To me, the most basic and versatile MSA or pairwise alignment format to output >>> to would be FASTA since it is compatible with almost any other alignment >>> editing software. For text-based viewing purposes my preference would be >>> to also output to a format similar to the one shown in the following >>> example. When there are only two sequences then one could show instead >>> of a consensus line the pipe characters between the two sequences to >>> indicate identical residues which mimics the blast output. A more >>> standardized version of this pairwise alignment format can be found >>> here: >>> http://emboss.sourceforge.net/apps/cvs/emboss/apps/needle.html >>> >>> library(Biostrings) >>> p450<- read.AAStringSet("http://faculty.ucr.edu/~tgirke/Documents/ R_BioCond/Samples/p450.mul", "fasta") >>> >>> StringSet2html<- function(msa=p450, file="p450.html", start=1, end=length(p450[[1]]), counter=20, browser=TRUE, ...) { >>> if(class(msa)=="AAStringSet") msa<- AAStringSet(msa, start=start, end=end) >>> if(class(msa)=="DNAStringSet") msa<- DNAStringSet(msa, start=start, end=end) >>> msavec<- sapply(msa, toString) >>> offset<- (counter-1)-nchar(nchar(msavec[1])) >>> legend<- paste(paste(paste(paste(rep(" ", offset), collapse=""), format(seq(0, >>> nchar(msavec[1]), by=counter)[-1])), collapse=""), collapse="") >>> consensus<- consensusString(msavec, ambiguityMap=".", ...) >>> msavec<- paste(msavec, rowSums(as.matrix(msa) != "-"), sep=" ") >>> msavec<- paste(format(c("", names(msa), "Consensus"), justify="left"), c(legend, msavec, >>> consensus), sep=" ") >>> msavec<- c("<html>
", msavec,"
</html>") >>> writeLines(msavec, file) >>> if(browser==TRUE) { browseURL(file) } >>> } >>> StringSet2html(msa=p450, file="p450.html", start=1, end=length(p450[[1]]), counter=20, browser=T, threshold=1.0) >>> StringSet2html(msa=p450, file="p450.html", start=450, end=470, counter=20, browser=T, threshold=1.0) >>> >>> >>> Thomas >>> >>> On Tue, Apr 17, 2012 at 07:43:30PM +0000, Hervé Pagès wrote: >>>> Hi Thomas, >>>> >>>> On 04/17/2012 11:49 AM, Thomas Girke wrote: >>>>> What about providing an option in pairwiseAlignment to output to the >>>>> MultipleAlignment class in Biostrings and then write the latter to >>>>> different alignment formats? >>>> >>>> >>>> >>>> >>>> >>>> >>>> Or we could provide coercion methods to switch between >>>> PairwiseAlignedXStringSet and MultipleAlignment. >>>> >>>> Anyway that kind of moves Martin's problem from having a >>>> write.PairwiseAlignedXStringSet() function that produces BLAST output >>>> to having a write.MultipleAlignment() function that produces BLAST >>>> output. For the specific case of BLAST output, would it make sense >>>> to support it for MultipleAlignment? Can someone point me to an example >>>> of such output? Or even better, to the specs of such format? >>>> >>>> Note that right now there is the write.phylip() function in Biostrings >>>> for writing a MultipleAlignment object to a file but the Phylip format >>>> looks very different from the BLAST output: >>>> >>>> hpages at latitude:~head -n 20 phylip_test.txt >>>> 9 2343 >>>> Mask 0000000000 0000000000 0000000000 0000000000 0000000000 >>>> Human -----TCCCG TCTCCGCAGC AAAAAAGTTT GAGTCGCCGC TGCCGGGTTG >>>> Chimp ---------- ---------- ---------- ---------- ---------- >>>> Cow ---------- ---------- ---------- ---------- ---------- >>>> Mouse ---------- ---------- --AAAAGTTG GAGTCTTCGC TTGAGAGTTG >>>> Rat ---------- ---------- ---------- ---------- ---------- >>>> Dog ---------- ---------- ---------- ---------- ---------- >>>> Chicken ---------- ----CGGCTC CGCAGCGCCT CACTCGCGCA GTCCCCGCGC >>>> Salmon GGGGGAGACT TCAGAAGTTG TTGTCCTCTC CGCTGATAAC AGTTGAGATG >>>> >>>> 0000000000 0000000000 0000000000 0001111111 1111111111 >>>> CCAGCGGAGT CGCGCGTCGG GAGCTACGTA GGGCAGAGAA GTCA-TGGCT >>>> ---------- ---------- ---------- ---------- ---A-TGGCT >>>> ---------- ---------- ---------- ---GAGAGAA GTCA-TGGCT >>>> CCAGCGGAGT CGCGCGCCGA CAGCTACGCG GCGCAGA-AA GTCA-TGGCT >>>> ---------- ---------- ---------- ---------- ---A-TGGCT >>>> ---------- ---------- ---------- ---------- ---A-TGGCT >>>> AGGGCCGGGC AGAGGCGCAC GCAGCTCCCC GGGCGGCCCC GCTC-CAGCC >>>> CGCATATTAT TATTACCTTT AGGACAAGTT GAATGTGTTC GTCAACATCT >>>> >>>> Thanks! >>>> H. >>>> >>>>> >>>>> Thomas >>>>> >>>>> On Tue, Apr 17, 2012 at 05:59:24PM +0000, Hervé Pagès wrote: >>>>>> Hi Martin, >>>>>> >>>>>> On 04/16/2012 04:06 AM, Martin Preusse wrote: >>>>>>> Hi Charles, >>>>>>> >>>>>>> thanks! Your solution allows to print the two alignment strings separately. >>>>>>> >>>>>>> I was thinking of an output as generated by alignment tools: >>>>>>> >>>>>>> AGT-TCTAT >>>>>>> | | | | | | | | | >>>>>>> AGTATCTAT >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> This looks like BLAST output. Is this what you have in mind? Note that >>>>>> there are many alignment tools and many ways to output the result to a >>>>>> file. I'm not really familiar with the BLAST output format. Is it >>>>>> specified somewhere? Would that make sense to add something like a >>>>>> write.PairwiseAlignedXStringSet() function to Biostrings for writing >>>>>> the result of pairwiseAlignment() to a file? We could do this and >>>>>> support the BLAST format if that's a commonly used format. >>>>>> >>>>>> Thanks, >>>>>> H. >>>>>> >>>>>>> >>>>>>> For this I would have to write a function to output the strings in blocks of e.g. 60 nucleotides, right? >>>>>>> >>>>>>> Cheers >>>>>>> Martin >>>>>>> >>>>>>> >>>>>>> >>>>>>> Am Freitag, 13. April 2012 um 19:21 schrieb Chu, Charles: >>>>>>> >>>>>>>> write.XStringSet >>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Bioconductor mailing list >>>>>>> Bioconductor at r-project.org (mailto:Bioconductor at r-project.org) >>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>>>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Hervé Pagès >>>>>> >>>>>> Program in Computational Biology >>>>>> Division of Public Health Sciences >>>>>> Fred Hutchinson Cancer Research Center >>>>>> 1100 Fairview Ave. N, M1-B514 >>>>>> P.O. Box 19024 >>>>>> Seattle, WA 98109-1024 >>>>>> >>>>>> E-mail: hpages at fhcrc.org (mailto:hpages at fhcrc.org) >>>>>> Phone: (206) 667-5791 >>>>>> Fax: (206) 667-1319 >>>>>> >>>>>> _______________________________________________ >>>>>> Bioconductor mailing list >>>>>> Bioconductor at r-project.org (mailto:Bioconductor at r-project.org) >>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >>>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> -- >>>> Hervé Pagès >>>> >>>> Program in Computational Biology >>>> Division of Public Health Sciences >>>> Fred Hutchinson Cancer Research Center >>>> 1100 Fairview Ave. N, M1-B514 >>>> P.O. Box 19024 >>>> Seattle, WA 98109-1024 >>>> >>>> E-mail: hpages at fhcrc.org (mailto:hpages at fhcrc.org) >>>> Phone: (206) 667-5791 >>>> Fax: (206) 667-1319 >>> >> > > > -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319 ADD REPLY 0 Entering edit mode Hi Herv?, thanks for your help! If you need suggestions, help or testing, just say the word. Will you implement the header also? If you do so, I would be thankful for an option like "header=F" for the output. Cheers Martin Am Samstag, 21. April 2012 um 02:12 schrieb Hervé Pagès: > Thanks Martin and Thomas for the useful feedback. The 'pair' and > 'markx0' formats supported by Emboss seem indeed appropriate for > printing the output of pairwiseAlignment() to a file. I'll add > support for those 2 formats in Biostrings. Won't be before 1 week > or 2 though... > > Cheers, > H. > > On 04/18/2012 03:20 AM, Martin Preusse wrote: > > Hi, > > > > I just found this function to print a pairwise alignments in blocks. Doesn't add the match/mismatch indicators between sequences, but might be a starting point: > > > > http://a-little-book-of-r-for- bioinformatics.readthedocs.org/en/latest/src/chapter4.html#viewing-a -long-pairwise-alignment > > > > > > Cheers > > Martin > > > > > > > > Am Mittwoch, 18. April 2012 um 12:16 schrieb Martin Preusse: > > > > > Hi everybody, > > > > > > I think the output format depends on the purpose of the alignment. > > > > > > A pairwise sequence alignment is usually done to compare two sequences base by base. In my case, I compare sequencing results of cloned expression constructs with the desired sequence. Thus, the best output format would be "BLAST like". > > > > > > seq1: 1 ATCTGC 7 > > > | | | . . | > > > seq2: 1 ATCAAC 7 > > > > > > When doing MSA, most people might rather be interested in the consensus sequence. E.g. in the context of conservation between species. > > > > > > So write.PairwiseAlignedXStringSet() and write.MultipleAlignment() are quite different and BLAST doesn't make much sense for multiple alignments. This means it would be best to put the output in the PairwiseAlignment/MultipleAlignment and not to the XStringSet, right? > > > > > > This is an overview of sequence alignment formats used by EMBOSS: > > > http://emboss.sourceforge.net/docs/themes/AlignFormats.html > > > > > > 'pair' or 'markx0' would be perfectly fine. > > > > > > > > > Cheers > > > Martin > > > > > > > > > > > > Am Dienstag, 17. April 2012 um 22:13 schrieb Thomas Girke: > > > > > > > Hi Herv?, > > > > > > > > To me, the most basic and versatile MSA or pairwise alignment format to output > > > > to would be FASTA since it is compatible with almost any other alignment > > > > editing software. For text-based viewing purposes my preference would be > > > > to also output to a format similar to the one shown in the following > > > > example. When there are only two sequences then one could show instead > > > > of a consensus line the pipe characters between the two sequences to > > > > indicate identical residues which mimics the blast output. A more > > > > standardized version of this pairwise alignment format can be found > > > > here: > > > > http://emboss.sourceforge.net/apps/cvs/emboss/apps/needle.html > > > > > > > > library(Biostrings) > > > > p450<- read.AAStringSet("http://faculty.ucr.edu/~tgirke/Docume nts/R_BioCond/Samples/p450.mul", "fasta") > > > > > > > > StringSet2html<- function(msa=p450, file="p450.html", start=1, end=length(p450[[1]]), counter=20, browser=TRUE, ...) { > > > > if(class(msa)=="AAStringSet") msa<- AAStringSet(msa, start=start, end=end) > > > > if(class(msa)=="DNAStringSet") msa<- DNAStringSet(msa, start=start, end=end) > > > > msavec<- sapply(msa, toString) > > > > offset<- (counter-1)-nchar(nchar(msavec[1])) > > > > legend<- paste(paste(paste(paste(rep(" ", offset), collapse=""), format(seq(0, > > > > nchar(msavec[1]), by=counter)[-1])), collapse=""), collapse="") > > > > consensus<- consensusString(msavec, ambiguityMap=".", ...) > > > > msavec<- paste(msavec, rowSums(as.matrix(msa) != "-"), sep=" ") > > > > msavec<- paste(format(c("", names(msa), "Consensus"), justify="left"), c(legend, msavec, > > > > consensus), sep=" ") > > > > msavec<- c("<html> ", msavec," </html>") > > > > writeLines(msavec, file) > > > > if(browser==TRUE) { browseURL(file) } > > > > } > > > > StringSet2html(msa=p450, file="p450.html", start=1, end=length(p450[[1]]), counter=20, browser=T, threshold=1.0) > > > > StringSet2html(msa=p450, file="p450.html", start=450, end=470, counter=20, browser=T, threshold=1.0) > > > > > > > > > > > > Thomas > > > > > > > > On Tue, Apr 17, 2012 at 07:43:30PM +0000, Hervé Pagès wrote: > > > > > Hi Thomas, > > > > > > > > > > On 04/17/2012 11:49 AM, Thomas Girke wrote: > > > > > > What about providing an option in pairwiseAlignment to output to the > > > > > > MultipleAlignment class in Biostrings and then write the latter to > > > > > > different alignment formats? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Or we could provide coercion methods to switch between > > > > > PairwiseAlignedXStringSet and MultipleAlignment. > > > > > > > > > > Anyway that kind of moves Martin's problem from having a > > > > > write.PairwiseAlignedXStringSet() function that produces BLAST output > > > > > to having a write.MultipleAlignment() function that produces BLAST > > > > > output. For the specific case of BLAST output, would it make sense > > > > > to support it for MultipleAlignment? Can someone point me to an example > > > > > of such output? Or even better, to the specs of such format? > > > > > > > > > > Note that right now there is the write.phylip() function in Biostrings > > > > > for writing a MultipleAlignment object to a file but the Phylip format > > > > > looks very different from the BLAST output: > > > > > > > > > > hpages at latitude:~ head -n 20 phylip_test.txt > > > > > 9 2343 > > > > > Mask 0000000000 0000000000 0000000000 0000000000 0000000000 > > > > > Human -----TCCCG TCTCCGCAGC AAAAAAGTTT GAGTCGCCGC TGCCGGGTTG > > > > > Chimp ---------- ---------- ---------- ---------- ---------- > > > > > Cow ---------- ---------- ---------- ---------- ---------- > > > > > Mouse ---------- ---------- --AAAAGTTG GAGTCTTCGC TTGAGAGTTG > > > > > Rat ---------- ---------- ---------- ---------- ---------- > > > > > Dog ---------- ---------- ---------- ---------- ---------- > > > > > Chicken ---------- ----CGGCTC CGCAGCGCCT CACTCGCGCA GTCCCCGCGC > > > > > Salmon GGGGGAGACT TCAGAAGTTG TTGTCCTCTC CGCTGATAAC AGTTGAGATG > > > > > > > > > > 0000000000 0000000000 0000000000 0001111111 1111111111 > > > > > CCAGCGGAGT CGCGCGTCGG GAGCTACGTA GGGCAGAGAA GTCA-TGGCT > > > > > ---------- ---------- ---------- ---------- ---A-TGGCT > > > > > ---------- ---------- ---------- ---GAGAGAA GTCA-TGGCT > > > > > CCAGCGGAGT CGCGCGCCGA CAGCTACGCG GCGCAGA-AA GTCA-TGGCT > > > > > ---------- ---------- ---------- ---------- ---A-TGGCT > > > > > ---------- ---------- ---------- ---------- ---A-TGGCT > > > > > AGGGCCGGGC AGAGGCGCAC GCAGCTCCCC GGGCGGCCCC GCTC-CAGCC > > > > > CGCATATTAT TATTACCTTT AGGACAAGTT GAATGTGTTC GTCAACATCT > > > > > > > > > > Thanks! > > > > > H. > > > > > > > > > > > > > > > > > Thomas > > > > > > > > > > > > On Tue, Apr 17, 2012 at 05:59:24PM +0000, Hervé Pagès wrote: > > > > > > > Hi Martin, > > > > > > > > > > > > > > On 04/16/2012 04:06 AM, Martin Preusse wrote: > > > > > > > > Hi Charles, > > > > > > > > > > > > > > > > thanks! Your solution allows to print the two alignment strings separately. > > > > > > > > > > > > > > > > I was thinking of an output as generated by alignment tools: > > > > > > > > > > > > > > > > AGT-TCTAT > > > > > > > > | | | | | | | | | > > > > > > > > AGTATCTAT > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > This looks like BLAST output. Is this what you have in mind? Note that > > > > > > > there are many alignment tools and many ways to output the result to a > > > > > > > file. I'm not really familiar with the BLAST output format. Is it > > > > > > > specified somewhere? Would that make sense to add something like a > > > > > > > write.PairwiseAlignedXStringSet() function to Biostrings for writing > > > > > > > the result of pairwiseAlignment() to a file? We could do this and > > > > > > > support the BLAST format if that's a commonly used format. > > > > > > > > > > > > > > Thanks, > > > > > > > H. > > > > > > > > > > > > > > > > > > > > > > > For this I would have to write a function to output the strings in blocks of e.g. 60 nucleotides, right? > > > > > > > > > > > > > > > > Cheers > > > > > > > > Martin > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Am Freitag, 13. April 2012 um 19:21 schrieb Chu, Charles: > > > > > > > > > > > > > > > > > write.XStringSet > > > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > > > > > Bioconductor mailing list > > > > > > > > Bioconductor at r-project.org (mailto:Bioconductor at r-project.org) > > > > > > > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > > > > > > > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > Hervé Pagès > > > > > > > > > > > > > > Program in Computational Biology > > > > > > > Division of Public Health Sciences > > > > > > > Fred Hutchinson Cancer Research Center > > > > > > > 1100 Fairview Ave. N, M1-B514 > > > > > > > P.O. Box 19024 > > > > > > > Seattle, WA 98109-1024 > > > > > > > > > > > > > > E-mail: hpages at fhcrc.org (mailto:hpages at fhcrc.org) > > > > > > > Phone: (206) 667-5791 > > > > > > > Fax: (206) 667-1319 > > > > > > > > > > > > > > _______________________________________________ > > > > > > > Bioconductor mailing list > > > > > > > Bioconductor at r-project.org (mailto:Bioconductor at r-project.org) > > > > > > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > > > > > > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > Hervé Pagès > > > > > > > > > > Program in Computational Biology > > > > > Division of Public Health Sciences > > > > > Fred Hutchinson Cancer Research Center > > > > > 1100 Fairview Ave. N, M1-B514 > > > > > P.O. Box 19024 > > > > > Seattle, WA 98109-1024 > > > > > > > > > > E-mail: hpages at fhcrc.org (mailto:hpages at fhcrc.org) > > > > > Phone: (206) 667-5791 > > > > > Fax: (206) 667-1319 > > > > > > > > > > > > > > -- > Hervé Pagès > > Program in Computational Biology > Division of Public Health Sciences > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N, M1-B514 > P.O. Box 19024 > Seattle, WA 98109-1024 > > E-mail: hpages at fhcrc.org (mailto:hpages at fhcrc.org) > Phone: (206) 667-5791 > Fax: (206) 667-1319