Biostring: print sequence alignment to file
1
0
Entering edit mode
@herve-pages-1542
Last seen 15 hours ago
Seattle, WA, United States
Hi Martin, On 06/22/2012 12:09 PM, Hervé Pagès wrote: > Hi Martin, > > On 06/14/2012 06:55 AM, Martin Preusse wrote: >> Hi guys, >> >> anything new on the sequence output? Maybe I missed something :) >> please tell me if you need testing etc. > > Still on my list. Will work on this in the next couple of weeks. I'll > let you know. Thanks for the reminder. There is now a writePairwiseAlignments() function (in Biostrings 2.25.8) for doing this. It produces a file in the "pair" format (as described on the EMBOSS website, at the URL you sent earlier): > library(Biostrings) > pattern <- DNAString("CGTACGTAACGTTCGT") > subject <- DNAString("CGTCGTCGTCCGTAA") > x1 <- pairwiseAlignment(pattern, subject) > x1 Global PairwiseAlignmentsSingleSubject (1 of 1) pattern: [1] CGTACGTAACGTTCGT subject: [1] CGT-CGT--CGTCCGT score: -32.11822 > writePairwiseAlignments(x1, block.width=10) # 'block.width' default is 50 ######################################## # Program: Biostrings (version 2.25.8), a Bioconductor package # Rundate: Wed Jul 18 23:05:46 2012 ######################################## #======================================= # # Aligned_sequences: 2 # 1: P1 # 2: S1 # Matrix: NA # Gap_penalty: 14.0 # Extend_penalty: 4.0 # # Length: 18 # Identity: 12/18 (66.7%) # Similarity: NA/18 (NA%) # Gaps: 5/18 (27.8%) # Score: -32.11822 # # #======================================= P1 1 CGTACGTAAC 10 ||| ||| | S1 1 CGT-CGT--C 7 P1 11 GTTCGT-- 16 || ||| S1 8 GTCCGTAA 15 #--------------------------------------- #--------------------------------------- Only lightly tested. Not necessarily very performant (no C code). Please have a look at the man page for some caveats (especially if you plan to use it on NON global alignments). Feedback welcome. Thanks, H. > > H. > >> >> Cheers >> Martin >> >> >> Am Samstag, 21. April 2012 um 11:55 schrieb Martin Preusse: >> >>> Hi Herv?, >>> >>> thanks for your help! If you need suggestions, help or testing, just >>> say the word. >>> >>> Will you implement the header also? If you do so, I would be thankful >>> for an option like "header=F" for the output. >>> >>> >>> Cheers >>> Martin >>> >>> >>> Am Samstag, 21. April 2012 um 02:12 schrieb Hervé Pagès: >>> >>>> Thanks Martin and Thomas for the useful feedback. The 'pair' and >>>> 'markx0' formats supported by Emboss seem indeed appropriate for >>>> printing the output of pairwiseAlignment() to a file. I'll add >>>> support for those 2 formats in Biostrings. Won't be before 1 week >>>> or 2 though... >>>> >>>> Cheers, >>>> H. >>>> >>>> On 04/18/2012 03:20 AM, Martin Preusse wrote: >>>>> Hi, >>>>> >>>>> I just found this function to print a pairwise alignments in >>>>> blocks. Doesn't add the match/mismatch indicators between >>>>> sequences, but might be a starting point: >>>>> >>>>> http://a-little-book-of-r-for- bioinformatics.readthedocs.org/en/latest/src/chapter4.html#viewing-a -long-pairwise-alignment >>>>> >>>>> >>>>> >>>>> Cheers >>>>> Martin >>>>> >>>>> >>>>> >>>>> Am Mittwoch, 18. April 2012 um 12:16 schrieb Martin Preusse: >>>>> >>>>>> Hi everybody, >>>>>> >>>>>> I think the output format depends on the purpose of the alignment. >>>>>> >>>>>> A pairwise sequence alignment is usually done to compare two >>>>>> sequences base by base. In my case, I compare sequencing results >>>>>> of cloned expression constructs with the desired sequence. Thus, >>>>>> the best output format would be "BLAST like". >>>>>> >>>>>> seq1: 1 ATCTGC 7 >>>>>> | | | . . | >>>>>> seq2: 1 ATCAAC 7 >>>>>> >>>>>> When doing MSA, most people might rather be interested in the >>>>>> consensus sequence. E.g. in the context of conservation between >>>>>> species. >>>>>> >>>>>> So write.PairwiseAlignedXStringSet() and write.MultipleAlignment() >>>>>> are quite different and BLAST doesn't make much sense for multiple >>>>>> alignments. This means it would be best to put the output in the >>>>>> PairwiseAlignment/MultipleAlignment and not to the XStringSet, right? >>>>>> >>>>>> This is an overview of sequence alignment formats used by EMBOSS: >>>>>> http://emboss.sourceforge.net/docs/themes/AlignFormats.html >>>>>> >>>>>> 'pair' or 'markx0' would be perfectly fine. >>>>>> >>>>>> >>>>>> Cheers >>>>>> Martin >>>>>> >>>>>> >>>>>> >>>>>> Am Dienstag, 17. April 2012 um 22:13 schrieb Thomas Girke: >>>>>> >>>>>>> Hi Herv?, >>>>>>> >>>>>>> To me, the most basic and versatile MSA or pairwise alignment >>>>>>> format to output >>>>>>> to would be FASTA since it is compatible with almost any other >>>>>>> alignment >>>>>>> editing software. For text-based viewing purposes my preference >>>>>>> would be >>>>>>> to also output to a format similar to the one shown in the following >>>>>>> example. When there are only two sequences then one could show >>>>>>> instead >>>>>>> of a consensus line the pipe characters between the two sequences to >>>>>>> indicate identical residues which mimics the blast output. A more >>>>>>> standardized version of this pairwise alignment format can be found >>>>>>> here: >>>>>>> http://emboss.sourceforge.net/apps/cvs/emboss/apps/needle.html >>>>>>> >>>>>>> library(Biostrings) >>>>>>> p450<- >>>>>>> read.AAStringSet("http://faculty.ucr.edu/~tgirke/Documents/R_B ioCond/Samples/p450.mul", >>>>>>> "fasta") >>>>>>> >>>>>>> StringSet2html<- function(msa=p450, file="p450.html", start=1, >>>>>>> end=length(p450[[1]]), counter=20, browser=TRUE, ...) { >>>>>>> if(class(msa)=="AAStringSet") msa<- AAStringSet(msa, start=start, >>>>>>> end=end) >>>>>>> if(class(msa)=="DNAStringSet") msa<- DNAStringSet(msa, >>>>>>> start=start, end=end) >>>>>>> msavec<- sapply(msa, toString) >>>>>>> offset<- (counter-1)-nchar(nchar(msavec[1])) >>>>>>> legend<- paste(paste(paste(paste(rep(" ", offset), collapse=""), >>>>>>> format(seq(0, >>>>>>> nchar(msavec[1]), by=counter)[-1])), collapse=""), collapse="") >>>>>>> consensus<- consensusString(msavec, ambiguityMap=".", ...) >>>>>>> msavec<- paste(msavec, rowSums(as.matrix(msa) != "-"), sep=" ") >>>>>>> msavec<- paste(format(c("", names(msa), "Consensus"), >>>>>>> justify="left"), c(legend, msavec, >>>>>>> consensus), sep=" ") >>>>>>> msavec<- c("<html>
", msavec,"
</html>") >>>>>>> writeLines(msavec, file) >>>>>>> if(browser==TRUE) { browseURL(file) } >>>>>>> } >>>>>>> StringSet2html(msa=p450, file="p450.html", start=1, >>>>>>> end=length(p450[[1]]), counter=20, browser=T, threshold=1.0) >>>>>>> StringSet2html(msa=p450, file="p450.html", start=450, end=470, >>>>>>> counter=20, browser=T, threshold=1.0) >>>>>>> >>>>>>> >>>>>>> Thomas >>>>>>> >>>>>>> On Tue, Apr 17, 2012 at 07:43:30PM +0000, Hervé Pagès wrote: >>>>>>>> Hi Thomas, >>>>>>>> >>>>>>>> On 04/17/2012 11:49 AM, Thomas Girke wrote: >>>>>>>>> What about providing an option in pairwiseAlignment to output >>>>>>>>> to the >>>>>>>>> MultipleAlignment class in Biostrings and then write the latter to >>>>>>>>> different alignment formats? >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Or we could provide coercion methods to switch between >>>>>>>> PairwiseAlignedXStringSet and MultipleAlignment. >>>>>>>> >>>>>>>> Anyway that kind of moves Martin's problem from having a >>>>>>>> write.PairwiseAlignedXStringSet() function that produces BLAST >>>>>>>> output >>>>>>>> to having a write.MultipleAlignment() function that produces BLAST >>>>>>>> output. For the specific case of BLAST output, would it make sense >>>>>>>> to support it for MultipleAlignment? Can someone point me to an >>>>>>>> example >>>>>>>> of such output? Or even better, to the specs of such format? >>>>>>>> >>>>>>>> Note that right now there is the write.phylip() function in >>>>>>>> Biostrings >>>>>>>> for writing a MultipleAlignment object to a file but the Phylip >>>>>>>> format >>>>>>>> looks very different from the BLAST output: >>>>>>>> >>>>>>>> hpages at latitude:~head -n 20 phylip_test.txt >>>>>>>> 9 2343 >>>>>>>> Mask 0000000000 0000000000 0000000000 0000000000 0000000000 >>>>>>>> Human -----TCCCG TCTCCGCAGC AAAAAAGTTT GAGTCGCCGC TGCCGGGTTG >>>>>>>> Chimp ---------- ---------- ---------- ---------- ---------- >>>>>>>> Cow ---------- ---------- ---------- ---------- ---------- >>>>>>>> Mouse ---------- ---------- --AAAAGTTG GAGTCTTCGC TTGAGAGTTG >>>>>>>> Rat ---------- ---------- ---------- ---------- ---------- >>>>>>>> Dog ---------- ---------- ---------- ---------- ---------- >>>>>>>> Chicken ---------- ----CGGCTC CGCAGCGCCT CACTCGCGCA GTCCCCGCGC >>>>>>>> Salmon GGGGGAGACT TCAGAAGTTG TTGTCCTCTC CGCTGATAAC AGTTGAGATG >>>>>>>> >>>>>>>> 0000000000 0000000000 0000000000 0001111111 1111111111 >>>>>>>> CCAGCGGAGT CGCGCGTCGG GAGCTACGTA GGGCAGAGAA GTCA-TGGCT >>>>>>>> ---------- ---------- ---------- ---------- ---A-TGGCT >>>>>>>> ---------- ---------- ---------- ---GAGAGAA GTCA-TGGCT >>>>>>>> CCAGCGGAGT CGCGCGCCGA CAGCTACGCG GCGCAGA-AA GTCA-TGGCT >>>>>>>> ---------- ---------- ---------- ---------- ---A-TGGCT >>>>>>>> ---------- ---------- ---------- ---------- ---A-TGGCT >>>>>>>> AGGGCCGGGC AGAGGCGCAC GCAGCTCCCC GGGCGGCCCC GCTC-CAGCC >>>>>>>> CGCATATTAT TATTACCTTT AGGACAAGTT GAATGTGTTC GTCAACATCT >>>>>>>> >>>>>>>> Thanks! >>>>>>>> H. >>>>>>>> >>>>>>>>> >>>>>>>>> Thomas >>>>>>>>> >>>>>>>>> On Tue, Apr 17, 2012 at 05:59:24PM +0000, Hervé Pagès wrote: >>>>>>>>>> Hi Martin, >>>>>>>>>> >>>>>>>>>> On 04/16/2012 04:06 AM, Martin Preusse wrote: >>>>>>>>>>> Hi Charles, >>>>>>>>>>> >>>>>>>>>>> thanks! Your solution allows to print the two alignment >>>>>>>>>>> strings separately. >>>>>>>>>>> >>>>>>>>>>> I was thinking of an output as generated by alignment tools: >>>>>>>>>>> >>>>>>>>>>> AGT-TCTAT >>>>>>>>>>> | | | | | | | | | >>>>>>>>>>> AGTATCTAT >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> This looks like BLAST output. Is this what you have in mind? >>>>>>>>>> Note that >>>>>>>>>> there are many alignment tools and many ways to output the >>>>>>>>>> result to a >>>>>>>>>> file. I'm not really familiar with the BLAST output format. Is it >>>>>>>>>> specified somewhere? Would that make sense to add something >>>>>>>>>> like a >>>>>>>>>> write.PairwiseAlignedXStringSet() function to Biostrings for >>>>>>>>>> writing >>>>>>>>>> the result of pairwiseAlignment() to a file? We could do this and >>>>>>>>>> support the BLAST format if that's a commonly used format. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> H. >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> For this I would have to write a function to output the >>>>>>>>>>> strings in blocks of e.g. 60 nucleotides, right? >>>>>>>>>>> >>>>>>>>>>> Cheers >>>>>>>>>>> Martin >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Am Freitag, 13. April 2012 um 19:21 schrieb Chu, Charles: >>>>>>>>>>> >>>>>>>>>>>> write.XStringSet >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> Bioconductor mailing list >>>>>>>>>>> Bioconductor at r-project.org (mailto:Bioconductor at r-project.org) >>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>>>>>>>>> Search the archives: >>>>>>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Hervé Pagès >>>>>>>>>> >>>>>>>>>> Program in Computational Biology >>>>>>>>>> Division of Public Health Sciences >>>>>>>>>> Fred Hutchinson Cancer Research Center >>>>>>>>>> 1100 Fairview Ave. N, M1-B514 >>>>>>>>>> P.O. Box 19024 >>>>>>>>>> Seattle, WA 98109-1024 >>>>>>>>>> >>>>>>>>>> E-mail: hpages at fhcrc.org (mailto:hpages at fhcrc.org) >>>>>>>>>> Phone: (206) 667-5791 >>>>>>>>>> Fax: (206) 667-1319 >>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>>>> Bioconductor mailing list >>>>>>>>>> Bioconductor at r-project.org (mailto:Bioconductor at r-project.org) >>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>>>>>>>> Search the archives: >>>>>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Hervé Pagès >>>>>>>> >>>>>>>> Program in Computational Biology >>>>>>>> Division of Public Health Sciences >>>>>>>> Fred Hutchinson Cancer Research Center >>>>>>>> 1100 Fairview Ave. N, M1-B514 >>>>>>>> P.O. Box 19024 >>>>>>>> Seattle, WA 98109-1024 >>>>>>>> >>>>>>>> E-mail: hpages at fhcrc.org (mailto:hpages at fhcrc.org) >>>>>>>> Phone: (206) 667-5791 >>>>>>>> Fax: (206) 667-1319 >>>>>>> >>>>>> >>>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> -- >>>> Hervé Pagès >>>> >>>> Program in Computational Biology >>>> Division of Public Health Sciences >>>> Fred Hutchinson Cancer Research Center >>>> 1100 Fairview Ave. N, M1-B514 >>>> P.O. Box 19024 >>>> Seattle, WA 98109-1024 >>>> >>>> E-mail: hpages at fhcrc.org (mailto:hpages at fhcrc.org) >>>> Phone: (206) 667-5791 >>>> Fax: (206) 667-1319 >>> >> >> >> > > -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319 ADD COMMENT 0 Entering edit mode @martin-preusse-5224 Last seen 7.1 years ago Hi Herv?, thanks! This looks great. I have a question though: How can I install the latest version of Biostrings? When I install with: source("http://bioconductor.org/biocLite.R") biocLite("Biostrings") I get Version 2.22.0 ? ;) Cheers Martin Am Donnerstag, 19. Juli 2012 um 08:10 schrieb Hervé Pagès: > Hi Martin, > > On 06/22/2012 12:09 PM, Hervé Pagès wrote: > > Hi Martin, > > > > On 06/14/2012 06:55 AM, Martin Preusse wrote: > > > Hi guys, > > > > > > anything new on the sequence output? Maybe I missed something :) > > > please tell me if you need testing etc. > > > > > > > > Still on my list. Will work on this in the next couple of weeks. I'll > > let you know. Thanks for the reminder. > > > > There is now a writePairwiseAlignments() function (in Biostrings 2.25.8) > for doing this. It produces a file in the "pair" format (as described on > the EMBOSS website, at the URL you sent earlier): > > > library(Biostrings) > > pattern <- DNAString("CGTACGTAACGTTCGT") > > subject <- DNAString("CGTCGTCGTCCGTAA") > > x1 <- pairwiseAlignment(pattern, subject) > > > > x1 > Global PairwiseAlignmentsSingleSubject (1 of 1) > pattern: [1] CGTACGTAACGTTCGT > subject: [1] CGT-CGT--CGTCCGT > score: -32.11822 > > > writePairwiseAlignments(x1, block.width=10) # 'block.width' > default is 50 > ######################################## > # Program: Biostrings (version 2.25.8), a Bioconductor package > # Rundate: Wed Jul 18 23:05:46 2012 > ######################################## > #======================================= > # > # Aligned_sequences: 2 > # 1: P1 > # 2: S1 > # Matrix: NA > # Gap_penalty: 14.0 > # Extend_penalty: 4.0 > # > # Length: 18 > # Identity: 12/18 (66.7%) > # Similarity: NA/18 (NA%) > # Gaps: 5/18 (27.8%) > # Score: -32.11822 > # > # > #======================================= > > P1 1 CGTACGTAAC 10 > ||| ||| | > S1 1 CGT-CGT--C 7 > > P1 11 GTTCGT-- 16 > || ||| > S1 8 GTCCGTAA 15 > > > #--------------------------------------- > #--------------------------------------- > > > Only lightly tested. Not necessarily very performant (no C code). Please > have a look at the man page for some caveats (especially if you plan to > use it on NON global alignments). Feedback welcome. > > Thanks, > H. > > > > > H. > > > > > > > > Cheers > > > Martin > > > > > > > > > Am Samstag, 21. April 2012 um 11:55 schrieb Martin Preusse: > > > > > > > Hi Herv?, > > > > > > > > thanks for your help! If you need suggestions, help or testing, just > > > > say the word. > > > > > > > > Will you implement the header also? If you do so, I would be thankful > > > > for an option like "header=F" for the output. > > > > > > > > > > > > Cheers > > > > Martin > > > > > > > > > > > > Am Samstag, 21. April 2012 um 02:12 schrieb Hervé Pagès: > > > > > > > > > Thanks Martin and Thomas for the useful feedback. The 'pair' and > > > > > 'markx0' formats supported by Emboss seem indeed appropriate for > > > > > printing the output of pairwiseAlignment() to a file. I'll add > > > > > support for those 2 formats in Biostrings. Won't be before 1 week > > > > > or 2 though... > > > > > > > > > > Cheers, > > > > > H. > > > > > > > > > > On 04/18/2012 03:20 AM, Martin Preusse wrote: > > > > > > Hi, > > > > > > > > > > > > I just found this function to print a pairwise alignments in > > > > > > blocks. Doesn't add the match/mismatch indicators between > > > > > > sequences, but might be a starting point: > > > > > > > > > > > > http://a-little-book-of-r-for- bioinformatics.readthedocs.org/en/latest/src/chapter4.html#viewing-a -long-pairwise-alignment > > > > > > > > > > > > > > > > > > > > > > > > Cheers > > > > > > Martin > > > > > > > > > > > > > > > > > > > > > > > > Am Mittwoch, 18. April 2012 um 12:16 schrieb Martin Preusse: > > > > > > > > > > > > > Hi everybody, > > > > > > > > > > > > > > I think the output format depends on the purpose of the alignment. > > > > > > > > > > > > > > A pairwise sequence alignment is usually done to compare two > > > > > > > sequences base by base. In my case, I compare sequencing results > > > > > > > of cloned expression constructs with the desired sequence. Thus, > > > > > > > the best output format would be "BLAST like". > > > > > > > > > > > > > > seq1: 1 ATCTGC 7 > > > > > > > | | | . . | > > > > > > > seq2: 1 ATCAAC 7 > > > > > > > > > > > > > > When doing MSA, most people might rather be interested in the > > > > > > > consensus sequence. E.g. in the context of conservation between > > > > > > > species. > > > > > > > > > > > > > > So write.PairwiseAlignedXStringSet() and write.MultipleAlignment() > > > > > > > are quite different and BLAST doesn't make much sense for multiple > > > > > > > alignments. This means it would be best to put the output in the > > > > > > > PairwiseAlignment/MultipleAlignment and not to the XStringSet, right? > > > > > > > > > > > > > > This is an overview of sequence alignment formats used by EMBOSS: > > > > > > > http://emboss.sourceforge.net/docs/themes/AlignFormats.html > > > > > > > > > > > > > > 'pair' or 'markx0' would be perfectly fine. > > > > > > > > > > > > > > > > > > > > > Cheers > > > > > > > Martin > > > > > > > > > > > > > > > > > > > > > > > > > > > > Am Dienstag, 17. April 2012 um 22:13 schrieb Thomas Girke: > > > > > > > > > > > > > > > Hi Herv?, > > > > > > > > > > > > > > > > To me, the most basic and versatile MSA or pairwise alignment > > > > > > > > format to output > > > > > > > > to would be FASTA since it is compatible with almost any other > > > > > > > > alignment > > > > > > > > editing software. For text-based viewing purposes my preference > > > > > > > > would be > > > > > > > > to also output to a format similar to the one shown in the following > > > > > > > > example. When there are only two sequences then one could show > > > > > > > > instead > > > > > > > > of a consensus line the pipe characters between the two sequences to > > > > > > > > indicate identical residues which mimics the blast output. A more > > > > > > > > standardized version of this pairwise alignment format can be found > > > > > > > > here: > > > > > > > > http://emboss.sourceforge.net/apps/cvs/emboss/apps/needle.html > > > > > > > > > > > > > > > > library(Biostrings) > > > > > > > > p450<- > > > > > > > > read.AAStringSet("http://faculty.ucr.edu/~tgirke/Docum ents/R_BioCond/Samples/p450.mul", > > > > > > > > "fasta") > > > > > > > > > > > > > > > > StringSet2html<- function(msa=p450, file="p450.html", start=1, > > > > > > > > end=length(p450[[1]]), counter=20, browser=TRUE, ...) { > > > > > > > > if(class(msa)=="AAStringSet") msa<- AAStringSet(msa, start=start, > > > > > > > > end=end) > > > > > > > > if(class(msa)=="DNAStringSet") msa<- DNAStringSet(msa, > > > > > > > > start=start, end=end) > > > > > > > > msavec<- sapply(msa, toString) > > > > > > > > offset<- (counter-1)-nchar(nchar(msavec[1])) > > > > > > > > legend<- paste(paste(paste(paste(rep(" ", offset), collapse=""), > > > > > > > > format(seq(0, > > > > > > > > nchar(msavec[1]), by=counter)[-1])), collapse=""), collapse="") > > > > > > > > consensus<- consensusString(msavec, ambiguityMap=".", ...) > > > > > > > > msavec<- paste(msavec, rowSums(as.matrix(msa) != "-"), sep=" ") > > > > > > > > msavec<- paste(format(c("", names(msa), "Consensus"), > > > > > > > > justify="left"), c(legend, msavec, > > > > > > > > consensus), sep=" ") > > > > > > > > msavec<- c("<html> ", msavec," </html>") > > > > > > > > writeLines(msavec, file) > > > > > > > > if(browser==TRUE) { browseURL(file) } > > > > > > > > } > > > > > > > > StringSet2html(msa=p450, file="p450.html", start=1, > > > > > > > > end=length(p450[[1]]), counter=20, browser=T, threshold=1.0) > > > > > > > > StringSet2html(msa=p450, file="p450.html", start=450, end=470, > > > > > > > > counter=20, browser=T, threshold=1.0) > > > > > > > > > > > > > > > > > > > > > > > > Thomas > > > > > > > > > > > > > > > > On Tue, Apr 17, 2012 at 07:43:30PM +0000, Hervé Pagès wrote: > > > > > > > > > Hi Thomas, > > > > > > > > > > > > > > > > > > On 04/17/2012 11:49 AM, Thomas Girke wrote: > > > > > > > > > > What about providing an option in pairwiseAlignment to output > > > > > > > > > > to the > > > > > > > > > > MultipleAlignment class in Biostrings and then write the latter to > > > > > > > > > > different alignment formats? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Or we could provide coercion methods to switch between > > > > > > > > > PairwiseAlignedXStringSet and MultipleAlignment. > > > > > > > > > > > > > > > > > > Anyway that kind of moves Martin's problem from having a > > > > > > > > > write.PairwiseAlignedXStringSet() function that produces BLAST > > > > > > > > > output > > > > > > > > > to having a write.MultipleAlignment() function that produces BLAST > > > > > > > > > output. For the specific case of BLAST output, would it make sense > > > > > > > > > to support it for MultipleAlignment? Can someone point me to an > > > > > > > > > example > > > > > > > > > of such output? Or even better, to the specs of such format? > > > > > > > > > > > > > > > > > > Note that right now there is the write.phylip() function in > > > > > > > > > Biostrings > > > > > > > > > for writing a MultipleAlignment object to a file but the Phylip > > > > > > > > > format > > > > > > > > > looks very different from the BLAST output: > > > > > > > > > > > > > > > > > > hpages at latitude:~ head -n 20 phylip_test.txt > > > > > > > > > 9 2343 > > > > > > > > > Mask 0000000000 0000000000 0000000000 0000000000 0000000000 > > > > > > > > > Human -----TCCCG TCTCCGCAGC AAAAAAGTTT GAGTCGCCGC TGCCGGGTTG > > > > > > > > > Chimp ---------- ---------- ---------- ---------- ---------- > > > > > > > > > Cow ---------- ---------- ---------- ---------- ---------- > > > > > > > > > Mouse ---------- ---------- --AAAAGTTG GAGTCTTCGC TTGAGAGTTG > > > > > > > > > Rat ---------- ---------- ---------- ---------- ---------- > > > > > > > > > Dog ---------- ---------- ---------- ---------- ---------- > > > > > > > > > Chicken ---------- ----CGGCTC CGCAGCGCCT CACTCGCGCA GTCCCCGCGC > > > > > > > > > Salmon GGGGGAGACT TCAGAAGTTG TTGTCCTCTC CGCTGATAAC AGTTGAGATG > > > > > > > > > > > > > > > > > > 0000000000 0000000000 0000000000 0001111111 1111111111 > > > > > > > > > CCAGCGGAGT CGCGCGTCGG GAGCTACGTA GGGCAGAGAA GTCA- TGGCT > > > > > > > > > ---------- ---------- ---------- ---------- ---A-TGGCT > > > > > > > > > ---------- ---------- ---------- ---GAGAGAA GTCA- TGGCT > > > > > > > > > CCAGCGGAGT CGCGCGCCGA CAGCTACGCG GCGCAGA-AA GTCA- TGGCT > > > > > > > > > ---------- ---------- ---------- ---------- ---A-TGGCT > > > > > > > > > ---------- ---------- ---------- ---------- ---A-TGGCT > > > > > > > > > AGGGCCGGGC AGAGGCGCAC GCAGCTCCCC GGGCGGCCCC GCTC- CAGCC > > > > > > > > > CGCATATTAT TATTACCTTT AGGACAAGTT GAATGTGTTC GTCAACATCT > > > > > > > > > > > > > > > > > > Thanks! > > > > > > > > > H. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thomas > > > > > > > > > > > > > > > > > > > > On Tue, Apr 17, 2012 at 05:59:24PM +0000, Hervé Pagès wrote: > > > > > > > > > > > Hi Martin, > > > > > > > > > > > > > > > > > > > > > > On 04/16/2012 04:06 AM, Martin Preusse wrote: > > > > > > > > > > > > Hi Charles, > > > > > > > > > > > > > > > > > > > > > > > > thanks! Your solution allows to print the two alignment > > > > > > > > > > > > strings separately. > > > > > > > > > > > > > > > > > > > > > > > > I was thinking of an output as generated by alignment tools: > > > > > > > > > > > > > > > > > > > > > > > > AGT-TCTAT > > > > > > > > > > > > | | | | | | | | | > > > > > > > > > > > > AGTATCTAT > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > This looks like BLAST output. Is this what you have in mind? > > > > > > > > > > > Note that > > > > > > > > > > > there are many alignment tools and many ways to output the > > > > > > > > > > > result to a > > > > > > > > > > > file. I'm not really familiar with the BLAST output format. Is it > > > > > > > > > > > specified somewhere? Would that make sense to add something > > > > > > > > > > > like a > > > > > > > > > > > write.PairwiseAlignedXStringSet() function to Biostrings for > > > > > > > > > > > writing > > > > > > > > > > > the result of pairwiseAlignment() to a file? We could do this and > > > > > > > > > > > support the BLAST format if that's a commonly used format. > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > H. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > For this I would have to write a function to output the > > > > > > > > > > > > strings in blocks of e.g. 60 nucleotides, right? > > > > > > > > > > > > > > > > > > > > > > > > Cheers > > > > > > > > > > > > Martin > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Am Freitag, 13. April 2012 um 19:21 schrieb Chu, Charles: > > > > > > > > > > > > > > > > > > > > > > > > > write.XStringSet > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > > > > > > > > > Bioconductor mailing list > > > > > > > > > > > > Bioconductor at r-project.org (mailto:Bioconductor at r-project.org) > > > > > > > > > > > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > > > > > > > > > > > Search the archives: > > > > > > > > > > > > http://news.gmane.org/gmane.science.biology.informatics.conductor > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > > Hervé Pagès > > > > > > > > > > > > > > > > > > > > > > Program in Computational Biology > > > > > > > > > > > Division of Public Health Sciences > > > > > > > > > > > Fred Hutchinson Cancer Research Center > > > > > > > > > > > 1100 Fairview Ave. N, M1-B514 > > > > > > > > > > > P.O. Box 19024 > > > > > > > > > > > Seattle, WA 98109-1024 > > > > > > > > > > > > > > > > > > > > > > E-mail: hpages at fhcrc.org (mailto:hpages at fhcrc.org) > > > > > > > > > > > Phone: (206) 667-5791 > > > > > > > > > > > Fax: (206) 667-1319 > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > > > > > > > > Bioconductor mailing list > > > > > > > > > > > Bioconductor at r-project.org (mailto:Bioconductor at r-project.org) > > > > > > > > > > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > > > > > > > > > > Search the archives: > > > > > > > > > > > http://news.gmane.org/gmane.science.biology.informatics.conductor > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > Hervé Pagès > > > > > > > > > > > > > > > > > > Program in Computational Biology > > > > > > > > > Division of Public Health Sciences > > > > > > > > > Fred Hutchinson Cancer Research Center > > > > > > > > > 1100 Fairview Ave. N, M1-B514 > > > > > > > > > P.O. Box 19024 > > > > > > > > > Seattle, WA 98109-1024 > > > > > > > > > > > > > > > > > > E-mail: hpages at fhcrc.org (mailto:hpages at fhcrc.org) > > > > > > > > > Phone: (206) 667-5791 > > > > > > > > > Fax: (206) 667-1319 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > Hervé Pagès > > > > > > > > > > Program in Computational Biology > > > > > Division of Public Health Sciences > > > > > Fred Hutchinson Cancer Research Center > > > > > 1100 Fairview Ave. N, M1-B514 > > > > > P.O. Box 19024 > > > > > Seattle, WA 98109-1024 > > > > > > > > > > E-mail: hpages at fhcrc.org (mailto:hpages at fhcrc.org) > > > > > Phone: (206) 667-5791 > > > > > Fax: (206) 667-1319 > > > > > > > > > > > > > > -- > Hervé Pagès > > Program in Computational Biology > Division of Public Health Sciences > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N, M1-B514 > P.O. Box 19024 > Seattle, WA 98109-1024 > > E-mail: hpages at fhcrc.org (mailto:hpages at fhcrc.org) > Phone: (206) 667-5791 > Fax: (206) 667-1319
0
Entering edit mode
", msavec,"
</html>") >>>>>>>>> writeLines(msavec, file) >>>>>>>>> if(browser==TRUE) { browseURL(file) } >>>>>>>>> } >>>>>>>>> StringSet2html(msa=p450, file="p450.html", start=1, >>>>>>>>> end=length(p450[[1]]), counter=20, browser=T, threshold=1.0) >>>>>>>>> StringSet2html(msa=p450, file="p450.html", start=450, end=470, >>>>>>>>> counter=20, browser=T, threshold=1.0) >>>>>>>>> >>>>>>>>> >>>>>>>>> Thomas >>>>>>>>> >>>>>>>>> On Tue, Apr 17, 2012 at 07:43:30PM +0000, Hervé Pagès wrote: >>>>>>>>>> Hi Thomas, >>>>>>>>>> >>>>>>>>>> On 04/17/2012 11:49 AM, Thomas Girke wrote: >>>>>>>>>>> What about providing an option in pairwiseAlignment to output >>>>>>>>>>> to the >>>>>>>>>>> MultipleAlignment class in Biostrings and then write the latter to >>>>>>>>>>> different alignment formats? >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Or we could provide coercion methods to switch between >>>>>>>>>> PairwiseAlignedXStringSet and MultipleAlignment. >>>>>>>>>> >>>>>>>>>> Anyway that kind of moves Martin's problem from having a >>>>>>>>>> write.PairwiseAlignedXStringSet() function that produces BLAST >>>>>>>>>> output >>>>>>>>>> to having a write.MultipleAlignment() function that produces BLAST >>>>>>>>>> output. For the specific case of BLAST output, would it make sense >>>>>>>>>> to support it for MultipleAlignment? Can someone point me to an >>>>>>>>>> example >>>>>>>>>> of such output? Or even better, to the specs of such format? >>>>>>>>>> >>>>>>>>>> Note that right now there is the write.phylip() function in >>>>>>>>>> Biostrings >>>>>>>>>> for writing a MultipleAlignment object to a file but the Phylip >>>>>>>>>> format >>>>>>>>>> looks very different from the BLAST output: >>>>>>>>>> >>>>>>>>>> hpages at latitude:~head -n 20 phylip_test.txt >>>>>>>>>> 9 2343 >>>>>>>>>> Mask 0000000000 0000000000 0000000000 0000000000 0000000000 >>>>>>>>>> Human -----TCCCG TCTCCGCAGC AAAAAAGTTT GAGTCGCCGC TGCCGGGTTG >>>>>>>>>> Chimp ---------- ---------- ---------- ---------- ---------- >>>>>>>>>> Cow ---------- ---------- ---------- ---------- ---------- >>>>>>>>>> Mouse ---------- ---------- --AAAAGTTG GAGTCTTCGC TTGAGAGTTG >>>>>>>>>> Rat ---------- ---------- ---------- ---------- ---------- >>>>>>>>>> Dog ---------- ---------- ---------- ---------- ---------- >>>>>>>>>> Chicken ---------- ----CGGCTC CGCAGCGCCT CACTCGCGCA GTCCCCGCGC >>>>>>>>>> Salmon GGGGGAGACT TCAGAAGTTG TTGTCCTCTC CGCTGATAAC AGTTGAGATG >>>>>>>>>> >>>>>>>>>> 0000000000 0000000000 0000000000 0001111111 1111111111 >>>>>>>>>> CCAGCGGAGT CGCGCGTCGG GAGCTACGTA GGGCAGAGAA GTCA-TGGCT >>>>>>>>>> ---------- ---------- ---------- ---------- ---A-TGGCT >>>>>>>>>> ---------- ---------- ---------- ---GAGAGAA GTCA-TGGCT >>>>>>>>>> CCAGCGGAGT CGCGCGCCGA CAGCTACGCG GCGCAGA-AA GTCA-TGGCT >>>>>>>>>> ---------- ---------- ---------- ---------- ---A-TGGCT >>>>>>>>>> ---------- ---------- ---------- ---------- ---A-TGGCT >>>>>>>>>> AGGGCCGGGC AGAGGCGCAC GCAGCTCCCC GGGCGGCCCC GCTC-CAGCC >>>>>>>>>> CGCATATTAT TATTACCTTT AGGACAAGTT GAATGTGTTC GTCAACATCT >>>>>>>>>> >>>>>>>>>> Thanks! >>>>>>>>>> H. >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Thomas >>>>>>>>>>> >>>>>>>>>>> On Tue, Apr 17, 2012 at 05:59:24PM +0000, Hervé Pagès wrote: >>>>>>>>>>>> Hi Martin, >>>>>>>>>>>> >>>>>>>>>>>> On 04/16/2012 04:06 AM, Martin Preusse wrote: >>>>>>>>>>>>> Hi Charles, >>>>>>>>>>>>> >>>>>>>>>>>>> thanks! Your solution allows to print the two alignment >>>>>>>>>>>>> strings separately. >>>>>>>>>>>>> >>>>>>>>>>>>> I was thinking of an output as generated by alignment tools: >>>>>>>>>>>>> >>>>>>>>>>>>> AGT-TCTAT >>>>>>>>>>>>> | | | | | | | | | >>>>>>>>>>>>> AGTATCTAT >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> This looks like BLAST output. Is this what you have in mind? >>>>>>>>>>>> Note that >>>>>>>>>>>> there are many alignment tools and many ways to output the >>>>>>>>>>>> result to a >>>>>>>>>>>> file. I'm not really familiar with the BLAST output format. Is it >>>>>>>>>>>> specified somewhere? Would that make sense to add something >>>>>>>>>>>> like a >>>>>>>>>>>> write.PairwiseAlignedXStringSet() function to Biostrings for >>>>>>>>>>>> writing >>>>>>>>>>>> the result of pairwiseAlignment() to a file? We could do this and >>>>>>>>>>>> support the BLAST format if that's a commonly used format. >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> H. >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> For this I would have to write a function to output the >>>>>>>>>>>>> strings in blocks of e.g. 60 nucleotides, right? >>>>>>>>>>>>> >>>>>>>>>>>>> Cheers >>>>>>>>>>>>> Martin >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Am Freitag, 13. April 2012 um 19:21 schrieb Chu, Charles: >>>>>>>>>>>>> >>>>>>>>>>>>>> write.XStringSet >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>> Bioconductor mailing list >>>>>>>>>>>>> Bioconductor at r-project.org (mailto:Bioconductor at r-project.org) >>>>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>>>>>>>>>>> Search the archives: >>>>>>>>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> Hervé Pagès >>>>>>>>>>>> >>>>>>>>>>>> Program in Computational Biology >>>>>>>>>>>> Division of Public Health Sciences >>>>>>>>>>>> Fred Hutchinson Cancer Research Center >>>>>>>>>>>> 1100 Fairview Ave. N, M1-B514 >>>>>>>>>>>> P.O. Box 19024 >>>>>>>>>>>> Seattle, WA 98109-1024 >>>>>>>>>>>> >>>>>>>>>>>> E-mail: hpages at fhcrc.org (mailto:hpages at fhcrc.org) >>>>>>>>>>>> Phone: (206) 667-5791 >>>>>>>>>>>> Fax: (206) 667-1319 >>>>>>>>>>>> >>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>> Bioconductor mailing list >>>>>>>>>>>> Bioconductor at r-project.org (mailto:Bioconductor at r-project.org) >>>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>>>>>>>>>> Search the archives: >>>>>>>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Hervé Pagès >>>>>>>>>> >>>>>>>>>> Program in Computational Biology >>>>>>>>>> Division of Public Health Sciences >>>>>>>>>> Fred Hutchinson Cancer Research Center >>>>>>>>>> 1100 Fairview Ave. N, M1-B514 >>>>>>>>>> P.O. Box 19024 >>>>>>>>>> Seattle, WA 98109-1024 >>>>>>>>>> >>>>>>>>>> E-mail: hpages at fhcrc.org (mailto:hpages at fhcrc.org) >>>>>>>>>> Phone: (206) 667-5791 >>>>>>>>>> Fax: (206) 667-1319 >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Hervé Pagès >>>>>> >>>>>> Program in Computational Biology >>>>>> Division of Public Health Sciences >>>>>> Fred Hutchinson Cancer Research Center >>>>>> 1100 Fairview Ave. N, M1-B514 >>>>>> P.O. Box 19024 >>>>>> Seattle, WA 98109-1024 >>>>>> >>>>>> E-mail: hpages at fhcrc.org (mailto:hpages at fhcrc.org) >>>>>> Phone: (206) 667-5791 >>>>>> Fax: (206) 667-1319 >>>>> >>>> >>> >> >> >> >> >> -- >> Hervé Pagès >> >> Program in Computational Biology >> Division of Public Health Sciences >> Fred Hutchinson Cancer Research Center >> 1100 Fairview Ave. N, M1-B514 >> P.O. Box 19024 >> Seattle, WA 98109-1024 >> >> E-mail: hpages at fhcrc.org (mailto:hpages at fhcrc.org) >> Phone: (206) 667-5791 >> Fax: (206) 667-1319 > > > -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319 ADD REPLY 0 Entering edit mode Hi everyone, Could you please advise how to boxplot a big list (~2million *70) into a small pdf file? I did it using R boxplot and it generated a pdf file in >20MB that is hard to open. Any suggestions will be appreciated. Best, John [[alternative HTML version deleted]] ADD REPLY 0 Entering edit mode Hi John, On 7/19/2012 3:45 PM, John linux-user wrote: > Hi everyone, > > Could you please advise how to boxplot a big list (~2million *70) into a small pdf file? I did it using R boxplot and it generated a pdf file in>20MB that is hard to open. Any suggestions will be appreciated. There are two things that come to my mind. First, do you really need to use all those data? I would bet that a sample of say 1e5 from each list item would give essentially the same picture. Alternatively, you might consider using a bitmapped graphic rather than a vector graphic. I am assuming here that you have lots of outlier points (outside the whiskers) that are slowing things down. You could use a png, and as long as you aren't resizing it should look OK. Best, Jim > > Best, > > John > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099 ADD REPLY 0 Entering edit mode If the problem is tons of outliers, you could consider setting range=0, which will cause the full range of data to be included in the whiskers... Janet Smith Research Scientist Department of Biology MIT Building 68-540D Cambridge, MA 02139 ________________________________________ From: bioconductor-bounces@r-project.org [bioconductor- bounces@r-project.org] on behalf of John linux-user [johnlinuxuser@yahoo.com] Sent: Thursday, July 19, 2012 3:45 PM To: bioconductor at r-project.org Subject: [BioC] boxplot a big list? Hi everyone, Could you please advise how to boxplot a big list (~2million *70) into a small pdf file? I did it using R boxplot and it generated a pdf file in >20MB that is hard to open. Any suggestions will be appreciated. Best, John [[alternative HTML version deleted]] _______________________________________________ Bioconductor mailing list Bioconductor at r-project.org https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor ADD REPLY 0 Entering edit mode I installed 2.25.8 and it seems to work fine. I'll test it with my scripts that process sequencing data and report any problems. Am Donnerstag, 19. Juli 2012 um 19:23 schrieb Hervé Pagès: > Hi Martin, > > On 07/19/2012 04:30 AM, Martin Preusse wrote: > > Hi Herv?, > > > > thanks! This looks great. I have a question though: How can I install the latest version of Biostrings? > > > > When I install with: > > > > source("http://bioconductor.org/biocLite.R") > > biocLite("Biostrings") > > > > > > > > I get Version 2.22.0 ? ;) > > Biostrings 2.25 is part of the current devel version of BioC > (BioC 2.11). To install BioC devel, see this link: > > http://bioconductor.org/developers/useDevel/ > > Note that today Biostrings 2.25.7 is the latest version available thru > biocLite(): > > http://bioconductor.org/packages/2.11/bioc/html/Biostrings.html > > If everything goes well, 2.25.8 will become available tomorrow morning > around 10 am (Seattle time). Please let me know if you run into any > issue with this. > > Thanks! > H. > > > > > > > Cheers > > Martin > > > > > > > > Am Donnerstag, 19. Juli 2012 um 08:10 schrieb Hervé Pagès: > > > > > Hi Martin, > > > > > > On 06/22/2012 12:09 PM, Hervé Pagès wrote: > > > > Hi Martin, > > > > > > > > On 06/14/2012 06:55 AM, Martin Preusse wrote: > > > > > Hi guys, > > > > > > > > > > anything new on the sequence output? Maybe I missed something :) > > > > > please tell me if you need testing etc. > > > > > > > > > > > > > > > > > > > > > > > > Still on my list. Will work on this in the next couple of weeks. I'll > > > > let you know. Thanks for the reminder. > > > > > > > > > > > > > > > > > > There is now a writePairwiseAlignments() function (in Biostrings 2.25.8) > > > for doing this. It produces a file in the "pair" format (as described on > > > the EMBOSS website, at the URL you sent earlier): > > > > > > > library(Biostrings) > > > > pattern <- DNAString("CGTACGTAACGTTCGT") > > > > subject <- DNAString("CGTCGTCGTCCGTAA") > > > > x1 <- pairwiseAlignment(pattern, subject) > > > > > > > > > > > > > > > > x1 > > > Global PairwiseAlignmentsSingleSubject (1 of 1) > > > pattern: [1] CGTACGTAACGTTCGT > > > subject: [1] CGT-CGT--CGTCCGT > > > score: -32.11822 > > > > > > > writePairwiseAlignments(x1, block.width=10) # 'block.width' > > > default is 50 > > > ######################################## > > > # Program: Biostrings (version 2.25.8), a Bioconductor package > > > # Rundate: Wed Jul 18 23:05:46 2012 > > > ######################################## > > > #======================================= > > > # > > > # Aligned_sequences: 2 > > > # 1: P1 > > > # 2: S1 > > > # Matrix: NA > > > # Gap_penalty: 14.0 > > > # Extend_penalty: 4.0 > > > # > > > # Length: 18 > > > # Identity: 12/18 (66.7%) > > > # Similarity: NA/18 (NA%) > > > # Gaps: 5/18 (27.8%) > > > # Score: -32.11822 > > > # > > > # > > > #======================================= > > > > > > P1 1 CGTACGTAAC 10 > > > ||| ||| | > > > S1 1 CGT-CGT--C 7 > > > > > > P1 11 GTTCGT-- 16 > > > || ||| > > > S1 8 GTCCGTAA 15 > > > > > > > > > #--------------------------------------- > > > #--------------------------------------- > > > > > > > > > Only lightly tested. Not necessarily very performant (no C code). Please > > > have a look at the man page for some caveats (especially if you plan to > > > use it on NON global alignments). Feedback welcome. > > > > > > Thanks, > > > H. > > > > > > > > > > > H. > > > > > > > > > > > > > > Cheers > > > > > Martin > > > > > > > > > > > > > > > Am Samstag, 21. April 2012 um 11:55 schrieb Martin Preusse: > > > > > > > > > > > Hi Herv?, > > > > > > > > > > > > thanks for your help! If you need suggestions, help or testing, just > > > > > > say the word. > > > > > > > > > > > > Will you implement the header also? If you do so, I would be thankful > > > > > > for an option like "header=F" for the output. > > > > > > > > > > > > > > > > > > Cheers > > > > > > Martin > > > > > > > > > > > > > > > > > > Am Samstag, 21. April 2012 um 02:12 schrieb Hervé Pagès: > > > > > > > > > > > > > Thanks Martin and Thomas for the useful feedback. The 'pair' and > > > > > > > 'markx0' formats supported by Emboss seem indeed appropriate for > > > > > > > printing the output of pairwiseAlignment() to a file. I'll add > > > > > > > support for those 2 formats in Biostrings. Won't be before 1 week > > > > > > > or 2 though... > > > > > > > > > > > > > > Cheers, > > > > > > > H. > > > > > > > > > > > > > > On 04/18/2012 03:20 AM, Martin Preusse wrote: > > > > > > > > Hi, > > > > > > > > > > > > > > > > I just found this function to print a pairwise alignments in > > > > > > > > blocks. Doesn't add the match/mismatch indicators between > > > > > > > > sequences, but might be a starting point: > > > > > > > > > > > > > > > > http://a-little-book-of-r-for- bioinformatics.readthedocs.org/en/latest/src/chapter4.html#viewing-a -long-pairwise-alignment > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Cheers > > > > > > > > Martin > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Am Mittwoch, 18. April 2012 um 12:16 schrieb Martin Preusse: > > > > > > > > > > > > > > > > > Hi everybody, > > > > > > > > > > > > > > > > > > I think the output format depends on the purpose of the alignment. > > > > > > > > > > > > > > > > > > A pairwise sequence alignment is usually done to compare two > > > > > > > > > sequences base by base. In my case, I compare sequencing results > > > > > > > > > of cloned expression constructs with the desired sequence. Thus, > > > > > > > > > the best output format would be "BLAST like". > > > > > > > > > > > > > > > > > > seq1: 1 ATCTGC 7 > > > > > > > > > | | | . . | > > > > > > > > > seq2: 1 ATCAAC 7 > > > > > > > > > > > > > > > > > > When doing MSA, most people might rather be interested in the > > > > > > > > > consensus sequence. E.g. in the context of conservation between > > > > > > > > > species. > > > > > > > > > > > > > > > > > > So write.PairwiseAlignedXStringSet() and write.MultipleAlignment() > > > > > > > > > are quite different and BLAST doesn't make much sense for multiple > > > > > > > > > alignments. This means it would be best to put the output in the > > > > > > > > > PairwiseAlignment/MultipleAlignment and not to the XStringSet, right? > > > > > > > > > > > > > > > > > > This is an overview of sequence alignment formats used by EMBOSS: > > > > > > > > > http://emboss.sourceforge.net/docs/themes/AlignFormats.html > > > > > > > > > > > > > > > > > > 'pair' or 'markx0' would be perfectly fine. > > > > > > > > > > > > > > > > > > > > > > > > > > > Cheers > > > > > > > > > Martin > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Am Dienstag, 17. April 2012 um 22:13 schrieb Thomas Girke: > > > > > > > > > > > > > > > > > > > Hi Herv?, > > > > > > > > > > > > > > > > > > > > To me, the most basic and versatile MSA or pairwise alignment > > > > > > > > > > format to output > > > > > > > > > > to would be FASTA since it is compatible with almost any other > > > > > > > > > > alignment > > > > > > > > > > editing software. For text-based viewing purposes my preference > > > > > > > > > > would be > > > > > > > > > > to also output to a format similar to the one shown in the following > > > > > > > > > > example. When there are only two sequences then one could show > > > > > > > > > > instead > > > > > > > > > > of a consensus line the pipe characters between the two sequences to > > > > > > > > > > indicate identical residues which mimics the blast output. A more > > > > > > > > > > standardized version of this pairwise alignment format can be found > > > > > > > > > > here: > > > > > > > > > > http://emboss.sourceforge.net/apps/cvs/emboss/apps/needle.html > > > > > > > > > > > > > > > > > > > > library(Biostrings) > > > > > > > > > > p450<- > > > > > > > > > > read.AAStringSet("http://faculty.ucr.edu/~tgirke/D ocuments/R_BioCond/Samples/p450.mul", > > > > > > > > > > "fasta") > > > > > > > > > > > > > > > > > > > > StringSet2html<- function(msa=p450, file="p450.html", start=1, > > > > > > > > > > end=length(p450[[1]]), counter=20, browser=TRUE, ...) { > > > > > > > > > > if(class(msa)=="AAStringSet") msa<- AAStringSet(msa, start=start, > > > > > > > > > > end=end) > > > > > > > > > > if(class(msa)=="DNAStringSet") msa<- DNAStringSet(msa, > > > > > > > > > > start=start, end=end) > > > > > > > > > > msavec<- sapply(msa, toString) > > > > > > > > > > offset<- (counter-1)-nchar(nchar(msavec[1])) > > > > > > > > > > legend<- paste(paste(paste(paste(rep(" ", offset), collapse=""), > > > > > > > > > > format(seq(0, > > > > > > > > > > nchar(msavec[1]), by=counter)[-1])), collapse=""), collapse="") > > > > > > > > > > consensus<- consensusString(msavec, ambiguityMap=".", ...) > > > > > > > > > > msavec<- paste(msavec, rowSums(as.matrix(msa) != "-"), sep=" ") > > > > > > > > > > msavec<- paste(format(c("", names(msa), "Consensus"), > > > > > > > > > > justify="left"), c(legend, msavec, > > > > > > > > > > consensus), sep=" ") > > > > > > > > > > msavec<- c("<html> ", msavec," </html>") > > > > > > > > > > writeLines(msavec, file) > > > > > > > > > > if(browser==TRUE) { browseURL(file) } > > > > > > > > > > } > > > > > > > > > > StringSet2html(msa=p450, file="p450.html", start=1, > > > > > > > > > > end=length(p450[[1]]), counter=20, browser=T, threshold=1.0) > > > > > > > > > > StringSet2html(msa=p450, file="p450.html", start=450, end=470, > > > > > > > > > > counter=20, browser=T, threshold=1.0) > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thomas > > > > > > > > > > > > > > > > > > > > On Tue, Apr 17, 2012 at 07:43:30PM +0000, Hervé Pagès wrote: > > > > > > > > > > > Hi Thomas, > > > > > > > > > > > > > > > > > > > > > > On 04/17/2012 11:49 AM, Thomas Girke wrote: > > > > > > > > > > > > What about providing an option in pairwiseAlignment to output > > > > > > > > > > > > to the > > > > > > > > > > > > MultipleAlignment class in Biostrings and then write the latter to > > > > > > > > > > > > different alignment formats? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Or we could provide coercion methods to switch between > > > > > > > > > > > PairwiseAlignedXStringSet and MultipleAlignment. > > > > > > > > > > > > > > > > > > > > > > Anyway that kind of moves Martin's problem from having a > > > > > > > > > > > write.PairwiseAlignedXStringSet() function that produces BLAST > > > > > > > > > > > output > > > > > > > > > > > to having a write.MultipleAlignment() function that produces BLAST > > > > > > > > > > > output. For the specific case of BLAST output, would it make sense > > > > > > > > > > > to support it for MultipleAlignment? Can someone point me to an > > > > > > > > > > > example > > > > > > > > > > > of such output? Or even better, to the specs of such format? > > > > > > > > > > > > > > > > > > > > > > Note that right now there is the write.phylip() function in > > > > > > > > > > > Biostrings > > > > > > > > > > > for writing a MultipleAlignment object to a file but the Phylip > > > > > > > > > > > format > > > > > > > > > > > looks very different from the BLAST output: > > > > > > > > > > > > > > > > > > > > > > hpages at latitude:~ head -n 20 phylip_test.txt > > > > > > > > > > > 9 2343 > > > > > > > > > > > Mask 0000000000 0000000000 0000000000 0000000000 0000000000 > > > > > > > > > > > Human -----TCCCG TCTCCGCAGC AAAAAAGTTT GAGTCGCCGC TGCCGGGTTG > > > > > > > > > > > Chimp ---------- ---------- ---------- ---------- ---------- > > > > > > > > > > > Cow ---------- ---------- ---------- ---------- ---------- > > > > > > > > > > > Mouse ---------- ---------- --AAAAGTTG GAGTCTTCGC TTGAGAGTTG > > > > > > > > > > > Rat ---------- ---------- ---------- ---------- ---------- > > > > > > > > > > > Dog ---------- ---------- ---------- ---------- ---------- > > > > > > > > > > > Chicken ---------- ----CGGCTC CGCAGCGCCT CACTCGCGCA GTCCCCGCGC > > > > > > > > > > > Salmon GGGGGAGACT TCAGAAGTTG TTGTCCTCTC CGCTGATAAC AGTTGAGATG > > > > > > > > > > > > > > > > > > > > > > 0000000000 0000000000 0000000000 0001111111 1111111111 > > > > > > > > > > > CCAGCGGAGT CGCGCGTCGG GAGCTACGTA GGGCAGAGAA GTCA-TGGCT > > > > > > > > > > > ---------- ---------- ---------- ---------- ---A-TGGCT > > > > > > > > > > > ---------- ---------- ---------- ---GAGAGAA GTCA-TGGCT > > > > > > > > > > > CCAGCGGAGT CGCGCGCCGA CAGCTACGCG GCGCAGA-AA GTCA-TGGCT > > > > > > > > > > > ---------- ---------- ---------- ---------- ---A-TGGCT > > > > > > > > > > > ---------- ---------- ---------- ---------- ---A-TGGCT > > > > > > > > > > > AGGGCCGGGC AGAGGCGCAC GCAGCTCCCC GGGCGGCCCC GCTC-CAGCC > > > > > > > > > > > CGCATATTAT TATTACCTTT AGGACAAGTT GAATGTGTTC GTCAACATCT > > > > > > > > > > > > > > > > > > > > > > Thanks! > > > > > > > > > > > H. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thomas > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Apr 17, 2012 at 05:59:24PM +0000, Hervé Pagès wrote: > > > > > > > > > > > > > Hi Martin, > > > > > > > > > > > > > > > > > > > > > > > > > > On 04/16/2012 04:06 AM, Martin Preusse wrote: > > > > > > > > > > > > > > Hi Charles, > > > > > > > > > > > > > > > > > > > > > > > > > > > > thanks! Your solution allows to print the two alignment > > > > > > > > > > > > > > strings separately. > > > > > > > > > > > > > > > > > > > > > > > > > > > > I was thinking of an output as generated by alignment tools: > > > > > > > > > > > > > > > > > > > > > > > > > > > > AGT-TCTAT > > > > > > > > > > > > > > | | | | | | | | | > > > > > > > > > > > > > > AGTATCTAT > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > This looks like BLAST output. Is this what you have in mind? > > > > > > > > > > > > > Note that > > > > > > > > > > > > > there are many alignment tools and many ways to output the > > > > > > > > > > > > > result to a > > > > > > > > > > > > > file. I'm not really familiar with the BLAST output format. Is it > > > > > > > > > > > > > specified somewhere? Would that make sense to add something > > > > > > > > > > > > > like a > > > > > > > > > > > > > write.PairwiseAlignedXStringSet() function to Biostrings for > > > > > > > > > > > > > writing > > > > > > > > > > > > > the result of pairwiseAlignment() to a file? We could do this and > > > > > > > > > > > > > support the BLAST format if that's a commonly used format. > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > H. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > For this I would have to write a function to output the > > > > > > > > > > > > > > strings in blocks of e.g. 60 nucleotides, right? > > > > > > > > > > > > > > > > > > > > > > > > > > > > Cheers > > > > > > > > > > > > > > Martin > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Am Freitag, 13. April 2012 um 19:21 schrieb Chu, Charles: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > write.XStringSet > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > > > > > > > > > > > Bioconductor mailing list > > > > > > > > > > > > > > Bioconductor at r-project.org (mailto:Bioconductor at r-project.org) > > > > > > > > > > > > > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > > > > > > > > > > > > > Search the archives: > > > > > > > > > > > > > > http://news.gmane.org/gmane.science.biology.informatics.conductor > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > > > > Hervé Pagès > > > > > > > > > > > > > > > > > > > > > > > > > > Program in Computational Biology > > > > > > > > > > > > > Division of Public Health Sciences > > > > > > > > > > > > > Fred Hutchinson Cancer Research Center > > > > > > > > > > > > > 1100 Fairview Ave. N, M1-B514 > > > > > > > > > > > > > P.O. Box 19024 > > > > > > > > > > > > > Seattle, WA 98109-1024 > > > > > > > > > > > > > > > > > > > > > > > > > > E-mail: hpages at fhcrc.org (mailto:hpages at fhcrc.org) > > > > > > > > > > > > > Phone: (206) 667-5791 > > > > > > > > > > > > > Fax: (206) 667-1319 > > > > > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > > > > > > > > > > Bioconductor mailing list > > > > > > > > > > > > > Bioconductor at r-project.org (mailto:Bioconductor at r-project.org) > > > > > > > > > > > > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > > > > > > > > > > > > Search the archives: > > > > > > > > > > > > > http://news.gmane.org/gmane.science.biology.informatics.conductor > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > > Hervé Pagès > > > > > > > > > > > > > > > > > > > > > > Program in Computational Biology > > > > > > > > > > > Division of Public Health Sciences > > > > > > > > > > > Fred Hutchinson Cancer Research Center > > > > > > > > > > > 1100 Fairview Ave. N, M1-B514 > > > > > > > > > > > P.O. Box 19024 > > > > > > > > > > > Seattle, WA 98109-1024 > > > > > > > > > > > > > > > > > > > > > > E-mail: hpages at fhcrc.org (mailto:hpages at fhcrc.org) > > > > > > > > > > > Phone: (206) 667-5791 > > > > > > > > > > > Fax: (206) 667-1319 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > Hervé Pagès > > > > > > > > > > > > > > Program in Computational Biology > > > > > > > Division of Public Health Sciences > > > > > > > Fred Hutchinson Cancer Research Center > > > > > > > 1100 Fairview Ave. N, M1-B514 > > > > > > > P.O. Box 19024 > > > > > > > Seattle, WA 98109-1024 > > > > > > > > > > > > > > E-mail: hpages at fhcrc.org (mailto:hpages at fhcrc.org) > > > > > > > Phone: (206) 667-5791 > > > > > > > Fax: (206) 667-1319 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > Hervé Pagès > > > > > > Program in Computational Biology > > > Division of Public Health Sciences > > > Fred Hutchinson Cancer Research Center > > > 1100 Fairview Ave. N, M1-B514 > > > P.O. Box 19024 > > > Seattle, WA 98109-1024 > > > > > > E-mail: hpages at fhcrc.org (mailto:hpages at fhcrc.org) > > > Phone: (206) 667-5791 > > > Fax: (206) 667-1319 > > > > > > > -- > Hervé Pagès > > Program in Computational Biology > Division of Public Health Sciences > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N, M1-B514 > P.O. Box 19024 > Seattle, WA 98109-1024 > > E-mail: hpages at fhcrc.org (mailto:hpages at fhcrc.org) > Phone: (206) 667-5791 > Fax: (206) 667-1319