Entering edit mode
Martin Preusse
▴
100
@martin-preusse-5224
Last seen 10.2 years ago
Hi guys,
anything new on the sequence output? Maybe I missed something :)
please tell me if you need testing etc.
Cheers
Martin
Am Samstag, 21. April 2012 um 11:55 schrieb Martin Preusse:
> Hi Herv?,
>
> thanks for your help! If you need suggestions, help or testing, just
say the word.
>
> Will you implement the header also? If you do so, I would be
thankful for an option like "header=F" for the output.
>
>
> Cheers
> Martin
>
>
> Am Samstag, 21. April 2012 um 02:12 schrieb Hervé Pagès:
>
> > Thanks Martin and Thomas for the useful feedback. The 'pair' and
> > 'markx0' formats supported by Emboss seem indeed appropriate for
> > printing the output of pairwiseAlignment() to a file. I'll add
> > support for those 2 formats in Biostrings. Won't be before 1 week
> > or 2 though...
> >
> > Cheers,
> > H.
> >
> > On 04/18/2012 03:20 AM, Martin Preusse wrote:
> > > Hi,
> > >
> > > I just found this function to print a pairwise alignments in
blocks. Doesn't add the match/mismatch indicators between sequences,
but might be a starting point:
> > >
> > > http://a-little-book-of-r-for-
bioinformatics.readthedocs.org/en/latest/src/chapter4.html#viewing-a
-long-pairwise-alignment
> > >
> > >
> > > Cheers
> > > Martin
> > >
> > >
> > >
> > > Am Mittwoch, 18. April 2012 um 12:16 schrieb Martin Preusse:
> > >
> > > > Hi everybody,
> > > >
> > > > I think the output format depends on the purpose of the
alignment.
> > > >
> > > > A pairwise sequence alignment is usually done to compare two
sequences base by base. In my case, I compare sequencing results of
cloned expression constructs with the desired sequence. Thus, the best
output format would be "BLAST like".
> > > >
> > > > seq1: 1 ATCTGC 7
> > > > | | | . . |
> > > > seq2: 1 ATCAAC 7
> > > >
> > > > When doing MSA, most people might rather be interested in the
consensus sequence. E.g. in the context of conservation between
species.
> > > >
> > > > So write.PairwiseAlignedXStringSet() and
write.MultipleAlignment() are quite different and BLAST doesn't make
much sense for multiple alignments. This means it would be best to put
the output in the PairwiseAlignment/MultipleAlignment and not to the
XStringSet, right?
> > > >
> > > > This is an overview of sequence alignment formats used by
EMBOSS:
> > > > http://emboss.sourceforge.net/docs/themes/AlignFormats.html
> > > >
> > > > 'pair' or 'markx0' would be perfectly fine.
> > > >
> > > >
> > > > Cheers
> > > > Martin
> > > >
> > > >
> > > >
> > > > Am Dienstag, 17. April 2012 um 22:13 schrieb Thomas Girke:
> > > >
> > > > > Hi Herv?,
> > > > >
> > > > > To me, the most basic and versatile MSA or pairwise
alignment format to output
> > > > > to would be FASTA since it is compatible with almost any
other alignment
> > > > > editing software. For text-based viewing purposes my
preference would be
> > > > > to also output to a format similar to the one shown in the
following
> > > > > example. When there are only two sequences then one could
show instead
> > > > > of a consensus line the pipe characters between the two
sequences to
> > > > > indicate identical residues which mimics the blast output. A
more
> > > > > standardized version of this pairwise alignment format can
be found
> > > > > here:
> > > > >
http://emboss.sourceforge.net/apps/cvs/emboss/apps/needle.html
> > > > >
> > > > > library(Biostrings)
> > > > > p450<- read.AAStringSet("http://faculty.ucr.edu/~tgirke/Docu
ments/R_BioCond/Samples/p450.mul", "fasta")
> > > > >
> > > > > StringSet2html<- function(msa=p450, file="p450.html",
start=1, end=length(p450[[1]]), counter=20, browser=TRUE, ...) {
> > > > > if(class(msa)=="AAStringSet") msa<- AAStringSet(msa,
start=start, end=end)
> > > > > if(class(msa)=="DNAStringSet") msa<- DNAStringSet(msa,
start=start, end=end)
> > > > > msavec<- sapply(msa, toString)
> > > > > offset<- (counter-1)-nchar(nchar(msavec[1]))
> > > > > legend<- paste(paste(paste(paste(rep(" ", offset),
collapse=""), format(seq(0,
> > > > > nchar(msavec[1]), by=counter)[-1])), collapse=""),
collapse="")
> > > > > consensus<- consensusString(msavec, ambiguityMap=".", ...)
> > > > > msavec<- paste(msavec, rowSums(as.matrix(msa) != "-"), sep="
")
> > > > > msavec<- paste(format(c("", names(msa), "Consensus"),
justify="left"), c(legend, msavec,
> > > > > consensus), sep=" ")
> > > > > msavec<- c("<html>
", msavec,"</html>") > > > > > writeLines(msavec, file) > > > > > if(browser==TRUE) { browseURL(file) } > > > > > } > > > > > StringSet2html(msa=p450, file="p450.html", start=1, end=length(p450[[1]]), counter=20, browser=T, threshold=1.0) > > > > > StringSet2html(msa=p450, file="p450.html", start=450, end=470, counter=20, browser=T, threshold=1.0) > > > > > > > > > > > > > > > Thomas > > > > > > > > > > On Tue, Apr 17, 2012 at 07:43:30PM +0000, Hervé Pagès wrote: > > > > > > Hi Thomas, > > > > > > > > > > > > On 04/17/2012 11:49 AM, Thomas Girke wrote: > > > > > > > What about providing an option in pairwiseAlignment to output to the > > > > > > > MultipleAlignment class in Biostrings and then write the latter to > > > > > > > different alignment formats? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Or we could provide coercion methods to switch between > > > > > > PairwiseAlignedXStringSet and MultipleAlignment. > > > > > > > > > > > > Anyway that kind of moves Martin's problem from having a > > > > > > write.PairwiseAlignedXStringSet() function that produces BLAST output > > > > > > to having a write.MultipleAlignment() function that produces BLAST > > > > > > output. For the specific case of BLAST output, would it make sense > > > > > > to support it for MultipleAlignment? Can someone point me to an example > > > > > > of such output? Or even better, to the specs of such format? > > > > > > > > > > > > Note that right now there is the write.phylip() function in Biostrings > > > > > > for writing a MultipleAlignment object to a file but the Phylip format > > > > > > looks very different from the BLAST output: > > > > > > > > > > > > hpages at latitude:~$ head -n 20 phylip_test.txt > > > > > > 9 2343 > > > > > > Mask 0000000000 0000000000 0000000000 0000000000 0000000000 > > > > > > Human -----TCCCG TCTCCGCAGC AAAAAAGTTT GAGTCGCCGC TGCCGGGTTG > > > > > > Chimp ---------- ---------- ---------- ---------- ---------- > > > > > > Cow ---------- ---------- ---------- ---------- ---------- > > > > > > Mouse ---------- ---------- --AAAAGTTG GAGTCTTCGC TTGAGAGTTG > > > > > > Rat ---------- ---------- ---------- ---------- ---------- > > > > > > Dog ---------- ---------- ---------- ---------- ---------- > > > > > > Chicken ---------- ----CGGCTC CGCAGCGCCT CACTCGCGCA GTCCCCGCGC > > > > > > Salmon GGGGGAGACT TCAGAAGTTG TTGTCCTCTC CGCTGATAAC AGTTGAGATG > > > > > > > > > > > > 0000000000 0000000000 0000000000 0001111111 1111111111 > > > > > > CCAGCGGAGT CGCGCGTCGG GAGCTACGTA GGGCAGAGAA GTCA-TGGCT > > > > > > ---------- ---------- ---------- ---------- ---A-TGGCT > > > > > > ---------- ---------- ---------- ---GAGAGAA GTCA-TGGCT > > > > > > CCAGCGGAGT CGCGCGCCGA CAGCTACGCG GCGCAGA-AA GTCA-TGGCT > > > > > > ---------- ---------- ---------- ---------- ---A-TGGCT > > > > > > ---------- ---------- ---------- ---------- ---A-TGGCT > > > > > > AGGGCCGGGC AGAGGCGCAC GCAGCTCCCC GGGCGGCCCC GCTC-CAGCC > > > > > > CGCATATTAT TATTACCTTT AGGACAAGTT GAATGTGTTC GTCAACATCT > > > > > > > > > > > > Thanks! > > > > > > H. > > > > > > > > > > > > > > > > > > > > Thomas > > > > > > > > > > > > > > On Tue, Apr 17, 2012 at 05:59:24PM +0000, Hervé Pagès wrote: > > > > > > > > Hi Martin, > > > > > > > > > > > > > > > > On 04/16/2012 04:06 AM, Martin Preusse wrote: > > > > > > > > > Hi Charles, > > > > > > > > > > > > > > > > > > thanks! Your solution allows to print the two alignment strings separately. > > > > > > > > > > > > > > > > > > I was thinking of an output as generated by alignment tools: > > > > > > > > > > > > > > > > > > AGT-TCTAT > > > > > > > > > | | | | | | | | | > > > > > > > > > AGTATCTAT > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > This looks like BLAST output. Is this what you have in mind? Note that > > > > > > > > there are many alignment tools and many ways to output the result to a > > > > > > > > file. I'm not really familiar with the BLAST output format. Is it > > > > > > > > specified somewhere? Would that make sense to add something like a > > > > > > > > write.PairwiseAlignedXStringSet() function to Biostrings for writing > > > > > > > > the result of pairwiseAlignment() to a file? We could do this and > > > > > > > > support the BLAST format if that's a commonly used format. > > > > > > > > > > > > > > > > Thanks, > > > > > > > > H. > > > > > > > > > > > > > > > > > > > > > > > > > > For this I would have to write a function to output the strings in blocks of e.g. 60 nucleotides, right? > > > > > > > > > > > > > > > > > > Cheers > > > > > > > > > Martin > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Am Freitag, 13. April 2012 um 19:21 schrieb Chu, Charles: > > > > > > > > > > > > > > > > > > > write.XStringSet > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > > > > > > Bioconductor mailing list > > > > > > > > > Bioconductor at r-project.org (mailto:Bioconductor at r-project.org) > > > > > > > > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > > > > > > > > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > Hervé Pagès > > > > > > > > > > > > > > > > Program in Computational Biology > > > > > > > > Division of Public Health Sciences > > > > > > > > Fred Hutchinson Cancer Research Center > > > > > > > > 1100 Fairview Ave. N, M1-B514 > > > > > > > > P.O. Box 19024 > > > > > > > > Seattle, WA 98109-1024 > > > > > > > > > > > > > > > > E-mail: hpages at fhcrc.org (mailto:hpages at fhcrc.org) > > > > > > > > Phone: (206) 667-5791 > > > > > > > > Fax: (206) 667-1319 > > > > > > > > > > > > > > > > _______________________________________________ > > > > > > > > Bioconductor mailing list > > > > > > > > Bioconductor at r-project.org (mailto:Bioconductor at r-project.org) > > > > > > > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > > > > > > > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > Hervé Pagès > > > > > > > > > > > > Program in Computational Biology > > > > > > Division of Public Health Sciences > > > > > > Fred Hutchinson Cancer Research Center > > > > > > 1100 Fairview Ave. N, M1-B514 > > > > > > P.O. Box 19024 > > > > > > Seattle, WA 98109-1024 > > > > > > > > > > > > E-mail: hpages at fhcrc.org (mailto:hpages at fhcrc.org) > > > > > > Phone: (206) 667-5791 > > > > > > Fax: (206) 667-1319 > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > Hervé Pagès > > > > Program in Computational Biology > > Division of Public Health Sciences > > Fred Hutchinson Cancer Research Center > > 1100 Fairview Ave. N, M1-B514 > > P.O. Box 19024 > > Seattle, WA 98109-1024 > > > > E-mail: hpages at fhcrc.org (mailto:hpages at fhcrc.org) > > Phone: (206) 667-5791 > > Fax: (206) 667-1319 >