Entering edit mode
Hi Noah,
On 03/02/2012 12:16 PM, Noah Hoffman wrote:
> Hello Herve,
>
> It has been a while... I hope you're doing well. I have a question
about the implementation of needleman-wunsch in pairwiseAlignment. We
have some ion torrent sequencing data. Like the 454, (except worse,
apparently) it likes to insert runs of homopolymers. The alignments
with the primers, etc would be improved considerably if we could
specify a lower penalty for a gap extension when there is a
homopolymeric insertion in the opposite strand (as opposed to an
insertion that is not a homopolymer). That is, two different gap
extension penalties. Is this possible in pairwiseAlignment?
I guess that would be technically possible even though certainly not
straightforward. The Needleman-Wunsch / Smith-Waterman algos are
complicated algorithms and the current implementation in Biostrings
is from Patrick Aboyoun. Unfortunately I'm not familiar enough with
it to be able to tell exactly how hard it would be to add something
like the "homopolymeric insertion penalty" feature you propose but
my gut feeling is that it would probably require some significant
amount of work.
Just to clarify, why are you saying "in the opposite strand" when you
say:
specify a lower penalty for a gap extension when there is a
homopolymeric insertion in the opposite strand
Could you provide an example?
One workaround you could explore is to use a 2-pass algo (UNTESTED,
and I'm not sure it would do exactly what you want):
1st pass: Use pairwiseAlignment() with a gap penalty as low as the
penalty you'd like to put on homopolymeric insertions. You said
you were using Needleman-Wunsch so I'm assuming that you are doing
"global-global" alignment so you probably don't need to work around
one limitation of pairwiseAlignment() when doing "global-local"
which
is that it only returns the first best match.
Let's assume you managed to obtain 1 alignment for each read with a
score.
2nd pass: Look at each alignment and in particular at the
insertions
they contain. Adjust the score of the alignment based on the type
of insertions it has (i.e. homopolymeric or not). Patrick
implemented
tools for dissecting the alignments returned by
pairwiseAlignment():
they are described in the man page for PairwiseAlignedXStringSet
objects (?PairwiseAlignedXStringSet).
>
> Another thing that would be useful would be per-strand gap creation
and gap extension penalties.
Can you please elaborate on this? Depending on what you are trying to
do
exactly, there might be an easy way to go or not. Please provide an
example.
>
> Any ideas? Fred Ross (who you may not have met yet) is working on
this and would like to hear your answer as well.
I'm cc'ing the Bioconductor mailing list so other people can give some
advice. Maybe someone knows a tool in another BioC package or out of
Bioconductor that already provide those features?
Cheers,
H.
>
> Thanks a lot,
> Noah
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319