Is there a way to disable penalties for gaps inserted at the ends of an alignment, so that the issue below cannot arise?
Biostrings::pairwiseAlignment("GTATATATAGCCTTAGGTTAATTAATTAATTAA",
"GCCTTAGGTTAATTAATTAATTAAGGGGG", type = "global", gapOpening = 0)
Global PairwiseAlignmentsSingleSubject (1 of 1)
pattern: GTATATATAGCCTTAGGTTAATTAATTAATTAA-----
subject: G---------CCTTAGGTTAATTAATTAATTAAGGGGG
score: -8.437853
Note how the first G of the subject is aligned with the first G of the pattern instead of with the G in the middle of the pattern.
Biostrings::pairwiseAlignment("GTATATATAGCCTTAGGTTAATTAATTAATTAA",
"GCCTTAGGTTAATTAATTAATTAAGGGGG", type = "global", gapOpening = 22)
Global PairwiseAlignmentsSingleSubject (1 of 1)
pattern: GTATATATAGCCTTAGGTTAATTAATTAATTAA-----
subject: ---------GCCTTAGGTTAATTAATTAATTAAGGGGG
score: -52.43786
Increasing the gap opening penalty solves this specific issue. But when running thousands of alignments, one cannot check every single alignment by hand.
Biostrings::pairwiseAlignment("GTATATATAGCCTTAGGTTAATTAATTAATTAA",
"GCCTTAGGTTAATTAATTAATTAAGGGGG", type = "global", gapOpening = 50)
Global PairwiseAlignmentsSingleSubject (1 of 1)
pattern: GTATATATAGCCTTAGGTTAATTAATTAATTAA-----
subject: G---------CCTTAGGTTAATTAATTAATTAAGGGGG
score: -108.4379
When the penalty is increased further, the initial result reappears. So an appropriate gapOpening penalty depends on the sequences and on the extent to which they overlap.
So I think disabling penalties for terminal gaps might solve the issue, but this may not currently be possible in Biostrings?
I believe such a global alignment without terminal gap penalties is called semi-global alignment (or perhaps fitting alignment).
Using type = "local" is not an option for me, as I want to keep the terminal gaps, just not placed as in the first and third cases, which do not look optimal to the human eye.
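To make concrete what I mean by free terminal gaps, here is a minimal base-R sketch of the scoring I have in mind. It is not Biostrings code: semi_global_score is a hypothetical helper, and the match/mismatch/gap values and the linear (non-affine) gap cost are assumptions chosen only for illustration. Leading gaps cost nothing because the first row and column of the score matrix stay at zero, and trailing gaps cost nothing because the final score is taken as the best value in the last row or column.

```r
# Hypothetical helper (not part of Biostrings): score of the best alignment
# of x and y in which gaps at either end of either sequence are free.
# Scoring values are illustrative assumptions; gap cost is linear.
semi_global_score <- function(x, y, match = 1, mismatch = -1, gap = -2) {
  a <- strsplit(x, "")[[1]]
  b <- strsplit(y, "")[[1]]
  n <- length(a)
  m <- length(b)
  # First row and first column stay 0: leading gaps are not penalised.
  S <- matrix(0, nrow = n + 1, ncol = m + 1)
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      sub <- if (a[i] == b[j]) match else mismatch
      S[i + 1, j + 1] <- max(S[i, j] + sub,      # a[i] aligned with b[j]
                             S[i, j + 1] + gap,  # a[i] aligned with a gap in y
                             S[i + 1, j] + gap)  # b[j] aligned with a gap in x
    }
  }
  # Trailing gaps are free: take the best score in the last row or column.
  max(S[n + 1, ], S[, m + 1])
}

semi_global_score("GTATATATAGCCTTAGGTTAATTAATTAATTAA",
                  "GCCTTAGGTTAATTAATTAATTAAGGGGG")
```

With these assumed scores, the best alignment of my two sequences is the gap-free overlap of the 24 shared bases, which is the placement I would expect from the second example above, independent of any gap opening setting.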
Thank you.