Biostrings readAAMultipleAlignment() does not handle valid Stockholm format files

readAAMultipleAlignment() does not correctly handle Stockholm-formatted MSAs generated by HMMER; it fails with the "alignment rows out of order" error.

Looking through the source, I think the problem arises because read.MultipleAlignment.splitRows() does not correctly handle alignments in which each row can be followed by a per-row annotation line (e.g. "#=GR ..."). The alnlines logic in the existing code (line 304 of MultipleAlignment.R on GitHub) expects all rows of an alignment block to be on adjacent lines. If the rows are interspersed with valid #=GR lines, as in HMMER output, it incorrectly concludes that the file consists of blocks of one row each and reports an error immediately.

A short-term workaround is to strip out the lines matching ^#=GR with sed or a similar tool before loading the alignment.
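The workaround above can be sketched as follows. This is only a minimal illustration: the file names example.sto and example.noGR.sto are hypothetical, and the sample file reuses the abbreviated rows from this post, so substitute your own HMMER output.

```shell
# Build a tiny Stockholm file with interleaved #=GR lines
# (hypothetical file name; rows abbreviated as in the post).
cat > example.sto <<'EOF'
# STOCKHOLM 1.0
SRR4381490.34595612_1_1/1-33         ---
#=GR SRR4381490.34595612_1_1/1-33 PP ...
SRR4381490.33050863_1_6/1-33         ---
#=GR SRR4381490.33050863_1_6/1-33 PP ...
//
EOF

# Delete every line that starts with "#=GR"; all other lines pass through unchanged.
sed '/^#=GR/d' example.sto > example.noGR.sto
```

The cleaned file should then load with readAAMultipleAlignment("example.noGR.sto", format="stockholm"). Note that this discards the per-residue annotation (e.g. the PP posterior-probability lines), so keep the original file if you need it.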

A typical example of an alignment block that triggers the error:

SRR4381490.34595612_1_1/1-33         ---
#=GR SRR4381490.34595612_1_1/1-33 PP ...
SRR4381490.33050863_1_6/1-33         ---
#=GR SRR4381490.33050863_1_6/1-33 PP ...
SRR4381490.38714849_1_6/1-33         ---
#=GR SRR4381490.38714849_1_6/1-33 PP ...

An equivalent example that is accepted:

SRR4381490.34595612_1_1/1-33         ---
SRR4381490.33050863_1_6/1-33         ---
SRR4381490.38714849_1_6/1-33         ---

Regards,

David Huen
