Can any of the existing MSA packages be used/adapted to visualize a pre-aligned fasta or tsv file (e.g., kalign output)?
janani • 0
@janani-20663
East Lansing, MI

I have a tab-delimited file (instead of a standard fasta input) that contains a pre-aligned MSA: the first column is the accession number/leaf name for the alignment/tree, and the second column is the aligned sequence (e.g., the output of Kalign). I would like to use or tweak an existing function within msa so that I can visualize my existing alignments with msaPrettyPrint (or something similar), instead of recomputing the alignment from the fasta file with msa each time.

In summary, please let me know if any of you have suggestions for:
1) displaying pre-aligned sequences in any compatible format;
2) reading a tsv and specifying which column holds the leaf name and which holds the alignment (I could convert the files to fasta, but this might be a helpful feature);
3) using kalign as one of the alignment algorithms, since it aligns several hundred protein sequences better than clustal/muscle.

Thank you!

PS. @ulrich.bodenhofer thanks for the great package to generate MSA+tree!

msa phylogeny • 346 views
@ulrichbodenhofer-8624
Austria

It is quite straightforward to read a file with a pre-computed alignment. Suppose the file 'myAln.txt' is tab-separated and contains sequence names in the first and aligned sequences in the second column. Then the following code should do the job:

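library(msa)

## read the tab-separated file (assumes no header row):
## column 1 = sequence names, column 2 = aligned sequences
rawData <- read.table("myAln.txt", sep="\t", header=FALSE, stringsAsFactors=FALSE)

cVec <- toupper(rawData[[2]]) ## use toupper() if you have any lowercase characters
names(cVec) <- rawData[[1]]

aln <- AAMultipleAlignment(cVec)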
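
Since a pre-computed alignment only makes sense if every row has the same width, it may be worth checking the table before calling AAMultipleAlignment(). The following is a minimal sketch in base R; the helper name checkAlignedTable() and the toy data frame are made up for illustration and are not part of msa:

```r
## Hypothetical helper (not part of msa): validate a two-column table of
## names and aligned sequences and return a named character vector.
checkAlignedTable <- function(rawData) {
    stopifnot(ncol(rawData) >= 2)
    ids  <- as.character(rawData[[1]])
    seqs <- toupper(as.character(rawData[[2]]))

    ## all rows of an alignment must have the same number of characters
    widths <- nchar(seqs)
    if (length(unique(widths)) != 1)
        stop("rows differ in length: ",
             paste(sort(unique(widths)), collapse=", "))

    ## duplicated names would be ambiguous as leaf labels
    if (anyDuplicated(ids))
        stop("duplicated sequence names: ",
             paste(unique(ids[duplicated(ids)]), collapse=", "))

    setNames(seqs, ids)
}

## Toy input standing in for the table read from 'myAln.txt'
toy <- data.frame(V1=c("seqA", "seqB", "seqC"),
                  V2=c("MK-LV", "mkalv", "MKAL-"),
                  stringsAsFactors=FALSE)
cVec <- checkAlignedTable(toy)
```

The resulting named vector can be passed to AAMultipleAlignment() in the same way as 'cVec' above.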


You can then use msaPrettyPrint() on the 'aln' object. If the alignment is too large in terms of sequences or width, it might be necessary to split it into multiple pieces to get a good result. See Subsection 7.7 of the package vignette for some hints.
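
As a rough sketch of the splitting idea (not the vignette's own code), the column ranges can be computed in base R and passed to msaPrettyPrint() through its 'y' argument, which accepts a start/end pair. The helper chunkRanges(), the chunk size of 100, and the output file names are assumptions chosen for illustration; the msaPrettyPrint() part only runs if msa is attached and an object 'aln' exists, and it needs a working LaTeX installation:

```r
## Hypothetical helper (not part of msa): start/end columns of
## consecutive chunks covering an alignment of the given width.
chunkRanges <- function(width, chunkSize) {
    starts <- seq(1, width, by=chunkSize)
    ends   <- pmin(starts + chunkSize - 1, width)
    cbind(start=starts, end=ends)
}

## Example: an alignment with 250 columns, split into chunks of 100
ranges <- chunkRanges(width=250, chunkSize=100)
print(ranges)

## Only attempted after library(msa) and once 'aln' has been created
if ("package:msa" %in% search() && exists("aln")) {
    alnRanges <- chunkRanges(ncol(aln), chunkSize=100)
    for (i in seq_len(nrow(alnRanges))) {
        from <- unname(alnRanges[i, "start"])
        to   <- unname(alnRanges[i, "end"])
        ## one PDF per chunk; file names are arbitrary
        msaPrettyPrint(aln, y=c(from, to), output="pdf",
                       file=sprintf("aln_%d-%d.pdf", from, to),
                       askForOverwrite=FALSE)
    }
}
```

A suitable chunk size depends on the number of sequences and the length of the names, so some trial and error is likely needed.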