Question

Can any of the existing `MSA` packages be used/adapted to visualize a pre-aligned fasta or tsv file (e.g., `kalign` output)?


jananiravi • 0

@janani-20663


Denver, CO

I have a tab-delimited file rather than a standard FASTA input. It holds a pre-aligned MSA: the first column is the accession number (the leaf name for the alignment/tree) and the second column is the aligned sequence, e.g., the output of Kalign. I would like to use or tweak an existing function in msa to visualize these existing alignments with msaPrettyPrint (or something similar), instead of generating new alignments each time from the FASTA file with msa.

In summary, please let me know if you have suggestions for:

1) displaying pre-aligned sequences in any compatible format;
2) reading a TSV file and specifying which column holds the leaf names and which holds the alignment (I could convert the files to FASTA, but this might be a helpful feature);
3) using Kalign as one of the alignment algorithms, since it works better than Clustal/MUSCLE for aligning several hundred protein sequences.

Thank you!

PS. @ulrich.bodenhofer thanks for the great package to generate MSA+tree!

msa phylogeny • 1.0k views

updated 4.9 years ago by ulrich.bodenhofer • written 5.0 years ago by jananiravi

score 0 · Answer 1 · 2019-05-13

It is quite straightforward to read a file with a pre-computed alignment. Suppose the file 'myAln.txt' is tab-separated and contains sequence names in the first and aligned sequences in the second column. Then the following code should do the job:

library(msa)

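rawData <- read.table('myAln.txt', header=FALSE, sep='\t', quote='',
                      comment.char='', stringsAsFactors=FALSE) ## split on tabs only; keep quotes and '#' as literal characters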

cVec <- toupper(rawData[[2]]) ## use toupper() if you have any lowercase characters
names(cVec) <- rawData[[1]]

aln <- AAMultipleAlignment(cVec)

You can then use msaPrettyPrint() on the 'aln' object. If the alignment has too many sequences or too many columns to fit on a page, it might be necessary to split it into multiple pieces to get a good result. See Subsection 7.7 of the package vignette for some hints.
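As a rough illustration of the splitting idea, here is a minimal base-R sketch that checks that every row of a pre-computed alignment has the same width and then computes consecutive column ranges. The helper `columnWindows()`, the toy sequences, and the window width are all made up for this example and are not part of msa.

```r
# Hypothetical helper (not part of msa): split the columns of an alignment
# into consecutive windows of at most `width` columns each.
columnWindows <- function(nCols, width = 60L) {
  starts <- seq(1L, nCols, by = width)
  ends <- pmin(starts + width - 1L, nCols)
  Map(c, starts, ends)   # list of c(start, end) pairs
}

# Toy tab-separated alignment, written to a temporary file for illustration.
tsvFile <- tempfile(fileext = ".txt")
writeLines(c("seqA\tMKT-LLVAGG",
             "seqB\tMKTALL-AGG",
             "seqC\tMRT-LLVSGG"), tsvFile)

# Read it the same way as in the answer above.
rawData <- read.table(tsvFile, header = FALSE, sep = "\t", quote = "",
                      comment.char = "", stringsAsFactors = FALSE)
cVec <- toupper(rawData[[2]])
names(cVec) <- rawData[[1]]

# A pre-computed alignment must have the same width in every row.
widths <- nchar(cVec)
stopifnot(length(unique(widths)) == 1L)

# Column ranges for a (deliberately tiny) window width of 4.
windows <- columnWindows(widths[[1]], width = 4L)
```

Each element of `windows` is a start/end pair. As far as I can tell from the msa documentation, such a pair can be passed as the `y` argument of msaPrettyPrint() to restrict the output to those columns, and `subset` can be used to select sequences, but please check the help page before relying on this.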