Question: msa: How to obtain a subset of an alignment
13 months ago, Christof Winter (TU München) wrote:

I am using the msa package to align DNA sequences with Muscle (which works great!). Now I was wondering whether it's possible to extract a subset of an alignment. In the following example, I would like to extract just the first 3 rows from the alignment:

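library(msa)

mySequenceFile <- system.file("examples", "exampleAA.fasta", package="msa")
mySequences <- readAAStringSet(mySequenceFile)

aln <- msa(mySequences)

# subset, get first 3 only: mask every row except rows 1 to 3
# (reconstructed; the original masking call is missing from the post)
rowmask(aln, invert=TRUE) <- IRanges(start=1, end=3)

print(aln, show="complete")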


However, the masked rows are still present and are showing up with # characters. How can I drop the masked parts in order to have just the first 3 rows in an alignment object?

13 months ago, UBodenhofer (Johannes Kepler University, Linz, Austria) wrote:

Thanks for your positive feedback, Christof!

Regarding your question: yes, it is true that objects of class 'MultipleAlignment' and classes derived from 'MultipleAlignment' do not support subsetting. Presently, I can offer the following workaround (... continuing your example code):

alnSubset <- as(AAMultipleAlignment(unmasked(aln)[1:3]),
                "MsaAAMultipleAlignment")

print(alnSubset, show="complete")

I admit that this is not very elegant. Moreover, all metadata describing the alignment is lost. I am actually considering adding some more casts to the package or maybe even subsetting methods. Maybe somebody else has some thoughts on this subject?
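
For repeated use, the workaround above could be wrapped in a small helper. The following is only a sketch: the function name subsetAAAlignment and its arguments are hypothetical and not part of the msa package, and it assumes an amino acid alignment as in the example. As with the workaround itself, metadata describing the alignment is not carried over.

```r
library(msa)

# Hypothetical helper (not part of msa): keep only the rows given in `rows`.
# It uses the same calls as the workaround above, so any row or column
# masks and all alignment metadata are discarded.
subsetAAAlignment <- function(aln, rows) {
    seqs <- unmasked(aln)[rows]   # aligned sequences as an AAStringSet
    as(AAMultipleAlignment(seqs), "MsaAAMultipleAlignment")
}

mySequenceFile <- system.file("examples", "exampleAA.fasta", package="msa")
mySequences <- readAAStringSet(mySequenceFile)
aln <- msa(mySequences)

alnSubset <- subsetAAAlignment(aln, 1:3)
print(alnSubset, show="complete")
```

Because only rows are dropped, the columns are left exactly as in the full alignment, so the subset may contain columns that consist of gaps only.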