msa: How to obtain a subset of an alignment
@christof-winter-13999
TU München

I am using the msa package to align DNA sequences with Muscle (which works great!). Now I was wondering whether it's possible to extract a subset of an alignment. In the following example, I would like to extract just the first 3 rows from the alignment:

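library(msa)

mySequenceFile <- system.file("examples", "exampleAA.fasta", package="msa")
mySequences <- readAAStringSet(mySequenceFile)

aln <- msa(mySequences)

# subset, get first 3 only (mask all other rows)
rowmask(aln, invert=TRUE) <- IRanges(start=1, end=3)

print(aln, show="complete")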


However, the masked rows are still present and are showing up with # characters. How can I drop the masked parts in order to have just the first 3 rows in an alignment object?

UBodenhofer ▴ 290
@ubodenhofer-5425
University of Applied Sciences Upper Au…

Thanks for your positive feedback, Christof!

Regarding your question: yes, it is true that objects of class 'MultipleAlignment' and classes derived from 'MultipleAlignment' do not support subsetting. Presently, I can offer the following workaround (... continuing your example code):

alnSubset <- as(AAMultipleAlignment(unmasked(aln)[1:3]),
                "MsaAAMultipleAlignment")

print(alnSubset, show="complete")

I admit that this is not very elegant. Moreover, all metadata describing the alignment is lost. I am actually considering adding some more casts to the package or maybe even subsetting methods. Maybe somebody else has some thoughts on this subject?
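
The workaround above can be wrapped in a small helper so that it is easier to reuse. This is only a minimal sketch, not something msa or Biostrings provides: the function name `subsetAlignmentRows` is hypothetical, and the DNA and RNA branches assume that casting to `MsaDNAMultipleAlignment` and `MsaRNAMultipleAlignment` behaves the same way as the amino acid cast shown above.

```r
library(msa)  # also attaches Biostrings

# Hypothetical helper (not part of msa or Biostrings): keep only the rows
# given in `rows` and return a new alignment object of the matching msa class.
subsetAlignmentRows <- function(aln, rows) {
    # unmasked() returns the aligned sequences (gaps included) as an
    # XStringSet, ignoring any row or column masks; XStringSets support `[`.
    seqs <- unmasked(aln)[rows]

    if (is(aln, "AAMultipleAlignment")) {
        as(AAMultipleAlignment(seqs), "MsaAAMultipleAlignment")
    } else if (is(aln, "DNAMultipleAlignment")) {
        # assumption: this cast works like the amino acid one
        as(DNAMultipleAlignment(seqs), "MsaDNAMultipleAlignment")
    } else if (is(aln, "RNAMultipleAlignment")) {
        # assumption: this cast works like the amino acid one
        as(RNAMultipleAlignment(seqs), "MsaRNAMultipleAlignment")
    } else {
        stop("unsupported alignment class: ", class(aln)[1])
    }
}

# Usage, continuing the example from the question
mySequenceFile <- system.file("examples", "exampleAA.fasta", package="msa")
aln <- msa(readAAStringSet(mySequenceFile))

alnSubset <- subsetAlignmentRows(aln, 1:3)
print(alnSubset, show="complete")
```

As with the one-liner, the metadata of the original alignment is not carried over. Because the columns are left untouched, the subset can also contain columns that consist only of gaps.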


Please, please add some subsetting methods. You have the easiest and most flexible of the Bioconductor alignment systems. It just needs a way to get inside the alignment so that we can run the analyses we need, not just the standard ones. Thanks.