running msa (msa) aborts R session
ans74 • 0
@ans74-14558
Last seen 5.0 years ago

Hello there,

I'm trying to run msa (msa package, version 1.10.0) in R version 3.4.3 (2017-11-30) -- "Kite-Eating Tree" on ~71,000 sequences of 260-400 bp each.

However, the R session aborts every time I run the following:

alignment <- msa(dna, method="ClustalW")

No error message or other information is shown, because a new R session starts right away.

dna looks normal by the way:


> dna
A DNAStringSet instance of length 70937
width seq                                                                        names
[1]   402 TGGGGAATATTACACAATGGAGGAAACTCTGATGTA...CTGACGCTCAGATGCGAAAGCGTGGGTAGCAAACA SV_1
[2]   427 TGGGGAATTTTGGACAATGGGCGCAAGCCTGATCCA...CTGACGCTCATGCACGAAAGCGTGGGGAGCAAACA SV_2
[3]   402 TGAGGAATATTGCACAATGGAGGAAACTCTGATGCA...CTGACGCTGAGGCACGAAAGCGTGGGGAGCAAACA SV_3
[4]   427 TGGGGAATTTTGGACAATGGGCGCAAGCCTGATCCA...CTGACGCTCATGCACGAAAGCGTGGGGAGCAAACA SV_4
[5]   427 TGGGGAATTTTGGACAATGGACGAAAGTCTGATCCA...CTGACGCTCATGCACGAAAGCGTGGGGAGCAAACA SV_5
...   ... ...
[70933]   428 TGGGGAATATTGCGCAATGGCCGAAAGGCTGACGCA...CTGACGCTCATGCACGAAAGCGTGGGGAGCAAACA SV_70933
[70934]   428 TGGGGAATATTGCGCAATGGCCGAAAGGCTGACGCA...CTGACGCTCATGCACGAAAGCGTGGGGAGCAAACA SV_70934
[70935]   403 ACGAGAATATTCGACAATGCACGAAAGTGTGATCGA...CTGACGGTCAATCACTAAAGCGTGGGGATCAAAAA SV_70935
[70936]   402 TGGGGAATATTGGACAATGGGCGCAAGCCTGATCCA...TTGACGCTCATGCACGAAAGCGTGGGGAGCAAACA SV_70936
[70937]   429 TGGGGAATTTTGGACAATGGGCGAAAGCCTGACGCA...CTGACGCTCATGCACGAAAGCGTGGGGAGCAAACA SV_70937

Any help will be appreciated!

Thanks,

André

msa alignment r software error • 1.2k views
UBodenhofer ▴ 290
@ubodenhofer-5425
Last seen 4 months ago
University of Applied Sciences Upper Au…

I'm sorry you are encountering difficulties with our package! It is quite difficult to guess the source of the problem. Can you provide the sequences for debugging, or are they confidential? In any case, there is one thing you can try yourself first: run msa on a subset of your sequences and gradually increase the subset size to find out at which size the problem first appears.
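
The subsetting test suggested above could be sketched as follows. This is only an illustration, not part of the msa package: `time_subsets` and its arguments are hypothetical names, and the aligner is passed in as a function so that the same loop works with any method. Because the crash restarts R and discards the console, the size being tried is appended to a log file (here `msa_sizes.log`, an arbitrary name) before each run.

```r
# Hypothetical helper: align random subsets of increasing size and record
# the elapsed time for each. `align_fun` is any function taking a set of
# sequences, e.g. function(x) msa(x, method = "ClustalW").
time_subsets <- function(seqs, sizes, align_fun,
                         log_file = "msa_sizes.log", seed = 1) {
  set.seed(seed)  # reproducible subsets
  # Drop sizes larger than the data set and run smallest first.
  sizes <- sort(unique(sizes[sizes <= length(seqs)]))
  elapsed <- vapply(sizes, function(n) {
    # Record the size before starting, so the last line of the log
    # identifies the run that was active if the session aborts.
    cat("starting n =", n, "\n", file = log_file, append = TRUE)
    subset <- seqs[sample(length(seqs), n)]
    system.time(align_fun(subset))[["elapsed"]]
  }, numeric(1))
  data.frame(n = sizes, seconds = elapsed)
}

# Usage, assuming the msa package is installed and `dna` is the
# DNAStringSet from the original post:
# library(msa)
# timings <- time_subsets(dna, c(100, 250, 500, 1000),
#                         function(x) msa(x, method = "ClustalW"))
```

If the session aborts, the last entry in the log file shows the subset size that triggered it.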

UBodenhofer ▴ 290
@ubodenhofer-5425
Last seen 4 months ago
University of Applied Sciences Upper Au…

In this case, André, I agree that it is a memory issue. If you insist on ClustalW, you may have to resort to the command-line version (though I am not convinced that this will work). You may be better off giving ClustalOmega a try, since it is explicitly designed for handling larger data sets. Sorry that I cannot say more for now.

ans74 • 0
@ans74-14558
Last seen 5.0 years ago

The sequences are confidential indeed, sorry.

I tried sub-sampling my dataset to 500 sequences and it ran perfectly in ~15 min. However, after sub-sampling to 5,000 sequences, it is still running after 2 h and has used up to ~15 GB of RAM. I have much more than that available, but this might be a memory issue, I guess...

Cheers,
André
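
A rough back-of-the-envelope check of the timings reported above. This sketch assumes that the runtime is dominated by ClustalW's pairwise alignment stage and therefore grows with the number of sequence pairs, n(n-1)/2; that is an assumption, not a measurement, and the function names are made up for illustration. Under it, 15 min for 500 sequences extrapolates to roughly 25 hours for 5,000 and to months for the full set of 70,937, so a run still going after 2 h is not surprising.

```r
# Number of distinct sequence pairs among n sequences.
n_pairs <- function(n) n * (n - 1) / 2

# Extrapolate runtime from one measured point (500 sequences in ~15 min,
# as reported in the thread), ASSUMING time is proportional to the
# number of pairs. Real runs may scale differently.
extrapolate_minutes <- function(n_target, n_ref = 500, minutes_ref = 15) {
  minutes_ref * n_pairs(n_target) / n_pairs(n_ref)
}

sizes <- c(500, 5000, 70937)
estimate <- data.frame(
  n         = sizes,
  pairs     = n_pairs(sizes),
  est_hours = extrapolate_minutes(sizes) / 60
)
print(estimate)
```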