Question: running msa (msa) aborts R session
ans74 wrote (7 days ago):

Hello there,

I'm trying to run msa (msa package, version 1.10.0) in R version 3.4.3 (2017-11-30) -- "Kite-Eating Tree" on ~71,000 sequences of 260-400 bp each.

However, the R session aborts every time I run the following:

alignment <- msa(dna, method="ClustalW")

No extra info is given, since a new R session is started.

dna looks normal by the way:


> dna
  A DNAStringSet instance of length 70937
        width seq                                                                        names               
    [1]   402 TGGGGAATATTACACAATGGAGGAAACTCTGATGTA...CTGACGCTCAGATGCGAAAGCGTGGGTAGCAAACA SV_1
    [2]   427 TGGGGAATTTTGGACAATGGGCGCAAGCCTGATCCA...CTGACGCTCATGCACGAAAGCGTGGGGAGCAAACA SV_2
    [3]   402 TGAGGAATATTGCACAATGGAGGAAACTCTGATGCA...CTGACGCTGAGGCACGAAAGCGTGGGGAGCAAACA SV_3
    [4]   427 TGGGGAATTTTGGACAATGGGCGCAAGCCTGATCCA...CTGACGCTCATGCACGAAAGCGTGGGGAGCAAACA SV_4
    [5]   427 TGGGGAATTTTGGACAATGGACGAAAGTCTGATCCA...CTGACGCTCATGCACGAAAGCGTGGGGAGCAAACA SV_5
    ...   ... ...
[70933]   428 TGGGGAATATTGCGCAATGGCCGAAAGGCTGACGCA...CTGACGCTCATGCACGAAAGCGTGGGGAGCAAACA SV_70933
[70934]   428 TGGGGAATATTGCGCAATGGCCGAAAGGCTGACGCA...CTGACGCTCATGCACGAAAGCGTGGGGAGCAAACA SV_70934
[70935]   403 ACGAGAATATTCGACAATGCACGAAAGTGTGATCGA...CTGACGGTCAATCACTAAAGCGTGGGGATCAAAAA SV_70935
[70936]   402 TGGGGAATATTGGACAATGGGCGCAAGCCTGATCCA...TTGACGCTCATGCACGAAAGCGTGGGGAGCAAACA SV_70936
[70937]   429 TGGGGAATTTTGGACAATGGGCGAAAGCCTGACGCA...CTGACGCTCATGCACGAAAGCGTGGGGAGCAAACA SV_70937

Any help will be appreciated!

Thanks,

André

UBodenhofer (Johannes Kepler University, Linz, Austria) wrote (7 days ago):

I'm sorry you are encountering difficulties with our package! It is actually quite difficult to guess the source of the problem. Can you provide the sequences for debugging, or are they confidential? In any case, there is one thing you can try yourself first: run msa on a subset of your sequences and gradually increase the subset size to find out at which size the problem first appears.
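
The subsetting suggestion could be sketched roughly as below. This is a minimal illustration, not part of the msa package: `time_subsets` and its arguments are hypothetical names, and the aligner is passed in as a function so the helper itself needs only base R. Because a hard crash cannot be caught from within R, the size is logged before each run, so the last message printed shows which subset size killed the session.

```r
# Hypothetical helper: time an aligner on random subsets of increasing size.
# `align_fun` is any function that accepts a subset of the sequences.
time_subsets <- function(seqs, sizes, align_fun, seed = 1) {
  set.seed(seed)
  rows <- vector("list", length(sizes))
  for (i in seq_along(sizes)) {
    n <- min(sizes[i], length(seqs))
    idx <- sample(length(seqs), n)
    # Log before the call: if the session dies, the last line printed
    # shows the subset size that triggered the crash.
    message("aligning ", n, " sequences ...")
    elapsed <- system.time(align_fun(seqs[idx]))[["elapsed"]]
    rows[[i]] <- data.frame(n = n, seconds = elapsed)
  }
  do.call(rbind, rows)
}

# Possible usage with msa, assuming `dna` is the DNAStringSet from the question:
# library(msa)
# time_subsets(dna, c(100, 500, 1000, 2000, 5000),
#              function(s) msa(s, method = "ClustalW"))
```

Running this from a terminal with Rscript rather than from an IDE may also help, since any message printed by the crashing process stays visible.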

UBodenhofer wrote (5 days ago):

In this case, André, I agree that it is a memory issue. If you insist on ClustalW, you may have to resort to the command-line version (though I am not convinced that this will work). You may be better off giving ClustalOmega a try, since it is explicitly designed for handling larger data sets. Sorry that I cannot say more for now.
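
For reference, switching algorithms within msa only requires changing the `method` argument of the call shown in the question. The sketch below assumes the msa and Biostrings packages from Bioconductor are installed; the three short sequences are made-up placeholders standing in for the confidential data, and whether ClustalOmega copes with the full ~71,000 sequences is not something this example demonstrates.

```r
# Sketch only: runs when msa and Biostrings (Bioconductor) are installed.
if (requireNamespace("msa", quietly = TRUE) &&
    requireNamespace("Biostrings", quietly = TRUE)) {
  library(msa)

  # Made-up toy sequences standing in for the confidential data.
  toy <- Biostrings::DNAStringSet(c(
    s1 = "TGGGGAATATTGCACAATGGGCGAAAGCCTGATGCA",
    s2 = "TGGGGAATTTTGGACAATGGGCGCAAGCCTGATCCA",
    s3 = "TGAGGAATATTGCACAATGGAGGAAACTCTGATGCA"
  ))

  # Same call as in the question, with only the method changed.
  alignment <- msa(toy, method = "ClustalOmega")
  print(alignment)
}
```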

ans74 wrote (5 days ago):

@UBodenhofer, thanks for the reply!

The sequences are confidential indeed, sorry.

I tried subsampling my dataset to 500 sequences and it ran perfectly in ~15 min. However, after subsampling to 5000 sequences, it is still running after 2 h and has used up to ~15 GB of RAM. I have much more than that available, but this might be a memory issue, I guess...

Cheers,
André
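
A rough back-of-the-envelope check on these numbers: ClustalW starts by comparing every sequence with every other one, so the work grows with the number of pairs, n(n-1)/2. The sketch below counts the pairs for the sizes mentioned in this thread and extrapolates naively from the reported ~15 min for 500 sequences. The assumption that run time is proportional to the pair count is a simplification, so the estimates are indicative only; under it, 5000 sequences would need about 25 hours, which is at least consistent with the run not finishing in 2 h.

```r
# Number of sequence pairs a full pairwise-distance stage has to compare.
n_pairs <- function(n) n * (n - 1) / 2

sizes <- c(500, 5000, 70937)
pairs <- n_pairs(sizes)

# Naive extrapolation from the reported ~15 min for 500 sequences,
# assuming run time is proportional to the number of pairs (an assumption).
est_hours <- 15 / 60 * pairs / n_pairs(500)

print(data.frame(n = sizes, pairs = pairs, est_hours = round(est_hours, 1)))
```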
