I have trimmed and translated my NGS data, and all the amino acid sequences are now in the object trans_seqs1 (a large AAStringSet, 14.7 Mb).
These were quality-filtered down from the original 600,000 sequences.
Some information about the set:
length(trans_seqs1)
[1] 482694
summary(nchar(trans_seqs1))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  10.00   20.00   20.00   19.97   20.00   39.00
When I try to align all the sequences (most are 18 to 20 residues long), I get this error:
library(DECIPHER)
seqsalign <- AlignSeqs(trans_seqs1)
Determining distance matrix based on shared 3-mers:
| | 0%
Error: protect(): protection stack overflow
What can I do? Is the problem the function or my computer (Windows 7, 64-bit, 8 GB RAM)?
I have also tried the msaClustalW function, but ClustalW does not seem to align more than 500 sequences at a time. Can anyone suggest a package that can align this many sequences in one go?
Hello,
Using the defaults, DECIPHER cannot align more than ~46k sequences. I do not know of any alignment programs that would be able to accurately align 482k sequences.
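A rough back-of-the-envelope check is consistent with both numbers. This rests on two assumptions that the answer itself does not state. First, assume the limit comes from building a full n x n pairwise distance matrix whose elements must be addressable by a signed 32-bit integer. Under that assumption, the largest n that fits is 46,340, which matches the ~46k figure. Second, assume 8-byte double-precision entries. A matrix of that kind for 482,694 sequences would then need about 1.86 TB, far beyond 8 GB of RAM:

$$46340^2 = 2\,147\,395\,600 \;<\; 2^{31}-1 = 2\,147\,483\,647 \;<\; 46341^2 = 2\,147\,488\,281$$

$$482694^2 \times 8\ \text{bytes} = 232\,993\,497\,636 \times 8\ \text{bytes} = 1\,863\,947\,981\,088\ \text{bytes} \approx 1.86\ \text{TB}$$

Storing only one triangle of the symmetric matrix, n(n-1)/2 = 116,496,507,471 entries, would still need roughly 0.93 TB. On this reading, adding more memory alone would not plausibly make a single alignment of all 482k sequences feasible.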
If you are looking to map reads, for example after RNA-seq, then this is a different type of "alignment". There are many read mapping programs available, but DECIPHER cannot perform read mapping.
I hope that helps!
Erik