DECIPHER IdTaxa: Speed up training?

Hi!

I'm using IdTaxa to classify eDNA sequences against various databases. For large databases such as MIDORI or BOLD, training takes a very long time, often several days. Is there any way to speed this up, short of reducing the number of iterations (I use 3)? I have already removed the gap-removal and orientation steps, as they are not necessary. Could some parts of LearnTaxa be parallelized, or has the function already been optimized as much as possible?

Cheers, Till

Hi Till, which part is taking a long time: the part before the progress bar, or the part during it? -- Erik

Hi Erik,

I think both? With all of BOLD, the part before the progress bar takes about one day (removing groups also takes about that long), and the progress bar itself about four days, but I didn't keep exact track. I guess it's just a lot of sequences...

Load taxid file...
[1] 509179
Removing groups...
[1] 7515497
Training iteration: 1
================================================================================

Time difference of 592074.3 secs
Training iteration: 2

This is on an older Intel Xeon E5-2620.
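
For scale, the logged "Time difference of 592074.3 secs" can be converted to days. The script's timing code isn't shown, so which interval it covers is an assumption here; if it brackets the first training iteration, that iteration alone took close to a week, longer than the rough four-day estimate above. A trivial check in Python:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400 seconds in a day

# Value copied from the log line "Time difference of 592074.3 secs"
elapsed_secs = 592074.3
elapsed_days = elapsed_secs / SECONDS_PER_DAY
print(f"{elapsed_days:.2f} days")  # 6.85 days
```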

Hi Till, if I am interpreting your output correctly, you are training on millions of sequences, so I would expect both training and classification to take a long time. A change a few versions ago increased training speed, but if you are using the latest version of DECIPHER, then that is as fast as it will go. Setting maxChildren=0 will speed up training somewhat, but may slow down classification, depending on the training set. Your best bet is to use Clusterize() to reduce redundant training sequences: there is negligible advantage to including very similar sequences with the same label when training. -- Erik
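
The idea behind that last suggestion is to collapse near-identical training sequences that share a taxonomic label before calling LearnTaxa, while never merging sequences across labels. In DECIPHER this is done with Clusterize() in R; the sketch below is NOT Clusterize and does not reproduce its algorithm. It is a minimal Python illustration of the same principle, assuming greedy centroid clustering with k-mer Jaccard similarity as a cheap stand-in for sequence identity. The function names, k = 8 and the 0.9 cutoff are hypothetical choices, not recommendations.

```python
from collections import defaultdict


def kmer_set(seq, k=8):
    """Return the set of all overlapping k-mers in a sequence (uppercased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two k-mer sets (1.0 for identical sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def dereplicate_by_label(records, k=8, min_similarity=0.9):
    """Hypothetical helper, not part of DECIPHER.

    Greedy centroid clustering within each taxonomic label.
    records: iterable of (seq_id, label, sequence) tuples.
    A sequence is kept as a new representative only if its k-mer Jaccard
    similarity to every representative already kept for the SAME label is
    below min_similarity. Sequences with different labels are never merged.
    Returns the kept tuples in input order.
    """
    reps = defaultdict(list)  # label -> k-mer sets of kept representatives
    kept = []
    for seq_id, label, seq in records:
        kmers = kmer_set(seq, k)
        if any(jaccard(kmers, r) >= min_similarity for r in reps[label]):
            continue  # near-duplicate of a kept sequence with the same label
        reps[label].append(kmers)
        kept.append((seq_id, label, seq))
    return kept


records = [
    ("s1", "Root;Animalia;Chordata", "ACGTACGTACGTAAACCCGGGTTT"),
    ("s2", "Root;Animalia;Chordata", "ACGTACGTACGTAAACCCGGGTTT"),    # duplicate of s1
    ("s3", "Root;Animalia;Arthropoda", "ACGTACGTACGTAAACCCGGGTTT"),  # same sequence, other label
    ("s4", "Root;Animalia;Chordata", "TTTTGGGGCCCCAAAATTTTGGGG"),
]
kept = dereplicate_by_label(records)
print([r[0] for r in kept])  # ['s1', 's3', 's4']
```

Because the clustering is greedy, input order decides which sequence becomes the representative (sorting longest-first is a common choice). Comparing each sequence against every kept representative is quadratic in the worst case, so this only illustrates the principle; at BOLD scale, a purpose-built tool such as Clusterize() is the practical route.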
