I am wondering whether there is a straightforward method for updating a training set created by `DECIPHER::LearnTaxa`. I have new sequences that I'd like to include in the training set, but as far as I can tell there isn't a function to add sequences to an existing training set. I'm also unable to determine whether the `trainingSet` object contains the original taxonomic lineages and sequences, which could be written out and used to reproduce the database.
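Without a function that appends sequences to an existing `trainingSet`, the obvious workaround is to retrain with `LearnTaxa` on the original sequences plus the new ones. The Python sketch below covers only the merging step. It assumes each record is an `(identifier, lineage, sequence)` tuple and that the lineage goes into the FASTA header as a `Root;...` string. The helper names and header layout are hypothetical and are not part of DECIPHER.

```python
from typing import Iterable, List, Tuple

Record = Tuple[str, str, str]  # (identifier, lineage, sequence)


def merge_records(original: Iterable[Record], new: Iterable[Record]) -> List[Record]:
    """Combine two record sets; a new record replaces an original one with the same identifier."""
    merged = {rec[0]: rec for rec in original}  # dicts keep insertion order
    for rec in new:
        merged[rec[0]] = rec
    return list(merged.values())


def to_fasta(records: Iterable[Record], width: int = 60) -> str:
    """Format records as FASTA with the lineage after the identifier (hypothetical header layout)."""
    lines = []
    for ident, lineage, seq in records:
        lines.append(f">{ident} {lineage}")
        # Wrap the sequence at a fixed line width.
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    old = [("seq1", "Root;Bacteria;Firmicutes", "ACGTACGT")]
    extra = [("seq2", "Root;Archaea;Euryarchaeota", "GGCCTTAA")]
    print(to_fasta(merge_records(old, extra)), end="")
```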
If I cannot reuse the training set at all, I'd appreciate learning how to reproduce these training sets, including any preprocessing steps (e.g., filtering by homology or length) that are normally taken to ensure high-quality classifications. I'm specifically interested in building upon the GTDB r207 database and would like to know which input files were used to generate the training set hosted at http://www2.decipher.codes/Downloads.html. Was it the file ssu_all_r207.tar.gz?
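Below is one possible form of a length filter, combined with converting a GTDB-style lineage into a `Root;...` string. Several details are assumptions made for illustration: the header layout (`>ID d__...;p__...;s__... [attributes]`), the `d__`-style rank prefixes, and the 1200 to 2000 nt bounds. None of them is known to match how the hosted training set was actually built.

```python
import re
from typing import Iterator, Optional, Tuple

# Hypothetical bounds for an SSU length filter; not the values used for the hosted training set.
MIN_LEN, MAX_LEN = 1200, 2000

# Assumed GTDB rank prefixes such as "d__" and "p__".
RANK_PREFIX = re.compile(r"^[dpcofgs]__")


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA text, joining wrapped sequence lines."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def gtdb_to_root_lineage(header: str) -> Optional[str]:
    """Turn 'ID d__A;p__B [attrs]' into 'Root;A;B', assuming that header layout."""
    parts = header.split(maxsplit=1)
    if len(parts) < 2:
        return None  # no lineage after the identifier
    taxonomy = parts[1].split(" [", 1)[0]  # drop trailing [key=value] attributes, if any
    ranks = [RANK_PREFIX.sub("", r.strip()) for r in taxonomy.split(";")]
    ranks = [r for r in ranks if r]  # skip empty ranks such as a bare "s__"
    return ";".join(["Root"] + ranks) if ranks else None


def filter_records(text: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (id, lineage, sequence) for records with a lineage that pass the length filter."""
    for header, seq in parse_fasta(text):
        lineage = gtdb_to_root_lineage(header)
        if lineage is not None and MIN_LEN <= len(seq) <= MAX_LEN:
            yield header.split()[0], lineage, seq
```

The filtered records could then be written to FASTA and read into R for `LearnTaxa`. Whether the hosted GTDB r207 set applied any such filter remains the open question.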
Many thanks for your time!
Connor