demultiplex function in the very nice
DNABarcodes package (Tilo Buschmann) returns a
data.frame with, per read, the original read (including the original barcode), the deciphered (i.e. error-corrected) barcode, and the total edit distance between the original barcode and the error-corrected barcode. When using the 'Sequence-Levenshtein' distance, the barcode in the original read may have a different length than the error-corrected barcode, because Sequence-Levenshtein can correct insertions and deletions up to a certain edit distance. Is there an easy way, other than doing a Smith-Waterman alignment between the two, to find out the length of the barcode in the read (so I know where the actual sequence begins without throwing away nucleotides)?
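One lighter-weight option than a full Smith-Waterman alignment is a semi-global edit-distance DP. The whole corrected barcode must be aligned, but the alignment may stop anywhere in the read, so the read bases after the barcode cost nothing. The read prefix with the lowest distance is taken as the observed barcode. Below is a minimal Python sketch under stated assumptions. It is not DNABarcodes' own code, and the function name `barcode_end` is hypothetical. It assumes the barcode sits at the very start of the read, and it breaks ties by preferring the length closest to the corrected barcode's length:

```python
from typing import Tuple


def barcode_end(read: str, barcode: str) -> Tuple[int, int]:
    """Return (length of the barcode as it occurs in read, edit distance).

    Hypothetical helper: semi-global edit distance where the full corrected
    barcode must be consumed but the read may end anywhere, so bases after
    the barcode are free. Assumes the barcode starts at read position 0.
    """
    n = len(barcode)
    # prev[j] = edit distance between barcode[:i] and read[:j] (row i = 0 here)
    prev = list(range(len(read) + 1))
    for i in range(1, n + 1):
        cur = [i] + [0] * len(read)
        for j in range(1, len(read) + 1):
            cost = 0 if barcode[i - 1] == read[j - 1] else 1
            cur[j] = min(
                prev[j] + 1,         # barcode base missing from the read (deletion)
                cur[j - 1] + 1,      # extra base present in the read (insertion)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = cur
    # prev now holds distances between the full barcode and every read prefix;
    # pick the smallest distance, preferring lengths closest to len(barcode)
    best = min(range(len(read) + 1), key=lambda j: (prev[j], abs(j - n)))
    return best, prev[best]


# Barcode ACGTAC observed with one deleted G: the insert starts at position 5
length, dist = barcode_end("ACTACGGATTC", "ACGTAC")
insert = "ACTACGGATTC"[length:]
```

The cost is only `len(barcode) × len(read)` cells, and you can trim the read to its first `len(barcode) + max_distance` bases to make it cheaper still. Comparing the returned distance with the one `demultiplex` reports is a useful sanity check, but the two are not guaranteed to be identical, because this sketch uses plain prefix edit distance rather than the Sequence-Levenshtein definition. Some reads are genuinely ambiguous (for example, a substitution and an insertion/deletion pair can both give distance 1), and in those cases the tie-break rule decides.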
Any help appreciated,