I've been using DNABarcodes to analyze a set of barcodes, but I have run into some unexpected behavior.
I have a set of 10 barcodes, each 22 bases long, where the first 16 bases are identical and only the final 6 bases differ. The final 6 bases have a minimum Hamming distance of 3. Since the shared 16-base prefix contributes no mismatches between any pair, I expected the full-length barcodes to also have a minimum Hamming distance of 3 and therefore to be suitable for correcting single-base errors.
When I run the analyse.barcodes function on the full-length barcodes, I get a minimum Hamming distance of 2. However, when I run analyse.barcodes on the reverse of each string, I get the expected minimum Hamming distance of 3.
Why does analyse.barcodes behave this way, and what does this mean for error correction?
# The 6 variable bases on their own
short_barcodes <- c("TACAGC", "CCACTT", "TGAACG",
                    "GGAGAA", "CAGGAA", "ACCGAA",
                    "GTCCAA", "CCTCAA", "GCGTAA",
                    "TTCGGA")
DNABarcodes::analyse.barcodes(short_barcodes, metric = "hamming")
# Full 22-base barcodes: shared 16-base prefix followed by the 6 variable bases
full_barcodes <- c("TAAGGCGAGCGATCTATACAGC", "TAAGGCGAGCGATCTACCACTT", "TAAGGCGAGCGATCTATGAACG",
                   "TAAGGCGAGCGATCTAGGAGAA", "TAAGGCGAGCGATCTACAGGAA", "TAAGGCGAGCGATCTAACCGAA",
                   "TAAGGCGAGCGATCTAGTCCAA", "TAAGGCGAGCGATCTACCTCAA", "TAAGGCGAGCGATCTAGCGTAA",
                   "TAAGGCGAGCGATCTATTCGGA")
DNABarcodes::analyse.barcodes(full_barcodes, metric = "hamming")
# Same barcodes with each string reversed
DNABarcodes::analyse.barcodes(stringi::stri_reverse(full_barcodes), metric = "hamming")
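For comparison, here is a minimal base-R sketch that computes the pairwise Hamming distances directly, without DNABarcodes, as an independent check of the expectation above. The helper names `hamming`, `min_hamming`, and `reverse_string` are hypothetical and written only for this illustration; they are not part of any package. The sketch assumes all barcodes in a set have the same length.

```r
# Hypothetical helper: count mismatching positions between two
# equal-length strings.
hamming <- function(a, b) {
  sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

# Hypothetical helper: minimum Hamming distance over all unordered pairs.
min_hamming <- function(barcodes) {
  pairs <- combn(barcodes, 2)  # each column holds one pair
  min(apply(pairs, 2, function(p) hamming(p[1], p[2])))
}

# Hypothetical helper: reverse a single string using base R only.
reverse_string <- function(s) {
  paste(rev(strsplit(s, "")[[1]]), collapse = "")
}

short_barcodes <- c("TACAGC", "CCACTT", "TGAACG",
                    "GGAGAA", "CAGGAA", "ACCGAA",
                    "GTCCAA", "CCTCAA", "GCGTAA",
                    "TTCGGA")

# Rebuild the full barcodes from the shared 16-base prefix.
prefix <- "TAAGGCGAGCGATCTA"
full_barcodes <- paste0(prefix, short_barcodes)
reversed_barcodes <- vapply(full_barcodes, reverse_string, character(1))

# A shared prefix adds zero mismatches to every pair, and reversing
# every string only reorders positions, so all three values should agree.
min_hamming(short_barcodes)
min_hamming(full_barcodes)
min_hamming(reversed_barcodes)
```

If this direct calculation reports the same minimum for all three sets while analyse.barcodes reports a different value for the forward full-length barcodes, that would suggest the discrepancy lies in how the package computes or reports the distance rather than in the barcodes themselves.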