DNABarcodes - Hamming distance dependent on sequence direction?
@99b2227a
United States

I've been using DNABarcodes to analyze a set of barcodes; however, I have run into some unexpected behavior:

I have a set of 10 barcodes where the first 16 bases are identical but the final 6 bases differ. The final 6 bases have a minimum Hamming distance of 3, so I expected the full-length barcodes to also have a minimum Hamming distance of 3 and therefore to be amenable to 1-base error correction.

When I use the analyse.barcodes function on the full-length barcodes, I get a minimum Hamming distance of 2. However, when I use analyse.barcodes on the reverse of each string, I get the expected minimum Hamming distance of 3.

Why is the analyse.barcodes function behaving in this way? What does this mean for error correction?

short_barcodes <- c("TACAGC", "CCACTT", "TGAACG",
                    "GGAGAA", "CAGGAA", "ACCGAA",
                    "GTCCAA", "CCTCAA", "GCGTAA", 
                    "TTCGGA")
DNABarcodes::analyse.barcodes(short_barcodes, metric = "hamming")

full_barcodes <- c("TAAGGCGAGCGATCTATACAGC", "TAAGGCGAGCGATCTACCACTT", "TAAGGCGAGCGATCTATGAACG",
                   "TAAGGCGAGCGATCTAGGAGAA", "TAAGGCGAGCGATCTACAGGAA", "TAAGGCGAGCGATCTAACCGAA",
                   "TAAGGCGAGCGATCTAGTCCAA", "TAAGGCGAGCGATCTACCTCAA", "TAAGGCGAGCGATCTAGCGTAA",
                   "TAAGGCGAGCGATCTATTCGGA")
DNABarcodes::analyse.barcodes(full_barcodes, metric = "hamming")

DNABarcodes::analyse.barcodes(stringi::stri_reverse(full_barcodes), metric = "hamming")
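
As a cross-check that does not depend on DNABarcodes, here is a minimal base-R sketch of the pairwise comparison. The helpers `hamming`, `min_hamming`, and `reverse_string` are names made up for this example and are not part of the package. Because the Hamming distance compares strings position by position, a shared prefix contributes zero mismatches and reversing every string should leave all pairwise distances unchanged, so the three minima printed below should agree.

```r
# Hypothetical helper (not from DNABarcodes): count mismatched positions
# between two strings of equal length.
hamming <- function(a, b) {
  stopifnot(nchar(a) == nchar(b))
  sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

# Hypothetical helper: smallest Hamming distance over all unordered pairs.
min_hamming <- function(barcodes) {
  pairs <- combn(barcodes, 2)  # character matrix, one pair per column
  min(apply(pairs, 2, function(p) hamming(p[1], p[2])))
}

# Hypothetical helper: reverse a single string using base R only.
reverse_string <- function(s) paste(rev(strsplit(s, "")[[1]]), collapse = "")

short_barcodes <- c("TACAGC", "CCACTT", "TGAACG",
                    "GGAGAA", "CAGGAA", "ACCGAA",
                    "GTCCAA", "CCTCAA", "GCGTAA",
                    "TTCGGA")

# Rebuild the full-length barcodes from the shared 16-base prefix.
prefix <- "TAAGGCGAGCGATCTA"
full_barcodes <- paste0(prefix, short_barcodes)
reversed_barcodes <- vapply(full_barcodes, reverse_string, character(1),
                            USE.NAMES = FALSE)

d_short <- min_hamming(short_barcodes)
d_full  <- min_hamming(full_barcodes)
d_rev   <- min_hamming(reversed_barcodes)
print(c(short = d_short, full = d_full, reversed = d_rev))

# Substitution errors guaranteed correctable at minimum distance d:
# floor((d - 1) / 2)
print(floor((d_full - 1) / 2))
```

If this check and analyse.barcodes disagree on the full-length set, that would suggest the discrepancy comes from how the package handles these 22-base inputs rather than from the barcodes themselves.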