DNABarcodes - Hamming distance dependent on sequence direction?

daniel.zhang • 0

I've been using DNABarcodes to analyze a set of barcodes; however, I have been running into some unexpected behavior:

I have a set of 10 barcodes where the first 16 bases are identical but the final 6 bases differ. The final 6 bases have a minimum Hamming distance of 3, so I expected the full-length barcodes to also have a minimum Hamming distance of 3 and therefore to be amenable to 1-base error correction.

When I use the analyse.barcodes function on the full-length barcodes, I get a minimum Hamming distance of 2. However, when I use analyse.barcodes on the reverse of each string, I get the expected minimum Hamming distance of 3.

Why is the analyse.barcodes function behaving this way? What does this mean for error correction?

short_barcodes <- c("TACAGC", "CCACTT", "TGAACG",
                    "GGAGAA", "CAGGAA", "ACCGAA",
                    "GTCCAA", "CCTCAA", "GCGTAA", 
                    "TTCGGA")
DNABarcodes::analyse.barcodes(short_barcodes, metric = "hamming")

full_barcodes <- c("TAAGGCGAGCGATCTATACAGC", "TAAGGCGAGCGATCTACCACTT", "TAAGGCGAGCGATCTATGAACG",
                   "TAAGGCGAGCGATCTAGGAGAA", "TAAGGCGAGCGATCTACAGGAA", "TAAGGCGAGCGATCTAACCGAA",
                   "TAAGGCGAGCGATCTAGTCCAA", "TAAGGCGAGCGATCTACCTCAA", "TAAGGCGAGCGATCTAGCGTAA",
                   "TAAGGCGAGCGATCTATTCGGA")
DNABarcodes::analyse.barcodes(full_barcodes, metric = "hamming")

DNABarcodes::analyse.barcodes(stringi::stri_reverse(full_barcodes), metric = "hamming")
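
As a cross-check that does not depend on the package, here is a minimal base-R sketch that computes the minimum pairwise Hamming distance directly, for the short barcodes, the full-length barcodes, and the reversed full-length barcodes. The helper names `hamming`, `min_pairwise_hamming`, and `reverse_strings` are made up for illustration and are not part of DNABarcodes. The sketch assumes plain position-by-position Hamming distance, which may not be exactly what analyse.barcodes computes internally.

```r
# Sketch: pairwise Hamming distances in base R only (no DNABarcodes calls).
# All helper names below are hypothetical and exist only for this example.

# Number of mismatching positions between two equal-length strings.
hamming <- function(a, b) {
  stopifnot(nchar(a) == nchar(b))
  sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

# Minimum Hamming distance over all unordered pairs of barcodes.
min_pairwise_hamming <- function(barcodes) {
  pairs <- combn(barcodes, 2)  # 2 x choose(n, 2) character matrix
  min(apply(pairs, 2, function(p) hamming(p[1], p[2])))
}

# Reverse each string character by character (base-R stand-in for stri_reverse).
reverse_strings <- function(x) {
  vapply(strsplit(x, ""),
         function(ch) paste(rev(ch), collapse = ""),
         character(1))
}

short_barcodes <- c("TACAGC", "CCACTT", "TGAACG",
                    "GGAGAA", "CAGGAA", "ACCGAA",
                    "GTCCAA", "CCTCAA", "GCGTAA",
                    "TTCGGA")

# The shared 16-base prefix from the full-length barcodes above.
prefix <- "TAAGGCGAGCGATCTA"
full_barcodes <- paste0(prefix, short_barcodes)

min_pairwise_hamming(short_barcodes)
min_pairwise_hamming(full_barcodes)
min_pairwise_hamming(reverse_strings(full_barcodes))
```

Under plain Hamming distance all three calls return 3 for these barcodes: the shared prefix contributes no mismatches, and reversing every string keeps the same positions aligned with each other. This only gives an independent baseline to compare against; it does not by itself explain the value of 2 reported by analyse.barcodes.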

DNABarcodes • 793 views

ADD COMMENT • link 3.3 years ago daniel.zhang • 0