Benjamin Ward ENV
Hi,
I've been using the DNAbin class and the dist.dna() function in a
package I'm writing to get a matrix of Hamming distances between
DNA sequences in a multiple sequence alignment. This works for
sequences hundreds of thousands of bases long, but I would like to
support sequences from genome data, i.e. Mbp long. I know the
Biostrings package in the Bioconductor project is designed to store
very large sequences efficiently. Can I compute the same distance
matrix with Biostrings, and can I also identify all the SNPs, i.e.
the segregating sites, in an alignment of such large sequences?
If so, how?
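
To make the two quantities concrete, here is a minimal sketch on a toy
alignment. It is written in plain Python rather than R purely for
illustration and does not use ape or Biostrings; the function names
hamming_matrix and segregating_sites and the toy sequences are
hypothetical. It assumes gaps and ambiguity codes are compared as
ordinary characters, which may differ from how dist.dna() handles them.

```python
from itertools import combinations


def hamming_matrix(seqs):
    """Pairwise Hamming distances between equal-length aligned sequences.

    `seqs` maps a sequence name to its aligned string. Every character,
    including gaps, is compared literally (an assumption of this sketch).
    """
    names = list(seqs)
    length = len(seqs[names[0]])
    if any(len(s) != length for s in seqs.values()):
        raise ValueError("all aligned sequences must have the same length")
    dist = {a: {b: 0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        # Count the alignment columns at which the two sequences differ.
        d = sum(x != y for x, y in zip(seqs[a], seqs[b]))
        dist[a][b] = dist[b][a] = d
    return dist


def segregating_sites(seqs):
    """0-based alignment columns where at least two sequences differ."""
    columns = zip(*seqs.values())
    return [i for i, col in enumerate(columns) if len(set(col)) > 1]


# Hypothetical toy alignment, far smaller than the Mbp case in question.
alignment = {
    "seq1": "ACGTACGT",
    "seq2": "ACGTTCGT",
    "seq3": "ACCTTCGA",
}
dist = hamming_matrix(alignment)
sites = segregating_sites(alignment)
```

The distance step is quadratic in the number of sequences and linear in
alignment length, while the segregating-site scan is a single pass over
the columns, so for Mbp alignments it is mainly the storage format that
matters.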
Many Thanks,
Ben.