Question: Counting SNPs where ambiguity codes represent heterozygosity.
0
4.3 years ago by
ben.ward30
United Kingdom
ben.ward30 wrote:

Hi,

If I have two DNAStrings in a DNAStringSet, they are of the same length:

> dna <- DNAStringSet(c("ATYGRTCGATCG", "MTSGATCRATCG"))
> dna
A DNAStringSet instance of length 2
width seq
[1]    12 ATYGRTCGATCG
[2]    12 MTSGATCRATCG

How can I count the number of mismatches between the two sequences, ideally at each base, assuming that the ambiguity codes denote heterozygosity. So for example at the first base, A means homozygous A/A, and M means heterozygous A/C, and therefore counts as 1 mismatch. Another example, at the third position, Y = heterozygous C/T, whereas S = heterozygous C/G, and again is one mismatch.

I've thought of a few ways of doing this using consensusMatrices and such and thought about defining some custom scoring matrix which when you index using the bases, spits out the appropriate score, so mat["A", "A"] = 0, mat["A", "M"] = 1 and so on.

I wanted to ask the board if this (or a similar task) is a task that has been done with Biocondcuctor before and what the best way of doing it might be.

Many thanks,

Ben W.

alignment biostrings snps • 665 views
modified 4.3 years ago by Hervé Pagès ♦♦ 14k • written 4.3 years ago by ben.ward30
Answer: Counting SNPs where ambiguity codes represent heterozygosity.
0
4.3 years ago by
Hervé Pagès ♦♦ 14k
United States
Hervé Pagès ♦♦ 14k wrote:

Hi Ben,

Maybe you want to use neditAt() for that:

> neditAt(dna[[1]], dna[[2]])
[1] 4

See ?neditAt for more information. Of particular interest is the fixed argument to let you control how the IUPAC ambiguity codes should be interpreted.

Cheers,

H.