Counting SNPs where ambiguity codes represent heterozygosity.
1
0
Entering edit mode
ben.ward ▴ 30
@benward-7169
Last seen 8.1 years ago
United Kingdom

Hi,

If I have two DNAStrings in a DNAStringSet, they are of the same length:

> dna <- DNAStringSet(c("ATYGRTCGATCG", "MTSGATCRATCG"))
> dna
  A DNAStringSet instance of length 2
    width seq
[1]    12 ATYGRTCGATCG
[2]    12 MTSGATCRATCG

 

How can I count the number of mismatches between the two sequences, ideally at each base, assuming that the ambiguity codes denote heterozygosity. So for example at the first base, A means homozygous A/A, and M means heterozygous A/C, and therefore counts as 1 mismatch. Another example, at the third position, Y = heterozygous C/T, whereas S = heterozygous C/G, and again is one mismatch.

I've thought of a few ways of doing this using consensusMatrices and such and thought about defining some custom scoring matrix which when you index using the bases, spits out the appropriate score, so mat["A", "A"] = 0, mat["A", "M"] = 1 and so on. 

I wanted to ask the board if this (or a similar task) is a task that has been done with Biocondcuctor before and what the best way of doing it might be.

Many thanks,

Ben W.

Biostrings SNPs alignment • 1.5k views
ADD COMMENT
0
Entering edit mode
@herve-pages-1542
Last seen 15 hours ago
Seattle, WA, United States

Hi Ben,

Maybe you want to use neditAt() for that:

> neditAt(dna[[1]], dna[[2]])
[1] 4

See ?neditAt for more information. Of particular interest is the fixed argument to let you control how the IUPAC ambiguity codes should be interpreted.

Cheers,

H.

 

ADD COMMENT

Login before adding your answer.

Traffic: 837 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6