Question: Counting SNPs where ambiguity codes represent heterozygosity.
0
gravatar for ben.ward
4.2 years ago by
ben.ward30
United Kingdom
ben.ward30 wrote:

Hi,

If I have two DNAStrings in a DNAStringSet, they are of the same length:

> dna <- DNAStringSet(c("ATYGRTCGATCG", "MTSGATCRATCG"))
> dna
  A DNAStringSet instance of length 2
    width seq
[1]    12 ATYGRTCGATCG
[2]    12 MTSGATCRATCG

 

How can I count the number of mismatches between the two sequences, ideally at each base, assuming that the ambiguity codes denote heterozygosity. So for example at the first base, A means homozygous A/A, and M means heterozygous A/C, and therefore counts as 1 mismatch. Another example, at the third position, Y = heterozygous C/T, whereas S = heterozygous C/G, and again is one mismatch.

I've thought of a few ways of doing this using consensusMatrices and such and thought about defining some custom scoring matrix which when you index using the bases, spits out the appropriate score, so mat["A", "A"] = 0, mat["A", "M"] = 1 and so on. 

I wanted to ask the board if this (or a similar task) is a task that has been done with Biocondcuctor before and what the best way of doing it might be.

Many thanks,

Ben W.

alignment biostrings snps • 642 views
ADD COMMENTlink modified 4.2 years ago by Hervé Pagès ♦♦ 14k • written 4.2 years ago by ben.ward30
Answer: Counting SNPs where ambiguity codes represent heterozygosity.
0
gravatar for Hervé Pagès
4.2 years ago by
Hervé Pagès ♦♦ 14k
United States
Hervé Pagès ♦♦ 14k wrote:

Hi Ben,

Maybe you want to use neditAt() for that:

> neditAt(dna[[1]], dna[[2]])
[1] 4

See ?neditAt for more information. Of particular interest is the fixed argument to let you control how the IUPAC ambiguity codes should be interpreted.

Cheers,

H.

 

ADD COMMENTlink written 4.2 years ago by Hervé Pagès ♦♦ 14k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 95 users visited in the last hour