Question

Calculate the number of SNP differences between sequences in multiple alignment

0

Entering edit mode

lordbleys • 0

@lordbleys-7022

Last seen 8.1 years ago

European Union

Hi :)

This seems like a simple enough question but I can't find a straight answer...

I have a fasta alignment of 65 sequences, all of them exactly the same length (roughly 40 000 bp long).

The fasta alignment looks like:
>seq1
AAATTTCCCGGG
>seq2
CAATTTCCCGGG
>seq3
CAAGTTCCCGGG

The output I am looking for is the exact number of dissimilarities between each pair of sequence (Hamming distance):

seq1 seq2 seq3
seq1 0
seq2 1 0
seq3 2 1 0
etc...

I have tried importing the alignment with seqinr and using the function stringDist but it doesn't work:

snpcorealn <- read.alignment(file = "core_SNPs_matrix.fasta", format= 'fasta')
stringDist(snpcorealn, method = "hamming")

Error in .Call2("new_XStringSet_from_CHARACTER", ans_class, ans_elementType, :
key 54 (char '6') not in lookup table

I have tried importing the alignment with Biostrings, but it doesn't work either:

snpcorealn2 <- readDNAMultipleAlignment(filepath = system.file("core_SNPs_matrix.fasta", package="Biostrings"), format = "fasta")

Error in XStringSet("DNA", x, start = start, end = end, width = width, :
erreur d'évaluation de l'argument 'x' lors de la sélection d'une méthode pour la fonction 'XStringSet' : Error in .Call2("new_input_filexp", filepath, PACKAGE = "XVector") :
cannot open file ''

I am running out of ideas... If anyone can point me in the right direction, I don't even know why these methods are not working.

pairwisealignment SNP dnasequence • 4.1k views

ADD COMMENT • link 8.1 years ago lordbleys • 0

score 1 · Answer 1 · 2017-02-15

1

Entering edit mode

Martin Morgan 25k

@martin-morgan-1513

Last seen 7 weeks ago

United States

Looks like you're mixing packages -- read.alignment() is from seqinr (not a Bioconductor package...) while stringDist() is from Biostrings. So sticking with Biostrings

library(Biostrings)
dna = readDNAStringSet("core_NSPs_matrix.fasta")
stringDist(dna, method="hamming")

ADD COMMENT • link 8.1 years ago Martin Morgan 25k

score 0 · Answer 2 · 2017-02-15

0

Entering edit mode

lordbleys • 0

@lordbleys-7022

Last seen 8.1 years ago

European Union

Thank you, that worked perfectly !

ADD COMMENT • link 8.1 years ago lordbleys • 0