Hi :)
This seems like a simple enough question but I can't find a straight answer...
I have a fasta alignment of 65 sequences, all of them exactly the same length (roughly 40 000 bp long).
The fasta alignment looks like:
>seq1
AAATTTCCCGGG
>seq2
CAATTTCCCGGG
>seq3
CAAGTTCCCGGG
The output I am looking for is the exact number of dissimilarities between each pair of sequence (Hamming distance):
seq1 seq2 seq3
seq1 0
seq2 1 0
seq3 2 1 0
etc...
I have tried importing the alignment with seqinr and using the function stringDist but it doesn't work:
snpcorealn <- read.alignment(file = "core_SNPs_matrix.fasta", format= 'fasta') stringDist(snpcorealn, method = "hamming")
Error in .Call2("new_XStringSet_from_CHARACTER", ans_class, ans_elementType, :
key 54 (char '6') not in lookup table
I have tried importing the alignment with Biostrings, but it doesn't work either:
snpcorealn2 <- readDNAMultipleAlignment(filepath = system.file("core_SNPs_matrix.fasta", package="Biostrings"), format = "fasta")
Error in XStringSet("DNA", x, start = start, end = end, width = width, :
erreur d'évaluation de l'argument 'x' lors de la sélection d'une méthode pour la fonction 'XStringSet' : Error in .Call2("new_input_filexp", filepath, PACKAGE = "XVector") :
cannot open file ''
I am running out of ideas... If anyone can point me in the right direction, I don't even know why these methods are not working.