Some background in case I have got the wrong idea here. I have 240 amino acid sequences and want to see how closely related these sequences are (as well as visualising it), and how similar they are between different strains of bacteria. I have a data frame with the sequences as well as a column indicating to which strain each sequence belongs.
I have calculated a distance matrix after aligning the amino acid sequences in DECIPHER using the following code:
aa_aligned <- AlignSeqs(NZ_50_mature_D0.4_first50aa)
d <- DistanceMatrix(aa_aligned)
I then clustered them based on this matrix using IdClusters:
c <- IdClusters(d, show=TRUE, verbose = T)
This gives a tree, but with 240 sequences it is hard to read the sequence IDs and relate them back to the strain they belong to.
I would like to be able to colour the tree based on the "strain" column in my data frame. That way it would be easier to see whether sequences from the same strain cluster together. Is this possible?
Note: there are 50 "strains" so a key would be useful too.
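To make the goal concrete, here is a minimal sketch of the kind of plot I am after. It uses base R only and made-up toy data (12 fake "sequences" from 3 fake strains, clustered with hclust), so it is not my real DECIPHER output; the names strain, strain_cols and colour_leaf are just placeholders for illustration. I am assuming the same idea could be applied if the tree from my distance matrix is available as a dendrogram object.

```r
# Toy stand-in for the real data: 12 "sequences" from 3 made-up strains.
set.seed(1)
seq_ids <- paste0("seq", 1:12)
strain <- setNames(rep(c("strainA", "strainB", "strainC"), each = 4), seq_ids)
m <- matrix(rnorm(12 * 5), nrow = 12, dimnames = list(seq_ids, NULL))

# Base-R clustering; in the real analysis the tree would be built from the
# DECIPHER distance matrix instead of this random matrix.
dend <- as.dendrogram(hclust(dist(m), method = "average"))

# One colour per strain, named by strain so it can be looked up directly.
strain_levels <- unique(strain)
strain_cols <- setNames(rainbow(length(strain_levels)), strain_levels)

# Colour each leaf label by the strain of the sequence it names.
colour_leaf <- function(node) {
  if (is.leaf(node)) {
    lab <- attr(node, "label")
    attr(node, "nodePar") <- list(lab.col = strain_cols[[strain[[lab]]]],
                                  pch = NA)  # no point symbol at the leaf
  }
  node
}
dend_col <- dendrapply(dend, colour_leaf)

# Plot the coloured tree and add a key mapping colours to strains.
plot(dend_col)
legend("topright", legend = names(strain_cols), fill = strain_cols,
       cex = 0.7, title = "strain")
```

With 50 strains the key would probably need several columns (the ncol argument of legend), and 50 rainbow colours may be hard to tell apart, so I am open to other ways of showing the grouping.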
If there is a better way to achieve what I am trying to do, any advice is welcome. I am still new to this kind of analysis in R.