Some background in case I have got the wrong idea here. I have 240 amino acid sequences and want to see how closely related these sequences are (as well as visualising it), and how similar they are between different strains of bacteria. I have a data frame with the sequences as well as a column indicating to which strain each sequence belongs.
I have calculated a distance matrix after aligning the amino acid sequences in DECIPHER using the following code:
aa_aligned <- AlignSeqs(NZ_50_mature_D0.4_first50aa)
d <- DistanceMatrix(aa_aligned)
I then clustered them based on this matrix using IdClusters:
c <- IdClusters(d, show=TRUE, verbose = TRUE)
This gives a tree, but with 240 sequences it is hard to read the ids and relate them back to the strain each one belongs to.
I would like to be able to colour the tree based on the "strain" column in my data frame. That way it will be easier to see whether sequences from the same strain cluster together. Is this possible?
Note: there are 50 "strains" so a key would be useful too.
If there is a better way to achieve what I am trying to do, any advice is welcome. I am still new to this kind of analysis in R.
Thank you.
Very useful,
The problem is that my original data frame looks something like this (with about 200 sequences):
So I then did the following:
### define where in the table the sequences are (full_length_emm)
seq <- NZ_50_fullseq$full_length_emm
### and define that the names of the sequences are in the table "id" column
names(seq) <- NZ_50_fullseq$id
### tell Biostrings that "seq" is a set of DNA sequences
NZ_50_dna <- DNAStringSet(seq)
### translate the DNA to amino acids
NZ_50_aa <- aa <- translate(NZ_50_dna)
### Select first 50 amino acids
NZ_50_aa <- subseq(NZ_50_aa, 1, 50)
Thank you again, and sorry if these are somehow missing the point.
If I understand correctly, bullet point #1 sounds like an indexing problem. You could simply look up each sequence's strain in a vector of colors named by strain.
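Here is a rough sketch of that lookup using only base R, in case it helps. It assumes the tree is available as a dendrogram object (I believe IdClusters can return one through its type argument, but please check the help page for your version). The toy distance matrix, the ids, the strain labels and the helper colour_leaf are all made up for illustration; in practice the strain vector would come from the "strain" column of the data frame, named by sequence id.

```r
# Toy data standing in for the real distance matrix and strain column
# (hypothetical ids and strains, for illustration only).
set.seed(1)
ids <- paste0("seq", 1:8)
strain <- c("A", "A", "B", "B", "C", "C", "A", "B")
names(strain) <- ids
coords <- matrix(rnorm(8 * 3), nrow = 8, dimnames = list(ids, NULL))
d <- as.matrix(dist(coords))

# Build a dendrogram from the distance matrix (average linkage, i.e. UPGMA).
dend <- as.dendrogram(hclust(as.dist(d), method = "average"))

# One colour per strain, stored in a vector named by strain.
strains <- unique(strain)
strain_cols <- setNames(rainbow(length(strains)), strains)

# Colour each leaf (its edge and its label) by looking up its strain.
colour_leaf <- function(node) {
  if (is.leaf(node)) {
    leaf_col <- strain_cols[[strain[[attr(node, "label")]]]]
    attr(node, "edgePar") <- list(col = leaf_col)
    attr(node, "nodePar") <- list(lab.col = leaf_col, pch = NA)
  }
  node
}
dend_col <- dendrapply(dend, colour_leaf)

# Plot the coloured tree and add a key of strain colours.
plot(dend_col)
legend("topright", legend = names(strain_cols), fill = strain_cols, cex = 0.7)
```

With 50 strains the colours from rainbow() will be hard to tell apart, so a hand-picked palette or a legend split over several columns (the ncol argument of legend) may work better.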
For bullet point #2, you could subset the sequences before alignment:
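A minimal sketch of that kind of subsetting, under some assumptions: the toy sequences and strain labels below are invented, and the strain vector is assumed to be in the same order as the sequences (as it would be if both come from the same data frame).

```r
library(Biostrings)
library(DECIPHER)

# Toy amino acid sequences and strain labels (hypothetical data).
aa <- AAStringSet(c(s1 = "MKTAYIAKQR", s2 = "MKTAYIAKQK",
                    s3 = "MKSAYLAKQR", s4 = "MKSAYLAKHR"))
strain <- c("A", "A", "B", "B")  # same order as the sequences

# Keep only the sequences belonging to one strain.
aa_A <- aa[strain == "A"]

# Align just that subset.
aligned_A <- AlignSeqs(aa_A, verbose = FALSE)
```

The same logical index works on the real objects, for example with the "strain" column of the data frame in place of the toy strain vector.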
Yes, subsetting seems to be the easiest way to do it.
Is there a way to save the output that BrowseSeqs() opens in the web browser, rather than going through each strain and browsing them one by one?
The BrowseSeqs() function opens a webpage and not an image. You can specify the htmlFile argument to set the filepath.
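As a sketch of how that could be combined with subsetting, the loop below writes one HTML file per strain without opening a browser each time. The toy sequences, the strain labels and the file naming scheme are made up for illustration, and I am assuming the openURL argument is available in your version of DECIPHER.

```r
library(Biostrings)
library(DECIPHER)

# Toy amino acid sequences and strain labels (hypothetical data).
aa <- AAStringSet(c(s1 = "MKTAYIAKQR", s2 = "MKTAYIAKQK",
                    s3 = "MKSAYLAKQR", s4 = "MKSAYLAKHR"))
strain <- c("A", "A", "B", "B")  # same order as the sequences

# Write one HTML file per strain into a temporary directory.
out_dir <- tempdir()
out_files <- character(0)
for (s in unique(strain)) {
  out_file <- file.path(out_dir, paste0("strain_", s, ".html"))
  # htmlFile sets where the page is written; openURL = FALSE skips the browser.
  BrowseSeqs(aa[strain == s], htmlFile = out_file, openURL = FALSE)
  out_files <- c(out_files, out_file)
}
```

Each file can then be opened later in a browser, or printed to PDF from there if an image-like copy is needed.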