I know this may seem like an obvious question, but I have read the documentation and am finding it hard to distinguish between all the different file types. I am new to using R for genomics, so even an answer pointing me towards the right documentation or tutorials on how best to carry out the following in DECIPHER would be great.
I have been sent 2 .mfa (multiple FASTA alignment) files, which are essentially multiple FASTA files in one. They can even be opened in
etc... 175 sequences in total
I simply want to import the sequences from the .mfa file, with the ">Mxx_150bp" header lines as the names of the sequences.
I then want to identify a specific sequence (a signal sequence, present in all the FASTA records) and cut all the sequences from there on, translate the remaining sequences into amino acids, select the first 50 amino acids, and align these to see the similarity between them. Can this all be done in DECIPHER?
Thanks Erik,
I will look up all the functions individually.
On (1), my problem is that there are 175 sequences in one .mfa file. What would be your advice on importing them all from one file?
(2) Sorry if I was not clear. I got the steps in slightly the wrong order. I would like to translate all 175 sequences first and then search for "LPXTG", where X can be any amino acid, in all 175 translated amino acid sequences.
Regarding (1), the above suggestion should work for multiple sequences per file.
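As a rough sketch of point (1), assuming the suggestion refers to `readDNAStringSet()` from Biostrings (a package DECIPHER depends on): a single call reads every record in a multi-sequence FASTA file. The temporary file, the record names, and the short sequences below are made up for illustration and stand in for the real .mfa file.

```r
library(Biostrings)  # attached automatically when DECIPHER is loaded

# Hypothetical stand-in for one .mfa file: three records in a single file
mfa <- tempfile(fileext = ".mfa")
writeLines(c(">M01_150bp", "ATGCTGCCGGAAACCGGT",
             ">M02_150bp", "ATGCTGCCGAGCACCGGT",
             ">M03_150bp", "ATGAAACTGCCGCAGACC"), mfa)

# One call imports all records; each ">" header becomes a sequence name
dna <- readDNAStringSet(mfa, format = "fasta")
length(dna)  # number of sequences read
names(dna)   # header lines without the leading ">"

# Translate every sequence in reading frame 1 (standard genetic code)
aa <- translate(dna)
```

With the real data, replacing `mfa` by the path to the .mfa file should give a set of 175 named sequences, provided the file is plain FASTA.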
Regarding (2), you can search for a subsequence with:
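One way such a search could look, as a sketch rather than necessarily the exact call meant above, is a base R regular expression in which `.` stands in for the X of "LPXTG". The three amino acid strings and their names are invented for illustration; with real data the character vector could come from `as.character()` applied to the translated sequences.

```r
# Hypothetical translated sequences (illustrative only)
aa_seqs <- c(M01 = "MKTLPETGEE", M02 = "MKSLPATGKK", M03 = "MKTAAAAAAA")

# "." matches any single residue, so "LP.TG" covers LPXTG with any X
hits <- regexpr("LP.TG", aa_seqs)

# Which sequences contain the motif at least once
has_motif <- as.integer(hits) > 0

# 1-based start of the first match in each sequence, NA where absent
motif_start <- as.integer(hits)
motif_start[motif_start < 0] <- NA

# The matched five-residue strings, for the sequences that have one
found <- regmatches(aa_seqs, hits)
```

`regexpr()` reports only the first match per sequence; `gregexpr()` with the same pattern would report every occurrence if the motif can appear more than once.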