Question: Is it possible to import a multiple FASTA alignment (.mfa) file into DECIPHER?
17 months ago, reubenmcgregor88 wrote:

I know this may seem like an obvious question, but I have read the documentation and am finding it hard to distinguish between all the different file types. I am new to using R for genomics, so even an answer pointing me towards the right documentation or tutorials on how best to carry out the following in DECIPHER would be great.

I have been sent 2 .mfa files (multiple FASTA alignment), which are essentially multiple FASTA files in one. They can even be opened in

etc... 175 sequences in total

I simply want to import the FASTA sequences in the .mfa file, with the ">Mxx_150bp" header lines as the names of the sequences.

I then want to identify a specific sequence (a signal sequence, present in all the FASTA files) and cut all the sequences from there on, translate the remaining sequences into amino acids, select the first 50 amino acids, and align these to see the similarity between them. Can this all be done in DECIPHER?


Answer: Is it possible to import a multiple FASTA alignment (.mfa) file into DECIPHER?
17 months ago, Erik Wright wrote:

If I understand correctly, you want to:

(1) Read in the sequences in a FASTA file:

dna <- readDNAStringSet("<<path to file.mfa>>")

(2)  Identify a specific sequence present in all files -- I don't understand exactly what you mean here.  Perhaps:

w <- which(dna==DNAString("GTCC..."))

(3)  Cut all of the sequences from there on:

dna <- dna[seq_len(w[1] - 1)] # keep the sequences before the first match

(4)  Translate the sequences into amino acids:

aa <- translate(dna)

(5)  Select the first 50 amino acids:

aa <- subseq(aa, 1, 50)

(6)  Align these sequences:

aa_aligned <- AlignSeqs(aa)

(7)  See the similarity between them -- not sure what you mean here.  Perhaps:

d <- DistanceMatrix(aa_aligned)

clusters <- IdClusters(d, showPlot=TRUE)

I hope that helps!  And, yes, all of this can be done with DECIPHER.

- Erik


Thanks Erik,

I will look up all the functions individually.

On (1), my problem is that there are 175 sequences in one .mfa file. What would be your advice on importing them all from one file?

(2) Sorry if I was not clear. I got the steps in slightly the wrong order: I would like to translate all 175 sequences first and then search for an "LPXTG" motif, where X can be any amino acid, in all 175 translated amino acid sequences.

(Reply written 17 months ago by reubenmcgregor88)

Regarding (1), the above suggestion should work for multiple sequences per file.

Regarding (2), you can search for a subsequence with:

v <- vcountPattern("LPXTG", aa, fixed="subject")

w <- which(v > 0) # sequences containing the pattern
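
In case wildcard matching through `fixed` is not available for amino acid sequences in your version of Biostrings, a regular expression on the plain character form of the sequences is a simple alternative. The following is only a minimal sketch in base R: the toy sequences and the names `aa_chr`, `pos`, `from_motif` and `first50` are made up for illustration (with real data you would start from `as.character(aa)`), and keeping the part from the motif onward is an assumption about what "cut" means.

```r
# Toy amino acid sequences standing in for as.character(aa);
# names and residues are invented for illustration only.
aa_chr <- c(M01 = "MKKLPETGAAST",
            M02 = "MSTNLPQTGEEV",
            M03 = "MAAAKLVTGSSA")

# "LP.TG" is LPXTG written as a regular expression:
# "." matches any single residue.
pos <- regexpr("LP.TG", aa_chr)   # start of the first match, -1 if none
w <- which(pos > 0)               # sequences containing the motif

# Assumption: keep everything from the motif onward,
# then at most the first 50 residues of what remains.
from_motif <- substring(aa_chr[w], pos[w])
first50 <- substr(from_motif, 1, 50)
names(first50) <- names(aa_chr)[w]

print(first50)
```

The result is a named character vector, which can be converted back with `AAStringSet(first50)` before passing it to `AlignSeqs`.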

(Reply written 17 months ago by Erik Wright)