Is it possible to import a multiple FASTA alignment (.mfa) file into DECIPHER?
@reubenmcgregor88-13722

I know this may seem like an obvious question, but I have read the documentation and am finding it hard to distinguish between all the different file types. I am new to using R for genomics, so even an answer pointing me in the right direction (which documentation or tutorials to read) on how best to carry out the following in DECIPHER would be great.

I have been sent 2 .mfa files, (multiple fasta alignment), which are essentially multiple fasta files in one. They can even be opened in

>M1_150bp
GGCTTTGCAAACCAAACAGAAGTTAAGGCTGATTCTTCCAGAGAAGTAACCAACGAATTTACTGCTTCAATGTGGAAAGCACAGGCGGATAGTGCAAAGGCTAAAGCGAAGGAACTAGAAAAACAAGTTGAGGAATATAAAAAAAATTAT
>M2_150bp
GGCTTTGCAAACCAAACAGAAGTTAAGGCGGAAGGGGTTTCTGTAGGTTCAGATGCATCACTACATAACCGCATTACAGACCTTGAAGAGGAAAGAGAAAAATTATTAAATAAATTAGATAAAGTTGAAGAAGAGCATAAAAAAGATCAT
etc... 175 sequences in total

I simply want to import the fasta files in the .mfa file with the ">Mxx_150bp" as the names of all the sequences.

I then want to identify a specific sequence (a signal sequence, present in all the FASTA records) and cut all the sequences from there on, translate the remaining sequences into amino acids, select the first 50 amino acids, and align these to see the similarity between them. Can this all be done in DECIPHER?

decipher DNA biostrings
Erik Wright ▴ 150
@erik-wright-14386
United States

If I understand correctly, you want to:

(1) Read in the sequences in a FASTA file:

library(DECIPHER)

dna <- readDNAStringSet("<<path to file.mfa>>")

(2)  Identify a specific sequence present in all files -- I don't understand exactly what you mean here.  Perhaps:

w <- which(dna==DNAString("GTCC..."))

(3)  Cut all of the sequences from there on:

dna <- dna[1:(w[1] - 1)]

(4)  Translate the sequences into amino acids:

aa <- translate(dna)

(5)  Select the first 50 amino acids:

aa <- subseq(aa, 1, 50)

(6)  Align these sequences:

aa_aligned <- AlignSeqs(aa)

(7)  See the similarity between them -- not sure what you mean here.  Perhaps:

BrowseSeqs(aa_aligned)

Or:

d <- DistanceMatrix(aa_aligned)

c <- IdClusters(d, show=TRUE)
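If the signal sequence in step (2) is a short motif inside each read rather than a whole sequence, steps (2) to (5) could be sketched roughly as below. This is only one possible reading of the question: the toy reads, the `signal` value, and the choice to keep the part downstream of the signal are all assumptions for illustration. It also assumes the first base after the signal is in frame for translation. The cap via `pmin(width(aa), 50)` is there because `subseq(aa, 1, 50)` would fail on any sequence shorter than 50 residues.

```r
library(Biostrings)  # attached automatically by library(DECIPHER)

# Toy reads and a made-up signal sequence (both hypothetical).
dna <- DNAStringSet(c(s1 = "AAATTTGGGCCCAAA", s2 = "CTTTGGGCCCAAATTT"))
signal <- "TTT"

# Start of the first occurrence of the signal in each read (NA if absent).
hits <- vmatchPattern(signal, dna)
first <- vapply(startIndex(hits),
                function(x) if (length(x) == 0L) NA_integer_ else x[1L],
                integer(1))

# Keep only reads with a hit, and only the part downstream of the signal.
keep <- !is.na(first)
trimmed <- subseq(dna[keep], start = first[keep] + nchar(signal))

# Translate, then cap at 50 residues without failing on shorter sequences.
aa <- translate(trimmed)
aa <- subseq(aa, 1, pmin(width(aa), 50))
```

The resulting `aa` can then be passed to `AlignSeqs()` as in step (6).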

I hope that helps!  And, yes, all of this can be done with DECIPHER.

- Erik


Thanks Erik,

I will look up all the functions individually.

On (1), my problem is that there are 175 sequences in one .mfa file. What would be your advice on importing them all from that one file?

(2) Sorry if I was not clear. I got the steps in slightly the wrong order: I would like to translate all 175 sequences first and then search for an "LPXTG" motif, where X can be any amino acid, in all 175 translated amino acid sequences.


Regarding (1), the above suggestion should work for multiple sequences per file.
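As a quick check that a single call reads every record, here is a minimal sketch using a temporary two-record file. The file contents are toy data standing in for the real .mfa file, and `path` is a hypothetical name.

```r
library(Biostrings)  # attached automatically by library(DECIPHER)

# Two toy records standing in for the real .mfa file (hypothetical data).
path <- tempfile(fileext = ".mfa")
writeLines(c(">M1_150bp", "GGCTTTGCAAACC",
             ">M2_150bp", "GGCTTTGCAAACG"), path)

dna <- readDNAStringSet(path)  # FASTA is the default format
length(dna)  # number of sequences read
names(dna)   # header text after ">" becomes the sequence name
```

With the real file, `length(dna)` should report 175 if every record was read.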

Regarding (2), you can search for a subsequence with:

v <- vcountPattern("LPXTG", aa, fixed="subject")

w <- which(v > 0) # sequences containing the pattern
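Another way to look for the motif, sketched here with a base R regular expression rather than the Biostrings matcher, is to convert the translated sequences to plain strings. The toy sequences and the name `aa_chr` are assumptions for illustration; with real data, `aa_chr <- as.character(aa)` would give the same kind of named character vector.

```r
# Toy translated sequences (hypothetical data).
aa_chr <- c(M1 = "MKLPETGAAA", M2 = "MKAAAA", M3 = "LPSTGKK")

# "." matches any single residue, so "LP.TG" covers LPXTG.
pos <- as.integer(regexpr("LP.TG", aa_chr))  # start of first match, -1 if none

has_motif <- pos > 0
names(aa_chr)[has_motif]  # sequences containing the motif
```

Here `pos` also gives the position of the motif in each sequence, which is useful if the sequences are to be trimmed relative to it.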