Question

Is it possible to import a multiple FASTA alignment (.mfa) file into DECIPHER?

reubenmcgregor88 • 0

I know this may seem like an obvious question, but I have read the documentation and am finding it hard to distinguish between all the different file types. I am new to using R for genomics, so even an answer pointing me towards the right documentation or tutorials on how best to carry out the following in DECIPHER would be great.

I have been sent two .mfa (multiple FASTA alignment) files, which are essentially multiple FASTA records in one file. They can even be opened in

>M1_150bp

GGCTTTGCAAACCAAACAGAAGTTAAGGCTGATTCTTCCAGAGAAGTAACCAACGAATTTACTGCTTCAATGTGGAAAGCACAGGCGGATAGTGCAAAGGCTAAAGCGAAGGAACTAGAAAAACAAGTTGAGGAATATAAAAAAAATTAT

>M2_150bp

GGCTTTGCAAACCAAACAGAAGTTAAGGCGGAAGGGGTTTCTGTAGGTTCAGATGCATCACTACATAACCGCATTACAGACCTTGAAGAGGAAAGAGAAAAATTATTAAATAAATTAGATAAAGTTGAAGAAGAGCATAAAAAAGATCAT

etc... 175 sequences in total

I simply want to import the sequences in the .mfa file, with the ">Mxx_150bp" headers as the names of the sequences.

I then want to identify a specific sequence (a signal sequence, present in all the sequences) and cut all the sequences from there on, translate the remaining sequences into amino acids, select the first 50 amino acids, and align these to see the similarity between them. Can this all be done in DECIPHER?

decipher DNA biostrings • 2.6k views

updated 8.2 years ago by Erik Wright • written 8.2 years ago by reubenmcgregor88

score 2 · Accepted Answer · 2017-12-19

Erik Wright ▴ 160

United States

If I understand correctly, you want to:

(1) Read in the sequences in a FASTA file:

library(DECIPHER) # also attaches Biostrings, which provides readDNAStringSet()

dna <- readDNAStringSet("<<path to file.mfa>>")

(2) Identify a specific sequence present in all files -- I don't understand exactly what you mean here. Perhaps:

w <- which(dna==DNAString("GTCC..."))

(3) Cut all of the sequences from there on:

dna <- dna[1:(w[1] - 1)]

(4) Translate the sequences into amino acids:

aa <- translate(dna)

(5) Select the first 50 amino acids:

aa <- subseq(aa, 1, 50)

(6) Align these sequences:

aa_aligned <- AlignSeqs(aa)

(7) See the similarity between them -- not sure what you mean here. Perhaps:

BrowseSeqs(aa_aligned)

Or:

d <- DistanceMatrix(aa_aligned)

clusters <- IdClusters(d, showPlot=TRUE) # named 'clusters' to avoid masking base::c()
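
If step (2) is instead read as "find a signal motif inside every sequence and keep what follows it", the trimming has to happen within each sequence rather than by subsetting the set. The following is only a minimal base R sketch of that reading, not the method from the answer above. The toy sequences, the motif `GATTCT`, and the helper `trim_after()` are all made up for illustration. With a `DNAStringSet`, `as.character(dna)` would supply the input, and the result could be wrapped in `DNAStringSet()` again before calling `translate()`.

```r
# Toy input standing in for as.character(dna); names and motif are hypothetical.
seqs <- c(M1 = "GGCTTTGATTCTTCCAGAGAAGTA",
          M2 = "GGCAAAGATTCTGTAGGTTCAGAT",
          M3 = "GGCTTTGCAAACCAAACAGAAGTT")  # contains no motif
motif <- "GATTCT"

# Position of the first motif occurrence in each sequence (-1 if absent).
pos <- regexpr(motif, seqs, fixed = TRUE)

# Hypothetical helper: keep only the part after the motif, NA if it is absent.
trim_after <- function(s, p, len) {
  if (p < 0) NA_character_ else substr(s, p + len, nchar(s))
}
trimmed <- mapply(trim_after, seqs, pos, nchar(motif))

# Drop the sequences in which the motif was not found.
trimmed <- trimmed[!is.na(trimmed)]
print(trimmed)
```

Whether the motif itself should be kept or dropped depends on what "cut from there on" is meant to say; replacing `p + len` with `p` keeps it. Note that translation would then start at the first base of each trimmed sequence, so the cut position also determines the reading frame.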

I hope that helps! And, yes, all of this can be done with DECIPHER.

- Erik

8.2 years ago by Erik Wright

Thanks Erik,

I will look up all the functions individually.

On (1), my problem is that there are 175 sequences in one .mfa file. What would be your advice on importing them all from the one file?

(2) Sorry if I was not clear. I got the steps in slightly the wrong order: I would like to translate all 175 sequences first and then search for "LPXTG", where X can be any amino acid, in all 175 translated amino acid sequences.

8.2 years ago by reubenmcgregor88

Regarding (1), the above suggestion should work for multiple sequences per file.

Regarding (2), you can search for a subsequence with:

v <- vcountPattern("LPXTG", aa, fixed="subject")

w <- which(v > 0) # sequences containing the pattern
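
Because the X in LPXTG stands for any residue, the search can also be phrased as a regular expression once the translated sequences are plain character strings, which avoids relying on how ambiguity letters are handled in amino acid patterns. The sketch below uses base R only, on invented toy peptides. With an `AAStringSet` named `aa`, `as.character(aa)` would supply the input. It is an alternative to the call above, not a description of what `vcountPattern()` does internally.

```r
# Toy peptides standing in for as.character(aa); names are hypothetical.
pep <- c(M1 = "MKTAYLPETGEEST",
         M2 = "MKSAALPSTGAAKK",
         M3 = "MKTAYLPGEESTAA")  # contains no LPXTG motif

# "." matches any single character, so "LP.TG" covers LPXTG with X = any residue.
hits <- regexpr("LP.TG", pep)

w <- which(hits > 0)             # indices of the sequences containing the motif
start <- as.integer(hits)[w]     # 1-based start of the first match in each
found <- regmatches(pep, hits)   # the matched text itself

print(data.frame(name = names(pep)[w], start = start, motif = found))
```

This reports only the first occurrence per sequence; `gregexpr()` would return every occurrence. The `start` values could then be passed to `substr()` (or to `subseq()` on the original `AAStringSet`) to cut each sequence at the motif.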

8.2 years ago by Erik Wright