Question: How do I extract the complete MsaAAMultipleAlignment object to a file
19 months ago by
Assa Yeroslaviz1.4k
Munich, Germany
Assa Yeroslaviz1.4k wrote:

Hi all,

I was wondering how I can export the results of my msa run into a file.

I know I can export it to a FASTA file using unmasked(msa), but I would like to export the complete results I see in the terminal to a file, if possible in Clustal format.

this is my script:

P <- msa(seqs, order = "input", "ClustalOmega")
print(P, show = "complete")
writeXStringSet(unmasked(P), file="test.clustal")

But this gives me a FASTA file. Is there a way to get the Clustal format as output?

thanks

modified 19 months ago by UBodenhofer250 • written 19 months ago by Assa Yeroslaviz1.4k
Answer: How do I extract the complete MsaAAMultipleAlignment object to a file
19 months ago by
UBodenhofer250
Johannes Kepler University, Linz, Austria
UBodenhofer250 wrote:

Reply to question #1: The width is controlled by the global R option 'width' (in line with the print() methods in the 'Biostrings' package), so you can change the number of columns yourself. If you want to avoid a global effect on your R session, save the current value and, after running print(), revert to it. Here is an example (the first lines, which load the package and create myAlignment, are to be copied from my earlier snippet):

saveWidth <- getOption("width")
options(width=100)
sink("myAlignment.txt")
print(myAlignment, show="complete", halfNrow=-1)
sink()
options(width=saveWidth)
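
The save-and-restore pattern above can also be wrapped in a small function so that the width and the sink are reset even if print() stops with an error. This is only a sketch: print_to_file is a hypothetical helper, not part of 'msa' or 'Biostrings', and it relies solely on base R (options(), sink(), on.exit()).

```r
# Hypothetical helper (not part of 'msa'): print an object to a file at a
# chosen console width, then restore the previous width and close the sink,
# even if print() fails.
print_to_file <- function(x, file, width = 100, ...) {
  old <- options(width = width)      # options() returns the previous values
  on.exit(options(old), add = TRUE)
  sink(file)
  on.exit(sink(), add = TRUE)
  print(x, ...)                      # extra arguments are passed to print()
  invisible(file)
}

# Base-R demonstration: an integer vector wrapped at 40 columns
f <- tempfile(fileext = ".txt")
print_to_file(1:50, f, width = 40)

# With an alignment, the call would presumably look like this:
# print_to_file(myAlignment, "myAlignment.txt", width = 100,
#               show = "complete", halfNrow = -1)
```

The design choice here is that on.exit() does the cleanup, so nothing has to be reverted by hand after the call.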

Reply to question #2: Sorry, no. This is an implementation that is, to quite some extent, copied from or at least inspired by the methods in the 'Biostrings' package. You can influence the width of names by the 'nameWidth' argument, but that's all. If you want to tailor the print() methods to your very specific needs, you will have to make a copy of the relevant parts of the source file msa/R/print-methods.R and customize the code of your local copy to your needs.
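
As an alternative to editing a copy of print-methods.R, one could format the aligned sequences by hand. The following is a minimal sketch in base R, assuming the alignment is available as a named character vector of equal-length strings (for example via as.character(unmasked(myAlignment)), which is an assumption about how the data would be extracted). format_alignment_blocks is a hypothetical helper, not part of 'msa'. It puts the name at the beginning of each row and uses a fixed number of letters per row. Clustal files start with a line beginning with "CLUSTAL", so such a line can be passed as the header, but the sketch omits the conservation line and has not been checked against any parser.

```r
# Hypothetical helper, not part of 'msa' or 'Biostrings'.
# 'seqs' is a named character vector of aligned sequences (all the same length).
format_alignment_blocks <- function(seqs, width = 60, header = NULL) {
  stopifnot(is.character(seqs), !is.null(names(seqs)), width >= 1)
  len <- unique(nchar(seqs))
  stopifnot(length(len) == 1, len >= 1)  # aligned sequences share one length
  labels <- format(names(seqs))          # pad names to a common width
  starts <- seq(1, len, by = width)
  blocks <- lapply(starts, function(s) {
    chunk <- substring(seqs, s, pmin(s + width - 1, len))
    c(paste(labels, chunk), "")          # name first, blank line after block
  })
  c(header, if (!is.null(header)) "", unlist(blocks))
}

# Toy demonstration with two short aligned sequences, 4 letters per row
aln <- c(seqA = "MKV-LLA", seqB_long = "MKVALL-")
out <- format_alignment_blocks(aln, width = 4,
                               header = "CLUSTAL-style alignment (sketch)")
writeLines(out)

# With a real alignment one might write (assumes 'msa' is loaded):
# aln <- as.character(unmasked(myAlignment))
# writeLines(format_alignment_blocks(aln, width = 60), "myAlignment.txt")
```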

Answer: How do I extract the complete MsaAAMultipleAlignment object to a file
19 months ago by
UBodenhofer250
Johannes Kepler University, Linz, Austria
UBodenhofer250 wrote:

I am sorry, we currently do not have any functionality for saving Clustal files in our 'msa' package. I had a quick look at whether there are other options and came across the thread "R write alignment MSA" on Biostars. The second option there sounds fantastic, but I checked it and it does not seem possible. So what you are asking for does not seem to exist, and you had probably better save the multiple alignment to a FASTA file and convert it to Clustal format using an external tool (option #1 in the Biostars thread mentioned above).

In any case, we are currently thinking about which extensions would be useful for the 'msa' package. The methods for saving Clustal files are hidden in the multiple sequence alignment tools that are part of the 'msa' package; they are just not exposed to the end user. So it should not be a big deal to do this. We will not manage it for this year's spring release of Bioconductor (3.7), but I am optimistic for 3.8.

Thanks for the answer. It would be great if you can make it happen for BioC 3.8.

But if Clustal is not possible, is there a way to extract the object as is?

I have this results object:

MsaAAMultipleAlignment with 154 rows and 1142 columns
aln (1..79)                                                                     names
[1] MLVLRCRLGTSFPKLDNLVPKGKMKILLVFLGLLGNSVAMPMHMPRMPGFSSKSEEMMRYNQFNFMNGPHMAHLGPFFG sp|Q9NRM1|ENAM_HU...
[2] ------------------------------------------------------------------------------- bestMsms_id_68
[3] ------------------------------------------------------------------------------- bestMsms_id_73
[4] ------------------------------------------------------------------------------- bestMsms_id_90
... ...
[151] ------------------------------------------------------------------------------- dn_ms_scan_number...
[152] ------------------------------------------------------------------------------- dn_ms_scan_number...
[153] ------------------------------------------------------------------------------- dn_ms_scan_number...
[154] ------------------------------------------------------------------------------- dn_ms_scan_number...
Con ------------------------------------------------------------------------------- Consensus

aln (80..158)                                                                   names
[1] NGLPQQFPQYQMPMWPQPPPNTWHPRKSSAPKRHNKTDQTQETQKPNQTQSKKPPQKRPLKQPSHNQPQPEEEAQPPQA sp|Q9NRM1|ENAM_HU...
[2] ------------------------------------------------------------------------------- bestMsms_id_68
[3] ------------------------------------------------------------------------------- bestMsms_id_73
[4] ------------------------------------------------------------------------------- bestMsms_id_90

...

And I would like to extract it into a file. Is this possible without converting it into a FASTA file?

thanks

Well, what you can always do is dump this output to a file using sink(). Here is a code snippet:

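library(msa)

# Example protein sequences shipped with the 'msa' package
filepath <- system.file("examples", "exampleAA.fasta", package="msa")
mySeqs <- readAAStringSet(filepath)

myAlignment <- msaClustalW(mySeqs)

# Redirect console output to a file, print the alignment, then restore output
sink("myAlignment.txt")
print(myAlignment, show="complete")
sink()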
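
A variant of the same idea uses capture.output() instead of sink(), which returns the printed lines as a character vector so they can be inspected before writing. This is only a sketch: save_printed is a hypothetical helper, not part of 'msa', and the demonstration uses a plain base-R object rather than an alignment.

```r
# Hypothetical helper: capture what print() would show and write it to 'file'.
save_printed <- function(x, file, ...) {
  lines <- capture.output(print(x, ...))  # extra arguments go to print()
  writeLines(lines, file)
  invisible(lines)                        # return the lines for inspection
}

# Base-R demonstration
f <- tempfile(fileext = ".txt")
printed <- save_printed(head(letters, 5), f)

# With an alignment, the call would presumably be:
# printed <- save_printed(myAlignment, "myAlignment.txt", show = "complete")
# length(printed)   # a quick check of how many lines were written
```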


Sorry, I was happy too fast; this is not exactly what I need. When I do this, I still get only the first nine rows and the last nine rows in the file, with the ... between them.

I would like to get the complete file - in my case all 154 rows in the text file. Is this possible?

Are you sure you have used the option 'show="complete"'? When I run my example code from above, I see everything.

Hi, Yes I am sure. I have solved it by adding the parameter halfNrow = 100


Excellent, I'm glad you solved it. You can also set 'halfNrow' to -1 or NA, then it prints everything.

I have two other questions:

1. Is it possible to set the number of letters in each row?

For now it depends on the window size (AFAIK), but I would like to set it to 60, for example. Is that possible?

2. Is it possible to put the name of the organisms at the beginning of the row and not at the end?

thanks again