Multiple alignment of large DNA sequence files
Giuseppe

Hi everybody, I have 12 FASTA files of DNA sequences that I downloaded from NCBI. Each file contains ~30,000 lines and ~2 million characters.
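A file of ~30,000 lines may hold a single long sequence (for example one genome) or many short records. Both the number and the length of the sequences affect how much memory an aligner needs, so it can help to check this before aligning. The sketch below uses base R only; `fasta_summary()` is a hypothetical helper written for illustration, not a function from msa or Biostrings.

```r
# Hypothetical helper (not part of msa or Biostrings): count FASTA records and
# report the length of each sequence using base R only.
fasta_summary <- function(path) {
  lines <- sub("\r$", "", readLines(path))   # tolerate Windows line endings
  lines <- lines[nzchar(lines)]              # drop blank lines
  is_header <- startsWith(lines, ">")
  n <- sum(is_header)
  # Index of the record each line belongs to (0 for any text before the first header)
  record_id <- cumsum(is_header)
  seq_lines <- lines[!is_header]
  # Levels 1..n keep empty records; id 0 becomes NA and is dropped by split()
  grp <- factor(record_id[!is_header], levels = seq_len(n))
  # Sum line lengths per record; records without sequence lines get 0
  lens <- vapply(split(nchar(seq_lines), grp),
                 function(x) sum(as.numeric(x)), numeric(1))
  names(lens) <- sub("^>", "", lines[is_header])
  lens
}

# Usage (the path is a placeholder):
# lens <- fasta_summary("sequences.fasta")
# length(lens)    # number of sequences in the file
# summary(lens)   # distribution of sequence lengths
# sum(lens)       # total number of bases
```

If `length(lens)` is 1 or a few records of several megabases each, the task is closer to whole-genome alignment than to a typical multiple alignment of many short sequences.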

I am using an Ubuntu 18.04 VPS with 32 GB of RAM and 8 CPUs.

I want to perform a multiple alignment of the sequences with the "msaClustalOmega" function from the Bioconductor package msa. I ran the following code:

library(Biostrings)  # provides readDNAStringSet()
library(msa)         # provides msaClustalOmega()

mySeqs <- readDNAStringSet(file.choose())
align <- msaClustalOmega(mySeqs, dealign = FALSE)

When I executed the instruction align <- msaClustalOmega(mySeqs, dealign = FALSE), the R process was killed because it ran out of RAM.

How can I resolve the RAM problem?

How can I obtain non-aligned sequences as output? I tried dealign = TRUE, but I got the same result.

Thank you so much, and sorry for my bad English.



Are you sure that a meaningful alignment can be created for these sequences? I do not think this is an easy task in R or in any other language. There are a large number of software tools and approaches for multiple alignment that might help. Most of these are not part of Bioconductor, but this thread on Biostars contains some useful links as well as a discussion of multiple alignment of long sequences.

In addition, the msa manual states that ClustalOmega only supports amino acid distance matrices and is therefore not really useful for nucleotide sequences. Is there any particular reason you are using this method instead of the others?
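The msa package exposes its other aligners, ClustalW and MUSCLE, through the same `msa()` wrapper via the `method` argument. A minimal sketch, assuming msa and Biostrings are installed, that runs both on a tiny made-up set of three DNA sequences. The sequences are placeholders for illustration only; trying a small subset like this first is a cheap way to compare methods, but it does not by itself solve the memory limit for the full data.

```r
library(Biostrings)  # DNAStringSet()
library(msa)         # msa() wrapper around ClustalW, ClustalOmega and MUSCLE

# Toy sequences (made up for illustration); replace with a small subset of real data,
# e.g. mySeqs[1:3]
small <- DNAStringSet(c(
  s1 = "ATGGCGTACGTTAGCCTAGGCTAA",
  s2 = "ATGGCGTACGATAGCTAGGCTAA",
  s3 = "ATGGCTTACGTTAGCCTAGGTTAA"
))

# Align the same input with the two non-ClustalOmega methods
aln_w <- msa(small, method = "ClustalW")
aln_m <- msa(small, method = "Muscle")

# Show the full alignments rather than a truncated view
print(aln_w, show = "complete")
print(aln_m, show = "complete")
```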

