Question

Multiple alignment of large DNA sequence files

Giuseppe ▴ 20

@giuseppe-16310

Last seen 7.2 years ago

Hi everybody, I have 12 FASTA files of DNA sequences that I downloaded from NCBI. Each file contains ~30000 lines and ~2 million characters.

I have an Ubuntu 18.04 VPS with 32 GB of RAM and 8 CPUs.

I want to run a multiple alignment of the sequences with the "msaClustalOmega" function of the Bioconductor package msa. I executed the following code:

library(msa)

mySeqs <- readDNAStringSet(file.choose())

align <- msaClustalOmega(mySeqs, dealign=FALSE)

When I execute the instruction align <- msaClustalOmega(mySeqs, dealign=FALSE), the R process is killed due to lack of RAM.

How can I resolve the RAM problem?

How can I obtain non-aligned sequences as output? I tried dealign=TRUE, but I got the same output.

Thank you so much and sorry for my bad English

multiplealignment genome • 1.8k views

7.2 years ago Giuseppe ▴ 20

Hi,

Are you sure that a meaningful alignment can be created for these sequences? I do not think this is an easy task in R or in any other language. There are a large number of tools and approaches for multiple alignment that might help. Most of them are not part of Bioconductor, but this thread on Biostars contains some useful links as well as discussion of multiple sequence alignment of long sequences.

https://www.biostars.org/p/294983/

In addition, the msa manual states that ClustalOmega only supports amino acid substitution matrices and is hence not really useful for nucleotide sequences. Is there any particular reason you are using this method instead of the others?
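
Before choosing an aligner, it may help to check what each file actually contains: a FASTA file with ~30000 lines and ~2 million characters could be one long record wrapped over many lines rather than 30000 separate sequences. A minimal sketch, assuming the Biostrings package is installed; the toy records and the names fasta_path, seqs, n_records and record_widths are illustrative and not taken from the original post:

```r
library(Biostrings)

# Toy stand-in for one downloaded file (hypothetical data, not from NCBI).
fasta_path <- tempfile(fileext = ".fasta")
writeLines(c(">toy_record_1", "ACGTACGTAC", "GTACGT",
             ">toy_record_2", "ACGTTTGA"), fasta_path)

# Wrapped sequence lines are joined into one record per ">" header.
seqs <- readDNAStringSet(fasta_path)

n_records <- length(seqs)      # number of FASTA records, not number of lines
record_widths <- width(seqs)   # length of each record in bases

print(n_records)
print(record_widths)
print(names(seqs))
```

For a real file, replace fasta_path with the path to the download. If length() reports a single record of about 2 million bases, each file is probably one whole sequence, and the task becomes aligning 12 very long sequences, which is the situation discussed in the Biostars thread linked above.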

7.2 years ago thokall ▴ 160