I'm trying to use DECIPHER::AlignSeqs to progressively align a large number of sequences (>10000) for which I have a guide tree. This fails in dendrapply, apparently because of excessive recursion. Here is an example which reproduces the error:

    options(expressions = 1e4)
    n <- 1e4
    labels <- paste0("t", seq.int(n))

    # generate a random alignment
    aln <- sample(Biostrings::DNA_BASES, 100 * n, replace = TRUE)
    aln <- matrix(aln, nrow = n)
    aln <- apply(aln, 1, paste, collapse = "")
    names(aln) <- labels
    aln <- Biostrings::DNAStringSet(aln)

    # create a very deep tree
    x <- exp(-seq.int(n)/100)
    names(x) <- labels
    dist <- dist(x)
    tree <- hclust(dist, method = "single")
    tree <- as.dendrogram(tree)

    DECIPHER::AlignSeqs(
      myXStringSet = aln,
      guideTree = tree,
      iterations = 0,
      refinements = 0
    )

My first attempt gave "Error: C stack usage 7971876 is too close to the limit". I then ran it on a machine with more RAM and got "Error: evaluation nested too deeply: infinite recursion / options(expressions=)?". After including options(expressions = 10000) as above, I get "Error: node stack overflow". I'm not aware of any way to increase the size of the node stack.
I've tried sorting the tree to have the deepest branches listed first or last, in hopes that this might clear some of the node stack, but it doesn't seem to help.
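
To check the recursion hypothesis, the nesting depth of a dendrogram can be measured without recursing, by walking it with an explicit stack. The sketch below is only an illustration and uses base R alone: max_depth is a hypothetical helper of my own (it is not part of DECIPHER or stats), and n is reduced to 50 so that it runs instantly. It builds the same kind of single-linkage tree as the example above and compares the depth before and after reversing the branch order.

```r
# Hypothetical helper (not part of DECIPHER or stats): maximum nesting depth
# of a dendrogram, computed with an explicit stack instead of recursion.
max_depth <- function(dend) {
  stack <- list(list(node = dend, depth = 1L))  # root counts as depth 1
  deepest <- 0L
  while (length(stack) > 0L) {
    # pop the last entry
    top <- stack[[length(stack)]]
    stack[[length(stack)]] <- NULL
    deepest <- max(deepest, top$depth)
    if (!is.leaf(top$node)) {
      # push every child, one level deeper than its parent
      for (child in unclass(top$node)) {
        stack[[length(stack) + 1L]] <- list(node = child, depth = top$depth + 1L)
      }
    }
  }
  deepest
}

# Same construction as the example above, with a much smaller n (assumption:
# the shape of the tree does not depend on n).
n_small <- 50
x_small <- exp(-seq.int(n_small) / 100)
names(x_small) <- paste0("t", seq.int(n_small))
small_tree <- as.dendrogram(hclust(dist(x_small), method = "single"))

depth_original <- max_depth(small_tree)
depth_reversed <- max_depth(rev(small_tree))  # rev() flips the branch order
print(c(original = depth_original, reversed = depth_reversed))
```

Reordering branches only permutes the children of each node, so it cannot change the nesting depth. If the depth grows with the number of sequences, as I would expect for this tree, a recursive traversal would need a call stack about that deep no matter how the tree is sorted, which would be consistent with sorting not helping.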
Is there any way to do this? AlignSeqs works on alignments this large when it generates its own guide tree internally, so this seems to be just an issue with preprocessing the externally provided tree.