I'm trying to use DECIPHER::AlignSeqs to progressively align a large number of sequences (>10000) for which I have a guide tree. This fails in dendrapply, apparently because of excessive recursion. Here is an example which replicates the error:
options(expressions = 1e4)
n <- 1e4
labels <- paste0("t", seq.int(n))
# generate a random alignment
aln <- sample(Biostrings::DNA_BASES, 100 * n, replace = TRUE)
aln <- matrix(aln, nrow = n)
aln <- apply(aln, 1, paste, collapse = "")
names(aln) <- labels
aln <- Biostrings::DNAStringSet(aln)
# create a very deep tree
x <- exp(-seq.int(n)/100)
names(x) <- labels
dist <- dist(x)
tree <- hclust(dist, method = "single")
tree <- as.dendrogram(tree)
DECIPHER::AlignSeqs(
  myXStringSet = aln,
  guideTree = tree,
  iterations = 0,
  refinements = 0
)
My first attempt gave "Error: C stack usage 7971876 is too close to the limit"; I ran on a machine with more RAM and got "Error: evaluation nested too deeply: infinite recursion / options(expressions=)?". After including options(expressions = 10000) as above, I get "Error: node stack overflow". I'm not aware of any way to increase the size of the node stack.
I've tried sorting the tree to have the deepest branches listed first or last, in hopes that this might clear some of the node stack, but it doesn't seem to help.
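To check that nesting depth really is the problem, the depth of the tree can be measured without any recursion by walking the merge matrix of the hclust object before it is converted to a dendrogram. This is only a rough sketch: hclust_depth is a hypothetical helper I wrote for illustration, not part of DECIPHER or stats, and it uses a smaller n than the example above so that it runs quickly.

```r
# Hypothetical helper (not part of DECIPHER or stats): compute the depth of an
# hclust tree iteratively from its merge matrix, so no recursion is involved.
hclust_depth <- function(hc) {
  merge <- hc$merge
  depth <- integer(nrow(merge))
  for (i in seq_len(nrow(merge))) {
    children <- merge[i, ]
    # Negative entries are leaves (depth 0); positive entries refer to
    # earlier rows of the merge matrix, whose depth is already known.
    child_depth <- ifelse(children < 0, 0L, depth[pmax(children, 1)])
    depth[i] <- 1L + max(child_depth)
  }
  max(depth)
}

# Same construction as in the example, with fewer tips
n <- 200
x <- exp(-seq.int(n) / 100)
hc <- hclust(dist(x), method = "single")
hclust_depth(hc)
```

A depth close to the number of tips means the tree is almost completely unbalanced, which is the case where a recursive traversal needs roughly one level of nesting per sequence.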
Is there any way to do this within DECIPHER? AlignSeqs works on alignments this large, and internally generates a guide tree, so it seems that this is just an issue with preprocessing the externally provided tree.