I have a DNAStringSet object with many DNAStrings in it, and I want to subset each of them from position 1 to the minimum of its length and a fixed cutoff.
So far I'm using a for loop for this, as in this example:
library(dplyr)

set.seed(1)

# Build 100 random DNA sequences of varying length
seq.set <- lapply(1:100, function(s)
  paste(sample(c("A","C","G","T"), as.integer(abs(rnorm(1,500,1000))), replace = T), collapse="")) %>%
  unlist() %>%
  Biostrings::DNAStringSet(.)

# Trim each sequence to at most 650 positions, one element at a time
for(s in 1:length(seq.set)) {
  seq.set[s] <- Biostrings::subseq(seq.set[s], 1, min(650, Biostrings::width(seq.set[s])))
}
But because my real DNAStringSet contains ~200,000 DNAStrings, it takes quite a while.
Any faster solution?
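
One possible direction, offered as a minimal sketch rather than a definitive answer: subseq can be called on the whole DNAStringSet at once, with a vector of end positions (one per sequence) computed by pmin. The toy sequences and the name cutoff below are made up for illustration; the value 650 is the one used in the loop above.

```r
library(Biostrings)

cutoff <- 650  # hypothetical variable name; same value as in the loop

# Toy set (illustrative only): one sequence longer than the cutoff, one shorter
seq.set <- DNAStringSet(c(
  paste(rep("ACGT", 200), collapse = ""),  # 800 nt
  paste(rep("ACGT", 50),  collapse = "")   # 200 nt
))

# pmin() gives one end position per sequence, so no explicit loop is needed
trimmed <- subseq(seq.set, start = 1, end = pmin(cutoff, width(seq.set)))

width(trimmed)
```

Because the end positions are computed for all sequences in one call, this avoids the repeated single-element subsetting and assignment that the loop performs, which is likely where most of the time goes.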