Michael Dondrup
Hi,
I was trying to use write.XStringViews on a larger dataset, but to no
avail. It does not seem to be implemented efficiently.
Here is what I am trying. I downloaded
http://hgdownload.cse.ucsc.edu/goldenPath/hg18/chromosomes/chr1.fa.gz
and uncompressed it, then ran:
> library(Biostrings)
> dnasts <- read.DNAStringSet(file="chr1.fa")
# break up the fasta file into segments of size 60
> dnaviews <- Views(dnasts[[1]], start = seq(1, length(dnasts[[1]]), 60), width = 60)
> write.XStringViews(dnaviews, file="out.fa")
I interrupted the process after 1 h, by which point memory usage had
peaked at over 3 GB.
In principle doing the whole task should not take longer than a few
seconds. I found this report:
https://stat.ethz.ch/pipermail/bioc-sig-sequencing/2010-April/001160.html
I guess that is the same problem? Has there been any progress?
Is there perhaps a more efficient way of implementing this, e.g.
using cat()?
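To illustrate the kind of workaround I have in mind, here is a minimal sketch that uses only base R and writes the segments block by block through a file connection. The function name write_fixed_width_records, the record prefix "segment" and the block size are my own illustrative choices, not part of Biostrings, and I have not timed this on the full chromosome.

```r
# Hypothetical helper (not part of Biostrings): write one long sequence,
# given as a single character string, as FASTA records of fixed width.
write_fixed_width_records <- function(seq_string, file, width = 60L,
                                      prefix = "segment", block = 100000L) {
  n <- nchar(seq_string)
  stopifnot(n > 0)
  starts <- seq(1L, n, by = width)
  con <- file(file, open = "w")
  on.exit(close(con))
  # Work through the start positions in blocks so that only `block`
  # records are held in memory at any one time.
  for (i in seq(1L, length(starts), by = block)) {
    idx <- i:min(i + block - 1L, length(starts))
    s <- starts[idx]
    e <- pmin(s + width - 1L, n)  # clip the last record at the sequence end
    records <- paste0(">", prefix, "_", idx, "\n", substring(seq_string, s, e))
    writeLines(records, con)
  }
  invisible(length(starts))
}
```

With the objects from above, the call would presumably be write_fixed_width_records(as.character(dnasts[[1]]), "out.fa"), at the cost of holding the whole chromosome in memory as an ordinary character string.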
Thanks a lot
Michael
> sessionInfo()
R version 2.11.1 (2010-05-31)
x86_64-unknown-linux-gnu
locale:
[1] C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] Biostrings_2.16.9 IRanges_1.6.8
loaded via a namespace (and not attached):
[1] Biobase_2.8.0
>