Davis, Wade
Dear Wei Shi,
I've just started using Rsubread, and I have a few questions.
I have a FASTQ file with about 6.3M reads. I built my reference
index (mm10) as follows:
library(Rsubread)
extdataDir <- "/mnt/hgfs/Z/"
setwd(extdataDir)
# Build the mm10 index from mm10.fa; memory is given in megabytes.
buildindex(basename = "mm10_rsubread_index", reference = "mm10.fa", memory = 10000)
I then aligned using the following code:
# https://stat.ethz.ch/pipermail/bioconductor/2012-February/043552.html
# http://permalink.gmane.org/gmane.science.biology.informatics.conductor/34709
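setwd("/mnt/hgfs/Z/myprj/")
align(
  index = "/mnt/hgfs/Z/mm10_rsubread_index",
  readfile1 = "cutadapt_1-Feb_ATCACG_L003_R1_001.fastq",
  output_file = "cutadapt_1-Feb_ATCACG_L003_R1_001.subread.sam",
  nthreads = 4,  # number of CPU threads
  indels = 2,    # maximum number of indels allowed
  TH1 = 2)       # consensus threshold for reporting a hit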
Based on the screen output, it took about 7200 s in total, but the
"aligning" portion was about 3900 s, so the "saving the result"
portion seems to have taken about 3300 s. This is consistent with
the write speed I observed (~300 KB/s) and the SAM file size
(1.16 GB).
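As a rough check of these figures (assuming decimal units, 1 GB = 10^6 KB), the saving time implied by the file size and the observed write rate lands in the same range as the 3300 s left over after alignment:

```latex
\begin{aligned}
t_{\text{save}} &\approx 7200\,\text{s} - 3900\,\text{s} = 3300\,\text{s} \\
\frac{1.16\,\text{GB}}{300\,\text{KB/s}} &= \frac{1.16\times 10^{6}\,\text{KB}}{300\,\text{KB/s}} \approx 3870\,\text{s} \\
\frac{1.16\times 10^{6}\,\text{KB}}{3300\,\text{s}} &\approx 350\,\text{KB/s}
\end{aligned}
```

So the saving step corresponds to an effective write rate of roughly 300 to 350 KB/s.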
So my questions are:
1) Does this alignment speed seem reasonable for this situation?
Based on what I read on the mailing list, I was expecting it to be a
little faster. (It does seem to be faster than novoalign, which took
about 9500 s.) I am not complaining about your package; I just want
to make sure I have the settings correct.
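Expressed as throughput, which may be easier to compare with numbers reported on the list (rounded; it is not stated whether the novoalign time includes writing output, so that comparison is approximate):

```latex
\begin{aligned}
\text{aligning only:}\quad & \frac{6.3\times 10^{6}\ \text{reads}}{3900\,\text{s}} \approx 1600\ \text{reads/s} \\
\text{end to end:}\quad & \frac{6.3\times 10^{6}\ \text{reads}}{7200\,\text{s}} = 875\ \text{reads/s} \\
\text{novoalign:}\quad & \frac{6.3\times 10^{6}\ \text{reads}}{9500\,\text{s}} \approx 660\ \text{reads/s}
\end{aligned}
```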
2) Does the 'saving the result' time seem normal (as nebulous as
that term is)? Is that step bound by disk write speed? It seems to
take a very long time, so could there be more going on than just
writing to disk?
3) Do you have any recommendations or tweaks for speed? I have about
50 files like this, and I was hoping to try it on much larger files
and to align several different files in parallel.
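One way to run several files at once from R is sketched below. This is an illustration under stated assumptions, not advice from the Rsubread authors: the helper names (`sam_name_for`, `align_one`, `align_all`), the `.fastq` suffix, and splitting the 8 cores into 2 jobs of 4 threads are all hypothetical. It reuses the `align()` arguments from the call above and `parallel::mclapply`, which forks worker processes; that suits the Ubuntu VM described here, but on Windows it cannot use more than one core.

```r
# Sketch: align many FASTQ files, a few at a time, with Rsubread.
# Hypothetical helpers; core split and file naming are assumptions.
library(parallel)

# Map a FASTQ name to its SAM output name, following the naming used
# in the single-file align() call (*.fastq -> *.subread.sam).
sam_name_for <- function(fastq) {
  sub("\\.fastq$", ".subread.sam", fastq)
}

# Align one file with the same settings as the single-file call and
# return the path of the SAM file it was asked to write.
align_one <- function(fastq, index, threads_per_job) {
  Rsubread::align(
    index = index,
    readfile1 = fastq,
    output_file = sam_name_for(fastq),
    nthreads = threads_per_job,
    indels = 2,
    TH1 = 2)
  sam_name_for(fastq)
}

# Run up to `jobs` alignments concurrently (forked processes).
# Keep jobs * threads_per_job at or below the cores given to the VM.
align_all <- function(fastq_files, index, jobs = 2, threads_per_job = 4) {
  mclapply(fastq_files, align_one,
           index = index, threads_per_job = threads_per_job,
           mc.cores = jobs)
}

# Hypothetical usage:
# fastqs <- list.files("/mnt/hgfs/Z/myprj", pattern = "\\.fastq$",
#                      full.names = TRUE)
# align_all(fastqs, index = "/mnt/hgfs/Z/mm10_rsubread_index")
```

If the saving step really is limited by disk writes at roughly 300 KB/s, concurrent jobs would share that same disk and may not shorten the total time, so it may be worth timing two jobs before launching all 50.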
I'd be happy to send you the FASTQ file if you like (991 MB).
Another question, which may affect speed: is it OK to use Rsubread
to align reads of varying lengths? I started with 50 bp single-end
reads, but I needed to trim some adapter sequences. Most reads are
still 50 bp, but some are shorter.
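To see how many trimmed reads fall below 50 bp, the length distribution can be tabulated with base R. A minimal sketch, assuming an uncompressed FASTQ with the usual 4-line records (header, sequence, "+", qualities); `read_length_table` is a hypothetical helper name, and wrapped multi-line records are not handled.

```r
# Tabulate read lengths in an uncompressed, 4-line-per-record FASTQ file.
read_length_table <- function(fastq, n_max = -1L) {
  # n_max limits how many lines are read (-1 = whole file); use a
  # multiple of 4 so that only complete records are counted.
  lines <- readLines(fastq, n = n_max)
  if (length(lines) < 2L) {
    return(table(integer(0)))
  }
  # The sequence is the 2nd line of every 4-line record.
  seqs <- lines[seq(2L, length(lines), by = 4L)]
  table(nchar(seqs))
}

# Hypothetical usage: lengths of the first 1,000,000 reads of one file.
# read_length_table("cutadapt_1-Feb_ATCACG_L003_R1_001.fastq",
#                   n_max = 4000000L)
```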
In case it matters, my hardware: 68 GB RAM and 2x 4-core 3.16 GHz
Xeons, running Ubuntu 12.04 in a VM.
My session info is given below.
Thanks,
Wade
sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] Rsubread_1.6.3
loaded via a namespace (and not attached):
[1] tools_2.15.0