Rsubread package usage and speed questions
Davis, Wade ▴ 350
@davis-wade-2803
Dear Wei Shi,

I've just started using Rsubread, and I have a few questions. I've got a FASTQ file with about 6.3M reads. I built my reference index (mm10):

    extdataDir <- "/mnt/hgfs/Z/"
    setwd(extdataDir)
    buildindex(basename="mm10_rsubread_index", reference="mm10.fa", memory=10000)

I then aligned using the following code:

    # https://stat.ethz.ch/pipermail/bioconductor/2012-February/043552.html
    # http://permalink.gmane.org/gmane.science.biology.informatics.conductor/34709
    setwd("/mnt/hgfs/Z/myprj/")
    align(index="/mnt/hgfs/Z/mm10_rsubread_index",
          readfile1="cutadapt_1-Feb_ATCACG_L003_R1_001.fastq",
          output_file="cutadapt_1-Feb_ATCACG_L003_R1_001.subread.sam",
          nthreads=4, indels=2, TH1=2)

Based on the screen output, it took about 7200 seconds in total, but the "aligning" portion was about 3900 s. The "saving the result" portion would then seem to have taken about 3300 s. This is consistent with the write speed I observed (~300 KB/s) and the SAM file size (1.16 GB).

So my questions are:

1) Does this alignment speed seem reasonable for this situation? Based on what I read on the mailing list, I was expecting it to be a little faster. (It does seem to be faster than novoalign, which took about 9500 s.) I am not complaining about your package; I just want to make sure I have the settings correct.

2) Does the "saving the result" time seem normal (as nebulous as that term is)? Is that step bound by disk write speed? It seems to take a very long time, so I suspect there is more going on than just writing to disk.

3) Any recommendations or tweaks for speed? I have about 50 files like this, and I was hoping to try it on much larger files and in parallel (different files). I'd be happy to send you the FASTQ file if you like (991 MB).

Another question, which may affect the speed: is it OK to use Rsubread to align sequences of varying lengths? I started with 50 bp single-end reads, but I needed to trim some adapter sequences. Most reads are still 50 bp, but there are some shorter sequences.

In case it matters, my hardware: 68GB RAM with 2x 4-core 3.16 GHz Xeons, running Ubuntu 12.04 in a VM. My session info is given below.

Thanks,
Wade

sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] Rsubread_1.6.3

loaded via a namespace (and not attached):
[1] tools_2.15.0
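The timing figures quoted in the question can be cross-checked with a little arithmetic. The sketch below uses only the numbers from the post; the one assumption is the unit convention (1 GB = 1e6 KB), which is marked in the code. It is a rough sanity check, not a measurement.

```r
# Back-of-the-envelope check of the timings quoted in the post (base R only).
total_s    <- 7200                    # total run time from the screen output (s)
aligning_s <- 3900                    # "aligning" portion (s)
saving_s   <- total_s - aligning_s    # remainder attributed to "saving the result"

sam_size_gb    <- 1.16                # size of the SAM output file
write_kb_per_s <- 300                 # observed write speed

# Assumption: decimal units, 1 GB = 1e6 KB. Binary units give a slightly larger figure.
sam_size_kb <- sam_size_gb * 1e6

# Time needed to write the SAM file at the observed speed.
expected_write_s <- sam_size_kb / write_kb_per_s

# Throughput implied if the whole "saving" step were pure disk writes.
implied_kb_per_s <- sam_size_kb / saving_s

cat(sprintf("saving step:            %.0f s\n", saving_s))
cat(sprintf("expected write time:    %.0f s at %d KB/s\n",
            expected_write_s, as.integer(write_kb_per_s)))
cat(sprintf("implied write speed:    %.0f KB/s\n", implied_kb_per_s))
```

Writing 1.16 GB at roughly 300 KB/s takes on the order of 3900 s, which is in the same range as the 3300 s "saving" step, so the numbers in the post are compatible with that step being limited by write speed alone.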
Davis, Wade ▴ 350
@davis-wade-2803
In case others run into the same issue, here is an update: Wei suggested that the slowdown was due to slow I/O in my virtual machine, and sure enough that was the cause. More precisely, I was reading and writing the data through a shared folder (the /mnt/hgfs/ paths in my code). On a native Linux installation, the same code as in my original post ran in about 500 seconds (on different hardware, though).

Thanks to Wei for the help and for this fine package.

Wade

-----Original Message-----
From: Davis, Wade [mailto:davisjwa@health.missouri.edu]
Sent: Sunday, May 27, 2012 9:01 AM
To: bioconductor at r-project.org
Subject: [BioC] Rsubread package usage and speed questions
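For anyone wanting to check whether a working directory has the kind of slow I/O described in the update, here is a minimal sketch of a write-throughput probe in base R. The helper `measure_write_speed` is hypothetical (it is not part of Rsubread), and the result is only a rough indication, since operating-system caching can inflate the figure for small files.

```r
# Rough write-throughput probe for a directory (base R only).
# `measure_write_speed` is a hypothetical helper, not an Rsubread function.
measure_write_speed <- function(dir, size_mb = 16) {
  path <- tempfile(pattern = "io_probe_", tmpdir = dir)
  on.exit(unlink(path), add = TRUE)        # remove the probe file afterwards

  payload <- raw(size_mb * 1024^2)         # zero-filled buffer of size_mb megabytes

  elapsed <- system.time({
    con <- file(path, open = "wb")
    writeBin(payload, con)
    close(con)                             # closing flushes R's buffer to the OS
  })[["elapsed"]]

  size_mb / max(elapsed, 1e-6)             # MB/s; guard against a zero timer reading
}

# Probe the local temp directory.
local_speed <- measure_write_speed(tempdir())
cat(sprintf("local temp dir: %.1f MB/s\n", local_speed))

# To compare against the shared folder from the post, uncomment:
# shared_speed <- measure_write_speed("/mnt/hgfs/Z/myprj")
# cat(sprintf("shared folder:  %.1f MB/s\n", shared_speed))
```

If the shared folder turns out to be far slower than local disk, one option is to copy the inputs to local storage with `file.copy()`, run the alignment there, and copy the SAM file back afterwards.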