readGAlignmentPairs performance issue
Guest User ★ 13k
@guest-user-4897
Last seen 9.6 years ago
Hi Guys,

I'm experiencing some performance issues with readGAlignmentPairs from the latest version of Bioconductor (GenomicAlignments_1.0.1, BioC 2.14, R 3.1.0).

Reading RNASeq paired reads aligned to chr19 (mm9) from a BAM file containing 108,592,829 paired reads takes 3118s. The same code run in R-3.0.2, BioC 2.13, Rsamtools_1.14.3 takes 208s. The results are identical across the two versions.

Here's the code:

    library(GenomicAlignments)
    library(Rsamtools)

    param0 <- ScanBamParam(which=GRanges(seqnames="chr19",
                                         ranges=IRanges(start=1, end=chr19Length)))
    rd <- readGAlignmentPairs(bamFile, param=param0)

Any ideas as to why this might be?

Thanks in advance
Phil East

-- output of sessionInfo():

    R version 3.1.0 (2014-04-10)
    Platform: x86_64-unknown-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_GB        LC_NUMERIC=C          LC_TIME=en_GB
     [4] LC_COLLATE=en_GB      LC_MONETARY=en_GB     LC_MESSAGES=en_GB
     [7] LC_PAPER=en_GB        LC_NAME=C             LC_ADDRESS=C
    [10] LC_TELEPHONE=C        LC_MEASUREMENT=en_GB  LC_IDENTIFICATION=C

    attached base packages:
    [1] grDevices datasets  parallel  stats     graphics  utils     methods
    [8] base

    other attached packages:
     [1] GenomicAlignments_1.0.1 BSgenome_1.32.0         Rsamtools_1.16.0
     [4] Biostrings_2.32.0       XVector_0.4.0           GenomicRanges_1.16.3
     [7] GenomeInfoDb_1.0.2      IRanges_1.22.6          Biobase_2.24.0
    [10] BiocGenerics_0.10.0

    loaded via a namespace (and not attached):
     [1] BatchJobs_1.2      BBmisc_1.6         BiocParallel_0.6.0 bitops_1.0-6
     [5] brew_1.0-6         codetools_0.2-8    DBI_0.2-7          digest_0.6.4
     [9] fail_1.2           foreach_1.4.2      iterators_1.0.7    plyr_1.8.1
    [13] Rcpp_0.11.1        RSQLite_0.11.4     sendmailR_1.1-2    stats4_3.1.0
    [17] stringr_0.6.2      tools_3.1.0        zlibbioc_1.10.0

--
Sent via the guest posting facility at bioconductor.org.
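The posted code uses `chr19Length` without showing where it comes from. One way to get it is to read the length from the BAM header, as in the minimal sketch below. It assumes the header lists the chromosome and that the Rsamtools, GenomicRanges, IRanges and GenomeInfoDb packages are installed. `whole_chrom_param` is a hypothetical helper name, not a function from any package.

```r
# Hypothetical helper (not part of Rsamtools): build a ScanBamParam that
# covers one whole chromosome, taking its length from the BAM header.
whole_chrom_param <- function(bam_path, chrom) {
    stopifnot(requireNamespace("Rsamtools", quietly = TRUE),
              requireNamespace("GenomicRanges", quietly = TRUE),
              requireNamespace("IRanges", quietly = TRUE),
              requireNamespace("GenomeInfoDb", quietly = TRUE))
    # seqlengths() on a BamFile returns the sequence lengths from the header
    lens <- GenomeInfoDb::seqlengths(Rsamtools::BamFile(bam_path))
    if (!chrom %in% names(lens))
        stop("chromosome not found in BAM header: ", chrom)
    region <- GenomicRanges::GRanges(
        seqnames = chrom,
        ranges = IRanges::IRanges(start = 1, end = lens[[chrom]]))
    Rsamtools::ScanBamParam(which = region)
}

# Usage, with `bamFile` being the path from the post:
# param0 <- whole_chrom_param(bamFile, "chr19")
# rd <- readGAlignmentPairs(bamFile, param = param0)
```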
RNASeq RNASeq • 926 views
@valerie-obenchain-4275
Last seen 2.3 years ago
United States
Hi Phil,

We have several functions that call the same C code in the background. To help isolate the problem can you please run your code with scanBam() and readGAlignmentsList()?

    bf <- BamFile(fl, asMates=TRUE)
    readGAlignmentsList(bf, param=param0)
    scanBam(bf, param=param0)

readGAlignmentsList() and readGAlignmentPairs() should be very close in time. scanBam() will be faster but not by a huge amount.

Thanks.
Valerie

On 05/13/2014 07:23 AM, Maintainer wrote:
> I'm experiencing some performance issues with readGAlignmentPairs from the latest version of Bioconductor (GenomicAlignments_1.0.1, BioC 2.14, R 3.1.0)
>
> Reading RNASeq paired reads aligned to chr19 (mm9) from a BAM file containing 108,592,829 paired reads takes 3118s. The same code run in R-3.0.2, BioC 2.13, Rsamtools_1.14.3 takes 208s. The results are identical across the two versions.

--
Valerie Obenchain
Program in Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, Seattle, WA 98109
Email: vobencha at fhcrc.org
Phone: (206) 667-3158
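A small harness makes it easier to run the three readers under the same conditions and collect their timings in one named list. This is only a sketch: `time_readers` is a hypothetical helper, and the stand-in readers are there so the block runs without a BAM file.

```r
# Hypothetical helper: time each reader once and return a named list of
# system.time() results. Extra arguments are passed on to every reader.
time_readers <- function(readers, ...) {
    stopifnot(is.list(readers), !is.null(names(readers)))
    lapply(readers, function(read_fun) {
        gc()  # reduce noise from garbage left by the previous reader
        system.time(read_fun(...))
    })
}

# Stand-in readers so the sketch runs on its own.
demo <- time_readers(
    list(fast = function(n) sum(seq_len(n)),
         slow = function(n) { Sys.sleep(0.05); sum(seq_len(n)) }),
    n = 1000)
print(demo)

# With the objects from this thread (requires GenomicAlignments, Rsamtools
# and an indexed BAM file at `fl`):
# bf <- BamFile(fl, asMates=TRUE)
# timings <- time_readers(
#     list(readGAlignmentsList = readGAlignmentsList,
#          scanBam             = scanBam,
#          readGAlignmentPairs = readGAlignmentPairs),
#     bf, param = param0)
```

Each reader is timed only once here. On a shared machine or with a cold file cache, repeating the runs would give a more reliable comparison.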
Hi Valerie,

Thank you for getting back to me. Here are the times for readGAlignmentPairs, readGAlignmentsList, and scanBam using the code you sent.

    $readGAlignmentsList
        user   system  elapsed
    2529.510   57.487 2589.144

    $scanBam
        user   system  elapsed
    2465.353   49.404 2516.275

    $readGAlignmentPairs
        user   system  elapsed
    2560.754   56.612 2619.769

Best wishes
Phil

On Fri, 2014-05-16 at 12:55 -0700, Valerie Obenchain wrote:
> We have several functions that call the same C code in the background.
> To help isolate the problem can you please run your code with scanBam()
> and readGAlignmentsList()?
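A quick way to read these timings is to put them on a common scale. The sketch below uses only numbers quoted in this thread. The comparison against the 208s figure is rough, because that run used the older release and the code from the original post rather than a BamFile with asMates=TRUE.

```r
# Elapsed seconds reported above (R 3.1.0 / BioC 2.14)
elapsed <- c(readGAlignmentsList = 2589.144,
             scanBam             = 2516.275,
             readGAlignmentPairs = 2619.769)

# Elapsed seconds reported in the original post for R 3.0.2 / BioC 2.13
baseline <- 208

# Slowdown of each function relative to the old readGAlignmentPairs time
slowdown <- elapsed / baseline
print(round(slowdown, 1))

# Cost of each function relative to scanBam() in the same session
relative_to_scanBam <- elapsed / elapsed[["scanBam"]]
print(round(relative_to_scanBam, 3))
```

All three calls take within a few percent of each other, and scanBam() is itself slow. That suggests the extra time is spent in code the three functions share rather than in readGAlignmentPairs alone. This is an inference from the timings, not something confirmed in the thread.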