[devteam-bioc] readGAlignmentPairs performance issue
@herve-pages-1542
Seattle, WA, United States
Hi Phil,

I don't have access to your BAM file, but here are the timings I get for readGAlignmentPairs(). (My file contains 100,000,000 pairs, but I use 'which' to load only the pairs located on chr1-4, so the result contains only 16,938,029 pairs.)

- with BioC 2.13:
     user  system elapsed
  439.784  30.218 470.136

- with BioC 2.14:
     user  system elapsed
  319.212  11.492 331.201

So the new code is about 40% faster for me (it also uses about 20% less memory).

The timings you report below with BioC 2.14 for loading 108,592,829 pairs look reasonable to me. What is really surprising is the timing you get with BioC 2.13: only 208s to load 108,592,829 pairs! This is 15x faster than with BioC 2.14! Can you confirm this? If so, would you mind making the file accessible to us so we can have a look at it?

Thanks,
H.

On 05/20/2014 06:31 AM, Maintainer wrote:
> Hi Valerie,
>
> Thank you for getting back to me. Here are the times for
> readGAlignmentPairs, readGAlignmentsList, and scanBam using the code you
> sent.
>
> $readGAlignmentsList
>     user   system  elapsed
> 2529.510   57.487 2589.144
>
> $scanBam
>     user   system  elapsed
> 2465.353   49.404 2516.275
>
> $readGAlignmentPairs
>     user   system  elapsed
> 2560.754   56.612 2619.769
>
> Best wishes
> Phil
>
> On Fri, 2014-05-16 at 12:55 -0700, Valerie Obenchain wrote:
>> Hi Phil,
>>
>> We have several functions that call the same C code in the background.
>> To help isolate the problem, can you please run your code with scanBam()
>> and readGAlignmentsList()?
>>
>> bf <- BamFile(fl, asMates=TRUE)
>> readGAlignmentsList(bf, param=param0)
>> scanBam(bf, param=param0)
>>
>> readGAlignmentsList() and readGAlignmentPairs() should be very close in
>> time. scanBam() will be faster but not by a huge amount.
>>
>> Thanks.
>> Valerie
>>
>>
>> On 05/13/2014 07:23 AM, Maintainer wrote:
>>> Hi Guys,
>>>
>>> I'm experiencing some performance issues with readGAlignmentPairs from the latest version of Bioconductor (GenomicAlignments_1.0.1, BioC 2.14, R 3.1.0).
>>>
>>> Reading RNASeq paired reads aligned to chr19 (mm9) from a BAM file containing 108,592,829 paired reads takes 3118s. The same code run in R-3.0.2, BioC 2.13, Rsamtools_1.14.3 takes 208s. The results are identical across the two versions.
>>>
>>> Here's the code:
>>>
>>> library(GenomicAlignments)
>>> library(Rsamtools)
>>>
>>> param0 <- ScanBamParam(which=GRanges(seqnames="chr19",
>>>                                      ranges=IRanges(start=1, end=chr19Length)))
>>> rd <- readGAlignmentPairs(bamFile, param=param0)
>>>
>>> Any ideas as to why this might be?
>>>
>>> Thanks in advance
>>>
>>> Phil East
>>>
>>>
>>>
>>> -- output of sessionInfo():
>>>
>>> R version 3.1.0 (2014-04-10)
>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>
>>> locale:
>>>  [1] LC_CTYPE=en_GB       LC_NUMERIC=C         LC_TIME=en_GB
>>>  [4] LC_COLLATE=en_GB     LC_MONETARY=en_GB    LC_MESSAGES=en_GB
>>>  [7] LC_PAPER=en_GB       LC_NAME=C            LC_ADDRESS=C
>>> [10] LC_TELEPHONE=C       LC_MEASUREMENT=en_GB LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] grDevices datasets  parallel  stats     graphics  utils     methods
>>> [8] base
>>>
>>> other attached packages:
>>>  [1] GenomicAlignments_1.0.1 BSgenome_1.32.0         Rsamtools_1.16.0
>>>  [4] Biostrings_2.32.0       XVector_0.4.0           GenomicRanges_1.16.3
>>>  [7] GenomeInfoDb_1.0.2      IRanges_1.22.6          Biobase_2.24.0
>>> [10] BiocGenerics_0.10.0
>>>
>>> loaded via a namespace (and not attached):
>>>  [1] BatchJobs_1.2      BBmisc_1.6         BiocParallel_0.6.0 bitops_1.0-6
>>>  [5] brew_1.0-6         codetools_0.2-8    DBI_0.2-7          digest_0.6.4
>>>  [9] fail_1.2           foreach_1.4.2      iterators_1.0.7    plyr_1.8.1
>>> [13] Rcpp_0.11.1        RSQLite_0.11.4     sendmailR_1.1-2    stats4_3.1.0
>>> [17] stringr_0.6.2      tools_3.1.0        zlibbioc_1.10.0
>>>
>>> --
>>> Sent via the guest posting facility at bioconductor.org.
--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
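
For anyone wanting to reproduce the comparison discussed in this thread, here is a minimal sketch of how the three readers could be timed on one region of a BAM file. It is not the code Valerie sent (that code is not shown in full here). The function name `time_bam_readers`, its arguments, and the path `reads.bam` are hypothetical, and the sketch assumes the BAM file is indexed and that its header lists the requested sequence.

```r
library(GenomicAlignments)
library(Rsamtools)

# Hypothetical helper: time the three readers on a single sequence.
# Assumes `bam_path` points to an indexed BAM file whose header
# contains `seqname`.
time_bam_readers <- function(bam_path, seqname = "chr19") {
  bf <- BamFile(bam_path, asMates = TRUE)

  # Take the sequence length from the BAM header instead of hard-coding it.
  seq_len <- seqlengths(bf)[[seqname]]
  param0 <- ScanBamParam(
    which = GRanges(seqnames = seqname,
                    ranges = IRanges(start = 1, end = seq_len))
  )

  # Each element holds the user/system/elapsed times for one reader.
  list(
    readGAlignmentsList = system.time(readGAlignmentsList(bf, param = param0)),
    scanBam             = system.time(scanBam(bf, param = param0)),
    readGAlignmentPairs = system.time(readGAlignmentPairs(bf, param = param0))
  )
}

# Usage (hypothetical file name):
# timings <- time_bam_readers("reads.bam", "chr19")
# timings$readGAlignmentPairs[["elapsed"]]
```

Running the same function under each Bioconductor release and comparing the `elapsed` entries should give numbers directly comparable to the ones quoted above.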
RNASeq Cancer