GenomicAlignments - Speeding up readGAlignmentPairs()
0
0
Entering edit mode
@fong-chun-chan-5706
Last seen 9.6 years ago
Hi, I've been using the GenomicFeatures R package to read in some RNA-seq paired-end read data. While the readGAlignments() function reads in the bam file within a minute, I've noticed that the readGAlignmentPairs() function is extremely slow. This is even after restricting the space to just a single chromosome along with setting the flags isDuplicate = FALSE and isNotPrimaryRead = FALSE (the latter being suggested in the reference): "An easy way to avoid this degradation is to load only primary alignments by setting the isNotPrimaryRead flag to FALSE in ScanBamParam()" Based on my understanding, it seems that the findMateAlignment() function has to be run first. Is there any other way to speed it up? Perhaps I am doing something wrong here? Code and sessionInfo() below. Thanks! ---- library("GenomicFeatures") library('GenomicAlignments') library('Rsamtools') si <- seqinfo(BamFile(arguments$bamFile)) gr <- GRanges(arguments$chr, IRanges(100, seqlengths(si)[[arguments$chr]]-100)) allExons <- exons(txdb, columns = c('gene_id', 'exon_id', 'exon_name')) scf <- scanBamFlag( isDuplicate = FALSE, isNotPrimaryRead = FALSE ) # remove duplicate reads, remove non-primary reads (i.e. multi-aligned reads) reads <- readGAlignmentPairs( arguments$bamFile, param = ScanBamParam( which = gr, flag = scf ) ); # grab reads in specific region > sessionInfo() R version 3.1.0 (2014-04-10) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] parallel stats graphics grDevices utils datasets methods base other attached packages: [1] argparse_1.0.1 proto_0.3-10 GenomicAlignments_1.0.1 BSgenome_1.32.0 Rsamtools_1.16.1 Biostrings_2.32.0 XVector_0.4.0 GenomicFeatures_1.16.2 AnnotationDbi_1.26.0 Biobase_2.24.0 GenomicRanges_1.16.3 GenomeInfoDb_1.0.2 IRanges_1.22.9 BiocGenerics_0.10.0 vimcom.plus_0.9-93 [16] setwidth_1.0-3 colorout_1.0-2 loaded via a namespace (and not attached): [1] BatchJobs_1.2 BBmisc_1.7 BiocParallel_0.6.1 biomaRt_2.20.0 bitops_1.0-6 brew_1.0-6 checkmate_1.0 codetools_0.2-8 DBI_0.2-7 digest_0.6.4 fail_1.2 findpython_1.0.1 foreach_1.4.2 getopt_1.20.0 iterators_1.0.7 plyr_1.8.1 Rcpp_0.11.2 RCurl_1.95-4.1 [19] rjson_0.2.14 RSQLite_0.11.4 rtracklayer_1.24.2 sendmailR_1.1-2 stats4_3.1.0 stringr_0.6.2 tools_3.1.0 XML_3.98-1.1 zlibbioc_1.10.0 > [[alternative HTML version deleted]]
GenomicFeatures GenomicFeatures • 1.3k views
ADD COMMENT

Login before adding your answer.

Traffic: 841 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6