Search
Question: GenomicAlignments - Speeding up readGAlignmentPairs()
0
gravatar for Fong Chun Chan
4.5 years ago by
Fong Chun Chan320 wrote:
Hi, I've been using the GenomicFeatures R package to read in some RNA-seq paired-end read data. While the readGAlignments() function reads in the bam file within a minute, I've noticed that the readGAlignmentPairs() function is extremely slow. This is even after restricting the space to just a single chromosome along with setting the flags isDuplicate = FALSE and isNotPrimaryRead = FALSE (the latter being suggested in the reference): "An easy way to avoid this degradation is to load only primary alignments by setting the isNotPrimaryRead flag to FALSE in ScanBamParam()" Based on my understanding, it seems that the findMateAlignment() function has to be run first. Is there any other way to speed it up? Perhaps I am doing something wrong here? Code and sessionInfo() below. Thanks! ---- library("GenomicFeatures") library('GenomicAlignments') library('Rsamtools') si <- seqinfo(BamFile(arguments$bamFile)) gr <- GRanges(arguments$chr, IRanges(100, seqlengths(si)[[arguments$chr]]-100)) allExons <- exons(txdb, columns = c('gene_id', 'exon_id', 'exon_name')) scf <- scanBamFlag( isDuplicate = FALSE, isNotPrimaryRead = FALSE ) # remove duplicate reads, remove non-primary reads (i.e. multi-aligned reads) reads <- readGAlignmentPairs( arguments$bamFile, param = ScanBamParam( which = gr, flag = scf ) ); # grab reads in specific region > sessionInfo() R version 3.1.0 (2014-04-10) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] parallel stats graphics grDevices utils datasets methods base other attached packages: [1] argparse_1.0.1 proto_0.3-10 GenomicAlignments_1.0.1 BSgenome_1.32.0 Rsamtools_1.16.1 Biostrings_2.32.0 XVector_0.4.0 GenomicFeatures_1.16.2 AnnotationDbi_1.26.0 Biobase_2.24.0 GenomicRanges_1.16.3 GenomeInfoDb_1.0.2 IRanges_1.22.9 BiocGenerics_0.10.0 vimcom.plus_0.9-93 [16] setwidth_1.0-3 colorout_1.0-2 loaded via a namespace (and not attached): [1] BatchJobs_1.2 BBmisc_1.7 BiocParallel_0.6.1 biomaRt_2.20.0 bitops_1.0-6 brew_1.0-6 checkmate_1.0 codetools_0.2-8 DBI_0.2-7 digest_0.6.4 fail_1.2 findpython_1.0.1 foreach_1.4.2 getopt_1.20.0 iterators_1.0.7 plyr_1.8.1 Rcpp_0.11.2 RCurl_1.95-4.1 [19] rjson_0.2.14 RSQLite_0.11.4 rtracklayer_1.24.2 sendmailR_1.1-2 stats4_3.1.0 stringr_0.6.2 tools_3.1.0 XML_3.98-1.1 zlibbioc_1.10.0 > [[alternative HTML version deleted]]
ADD COMMENTlink written 4.5 years ago by Fong Chun Chan320
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 176 users visited in the last hour