On 06/22/2012 06:56 AM, Lakshmanan Iyer wrote:
> My apologies for multiple posting if it happens-- I sent the last
> from other accounts which may not be registered with Bioc-list
> Two questions:
> 1. Is readGapppedAlignmentPairs - the most efficient way to read a
> paired-end bam file with mulit-mapped reads?
> I am asking as it takes an enormous amount of time to process and
With a recent version of Rsamtools (>= 1.9.10), last time I tried it
took between 30 and 40 min to call readGapppedAlignmentPairs() on a
file with 70 million records (35 million pairs). Even for a fixed nb
records and using the same machine, timing could vary a lot depending
on how "hard" it is to pair the records which mostly depends on the
average number of records sharing the same QNAME (the lowest, the
easiest). You can get that number using the new quickBamCounts()
group | nb of | nb of | mean /
of | records | unique | records
records | in group | QNAMEs | unique
All records........................ A | 70446309 | 35407913 | 1.99 / 2
o template has single segment.... S | 0 | 0 | NA /
o template has multiple segments. M | 70446309 | 35407913 | 1.99 /
- first segment.............. F | 35313360 | 35313360 | 1 /
- last segment............... L | 35132949 | 35132949 | 1 /
- other segment.............. O | 0 | 0 | NA /
Note that (S, M) is a partitioning of A, and (F, L, O) is a
Indentation reflects this.
Here the average number of records per unique QNAME is 1.99 (and the
is 2), which is ideal.
I don't remember the exact amount of memory needed to load that file
with 70M records though. All I remember is that you need a lot :-/
(probably > 20GB). Make sure you have enough memory and that your
is not swapping.
readGapppedAlignmentPairs() is basically calling
followed by findMateAlignment(). Each of the 2 operations are
and IIRC I'm not sure there are obvious/easy optimizations that could
be done on readGapppedAlignments() itself. However, for
findMateAlignment(), there are a few easy ones that have been
on my list for a couple of months now and that I will implement ASAP.
They should improve speed and reduce memory usage.
> 2. How does one work with coverage on GappedAlignmentPairs in the
> of RNASeq?
> The simplest way is to consider each left and right read as separate
> essentially loose the "paired" information and calculate coverage.
> However, if both the left and right pair reads fall within a feature
> interest - say an exon, does it imply coverage of the region of the
> between the reads too
> In the figure above, the exon is represented by ">" and L and R
> the left and right reads aligned to the exon.
> I am talking about the region represented by "^". Do we assume
> for this region too?
> Does Coverage on GappedAlignmentPairs do this?
No, coverage on a GappedAlignmentPairs does not do this, but
'coverage(range(grglist(galp)))' will do this.
> Center for Neuroscience Research
> Tufts Univeristy School of Medicine
> Boston, MA
> [[alternative HTML version deleted]]
> Bioconductor mailing list
> Bioconductor at r-project.org
> Search the archives:
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319