Hi Thomas,
Two new arguments have been added to summarizeOverlaps(): 'inter.feature'
and 'fragments'. They are available in GenomicRanges 1.13.11 and
Rsamtools 1.13.13. The ?summarizeOverlaps page in GenomicRanges now has
all the examples (vs having half in GenomicRanges, half in Rsamtools).
'inter.feature':
When TRUE (the default), counting works as it always has: reads that hit
multiple features are resolved by one of the modes or dropped. When
FALSE, each feature that a read hits gets a count. This essentially
boils down to countOverlaps() with type="any" (Union and
IntersectionNotEmpty) or type="within" (IntersectionStrict).
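
In case it helps, here is a small self-contained sketch of the
difference. The coordinates are made up, and depending on your
Bioconductor version the methods may live in GenomicAlignments rather
than GenomicRanges:

library(GenomicRanges)

## Two overlapping features; the second read falls in the region
## shared by both of them.
features <- GRanges("chr1", IRanges(c(100, 300), c(400, 600)), strand = "+")
reads <- GRanges("chr1", IRanges(c(120, 350, 550), width = 50), strand = "+")

## inter.feature=TRUE (default): the read hitting both features is
## dropped by mode "Union".
se1 <- summarizeOverlaps(features, reads, mode = "Union",
                         inter.feature = TRUE)
assays(se1)$counts    ## 1, 1

## inter.feature=FALSE: each feature the read hits gets a count,
## equivalent to countOverlaps(features, reads, type = "any").
se2 <- summarizeOverlaps(features, reads, mode = "Union",
                         inter.feature = FALSE)
assays(se2)$counts    ## 2, 2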
'fragments':
This argument is relevant when counting paired-end BAM files. It was
added because of the flexibility the GAlignmentsList class offers. The
familiar GAlignmentPairs class holds reads that have been "properly
mated" with the algorithm in ?findMateAlignment. GAlignmentsList can
hold these "properly mated" reads as well as the singletons, reads with
unmapped pairs, and anything else in the BAM.
When TRUE (the default), both the "properly mated" reads and the others
are counted. You can of course still add your own filtering / QC with
param = ScanBamParam(). When FALSE, only reads that have been "properly
mated" are counted.
Let me know how it goes.
Valerie
On 04/08/13 17:52, Thomas Girke wrote:
> Dear Valerie,
>
> Is there currently any way to run summarizeOverlaps in a feature-overlap
> unaware mode, e.g. with an ignorefeatureOL=FALSE/TRUE setting? Currently,
> one can switch back to countOverlaps when feature-overlap unawareness is
> the more appropriate counting mode for a biological question, but then
> double counting of reads mapping to multiple-range features is not
> accounted for. It would be really nice to have such a feature-overlap
> unaware option directly in summarizeOverlaps.
>
> Another question relates to the memory usage of summarizeOverlaps. Has
> this been optimized yet? On a typical BAM file with ~50-100 million
> reads, the memory usage of summarizeOverlaps is often around 10-20GB. To
> use the function on a desktop computer or in large-scale RNA-Seq
> projects on a commodity compute cluster, it would be desirable if every
> counting instance consumed no more than 5GB of RAM.
>
> Thanks in advance for your help and suggestions,
>
> Thomas
>