Entering edit mode
Dear all,
I have a problem finding nerly exact overlaps between read pairs and
genomic regions. I try this by using GAlignmentPairs and
GenomicRanges:
> library(Rsamtools) ex1_file <- system.file("extdata", "ex1.bam",
> package="Rsamtools") galp <- readGAlignmentPairs(ex1_file,
use.names=TRUE)
> galp[3:4]
GAlignmentPairs with 2 alignment pairs and 0 metadata columns:
seqnames strand : ranges -- ranges
<rle> <rle> : <iranges> -- <iranges>
EAS54_65:3:321:311:983 seq1 + : [51, 85] -- [228, 262]
B7_591:5:42:540:501 seq1 + : [60, 95] -- [224, 259]
---
seqlengths:
seq1 seq2 1575 1584
> ranges <- GRanges(c("seq1", "seq1"), IRanges(c(50, 57), end = c(263,
260)))
> ranges
GRanges with 2 ranges and 0 metadata columns:
seqnames ranges strand
<rle> <iranges> <rle>
[1] seq1 [50, 263] *
[2] seq1 [57, 260] *
---
seqlengths:
seq1
NA
> summary <-summarizeOverlaps(ranges, galp[3:4], mode =
"IntersectionStrict",
+ inter.feature = FALSE, fragments =
FALSE)
Warning message:
In if (all(strand(reads) == "*")) ignore.strand <- TRUE :
the condition has length > 1 and only the first element will be used
> assays(summary)$counts
reads
[1,] 2
[2,] 1
I am not entirely sure what the warning means, but my point is a
different one: I use the "IntersectStrict" mode because I only want to
count read
pairs that are completely within the interval. This is what I
get, but I want to use a more strict selection for "almost equal",
i.e. intervals that are the same size or only slightly larger (say 3
bp)
than the range of the read pairs. So ranges[1] should count galp[3],
but
not galp[4]
Any hint will be appreciated....
Best wishes,
Georg
> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
[5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets
methods
[8] base
other attached packages:
[1] Rsamtools_1.14.3 Biostrings_2.30.1 GenomicRanges_1.14.4
[4] XVector_0.2.0 IRanges_1.20.7 BiocGenerics_0.8.0
loaded via a namespace (and not attached):
[1] bitops_1.0-6 compiler_3.0.1 stats4_3.0.1 tools_3.0.1
zlibbioc_1.8.0