Entering edit mode
Elena Grassi
▴
10
@elena-grassi-6138
Last seen 7.2 years ago
Italy, Rozzano, Humanitas University
Hello,
I've stumbled across a strange behaviour when indexing bams and then
reading them with ReadGappedAlignments with a param
holding a which section.
Given this bam:
data at tungsteno:/rogue_bis/data/bioinfotree/prj/roar/dataset/0.2/roa
r_test_2$
samtools view test.bam
D1JP1ACXX130107:2:2201:7003:63880 16 chr1 243675625
255 92M * 0 0 * *
> library(GenomicRanges)
> bam <-
"/scarlet/data/bioinfotree/prj/roar/dataset/0.2/roar_test_2/test.bam"
> ga <- readGappedAlignments(bam)
> ga
GappedAlignments with 1 alignment and 0 metadata columns:
seqnames strand cigar qwidth start end
width ngap
<rle> <rle> <character> <integer> <integer> <integer>
<integer> <integer>
[1] chr1 - 92M 92 243675625 243675716
92 0
---
seqlengths:
chr1 chr2 ...
chrUn_gl000226 chr18_gl000207_random
249250621 243199373 ...
15008 4262
Everything is fine here. But if index it and load it with a param:
> gene_id <- c("A", "B")
> features <- GRanges(
+ seqnames = Rle(c("chr1", "chr1")),
+ strand = strand(c("-", "-")),
+ ranges = IRanges(
+ start=c(243675625, 243675627),
+ width=c(2, 100)),
+ DataFrame(gene_id)
+ )
> indexBam(bam)
/tmp/RtmpVbmBLu/filead32d37188.bam
"/tmp/RtmpVbmBLu/filead32d37188.bam.bai"
> ga_s_p <- readGappedAlignments(bam,
param=ScanBamParam(which=features))
> ga_s_p
GappedAlignments with 2 alignments and 0 metadata columns:
seqnames strand cigar qwidth start end
width ngap
<rle> <rle> <character> <integer> <integer> <integer>
<integer> <integer>
[1] chr1 - 92M 92 243675625 243675716
92 0
[2] chr1 - 92M 92 243675625 243675716
92 0
---
seqlengths:
chr1 chr2 ...
chrUn_gl000226 chr18_gl000207_random
249250621 243199373 ...
15008 4262
>
The single read is duplicated...I'm not an expert with these libraries
so maybe this is my fault (if I remove
the second entry in features everything works, but from the
ScanBamParam man about which I was not expecting
this behaviour).
Sorry for the convoluted example but I've stumbled on this problem
while comparing counts obtained
with summarizeOverlaps in a script of mine.
Here's my session info:
> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets
methods
[8] base
other attached packages:
[1] rtracklayer_1.20.2 Rsamtools_1.12.3 Biostrings_2.28.0
[4] GenomicRanges_1.12.4 IRanges_1.18.1 BiocGenerics_0.6.0
loaded via a namespace (and not attached):
[1] bitops_1.0-5 BSgenome_1.28.0 RCurl_1.95-4.1 stats4_3.0.1
[5] tools_3.0.1 XML_3.98-1.1 zlibbioc_1.6.0
Thanks,
E.G.
--
$ pom