Hi guys,
My task is to find the width of intersect areas of GAlignments
reads and GRanges
objects. First, I have made sure that there are overlapping for at least 1L by using findOverlaps()
and then use pintersect() and width()
to find the width of the overlapping area. For my case, I have a spliced read as a GAlignments
object and subject as a GRanges
. I know that they overlap at one base: chr9:53621858. So I expect to get a return GAlignments
object with 1 width:
My query looks like
> query GAlignments object with 1 alignment and 1 metadata column: seqnames strand cigar qwidth start end width <Rle> <Rle> <character> <integer> <integer> <integer> <integer> [1] chr9 - 57M574N1D43M 100 53621227 53621901 675 njunc | XS <integer> | <character> [1] 1 | - ------- seqinfo: 22 sequences from an unspecified genome
The GRangesList object of this GAlignments read looks like:
> as(query, "GRangesList")
GRangesList object of length 1:
[[1]]
GRanges object with 2 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr9 [53621227, 53621283] -
[2] chr9 [53621858, 53621901] -
My subject looks like
> subject GRanges object with 1 range and 0 metadata columns: seqnames ranges strand <Rle> <IRanges> <Rle> peak_47658 chr9 [53621329, 53621858] *
I make sure they overlap with 1L
> findOverlaps(query, subject, ignore.strand=FALSE) Hits object with 1 hit and 0 metadata columns: queryHits subjectHits <integer> <integer> [1] 1 1 ------- queryLength: 1 / subjectLength: 1
But when I apply pinterest()
, I got an error message:
> pintersect(query, subject) Error in .Call2("cigar_narrow", cigar, width(threeranges$left), width(threeranges$right), : in 'cigar[1]': CIGAR is empty after narrowing
If I convert the query read to a GRagnesList
and can get more reasonable answer:
> width(pintersect(as(query, "GRangesList"), subject[47658])) IntegerList of length 1 [[1]] 0 1
I apologize for the tedious work here. I try to write a robust function to do the task and, among several bam files, this and only this particular read gives the trouble. I don't know if it is due to the deletion in the CIGAR and I don't know how the narrowing CIGAR thing works. If I want to bypass this error, is there any tip you guys can provide?
Thanks a lot,
Chao-Jen
> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.3 LTS
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] org.Mm.eg.db_3.3.0
[2] TxDb.Mmusculus.UCSC.mm10.knownGene_3.2.2
[3] GenomicFeatures_1.24.5
[4] AnnotationDbi_1.34.4
[5] GenomicAlignments_1.8.4
[6] SummarizedExperiment_1.2.3
[7] Biobase_2.32.0
[8] Rsamtools_1.24.0
[9] Biostrings_2.40.2
[10] XVector_0.12.1
[11] GenomicRanges_1.24.3
[12] GenomeInfoDb_1.8.7
[13] IRanges_2.6.1
[14] S4Vectors_0.10.3
[15] BiocGenerics_0.18.0
[16] BiocInstaller_1.22.3
loaded via a namespace (and not attached):
[1] Rcpp_0.12.8 zlibbioc_1.18.0 BiocParallel_1.6.6 tools_3.3.1
[5] DBI_0.5-1 digest_0.6.10 rtracklayer_1.32.2 bitops_1.0-6
[9] RCurl_1.95-4.8 biomaRt_2.28.0 memoise_1.0.0 RSQLite_1.1
[13] XML_3.98-1.5
The results if using R/devel & Bioconductor 3.4.
> sessionInfo()
R Under development (unstable) (2016-12-19 r71815)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.3 LTS
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] GenomicAlignments_1.10.0
[2] Rsamtools_1.26.1
[3] Biostrings_2.42.1
[4] XVector_0.14.0
[5] SummarizedExperiment_1.4.0
[6] org.Mm.eg.db_3.4.0
[7] TxDb.Mmusculus.UCSC.mm10.knownGene_3.4.0
[8] GenomicFeatures_1.26.0
[9] AnnotationDbi_1.37.0
[10] Biobase_2.35.0
[11] GenomicRanges_1.26.1
[12] GenomeInfoDb_1.10.1
[13] IRanges_2.8.1
[14] S4Vectors_0.12.1
[15] BiocGenerics_0.21.1
[16] BiocInstaller_1.24.0
loaded via a namespace (and not attached):
[1] Rcpp_0.12.8 zlibbioc_1.20.0 BiocParallel_1.8.1 lattice_0.20-34
[5] tools_3.4.0 grid_3.4.0 DBI_0.5-1 digest_0.6.10
[9] Matrix_1.2-7.1 rtracklayer_1.34.1 bitops_1.0-6 RCurl_1.95-4.8
[13] biomaRt_2.30.0 memoise_1.0.0 RSQLite_1.1-1 compiler_3.4.0
[17] XML_3.98-1.5
>