GenomicAlignment::pintersect() of two overlapping ranges - CIGAR is empty after narrowing
1
0
Entering edit mode
@chao-jen-wong-7035
Last seen 8 months ago
USA/Seattle/Fred Hutchinson Cancer Rese…

Hi guys, 

My task is to find the width of intersect areas of GAlignments reads and GRanges objects. First, I have made sure that there are overlapping for at least 1L by using findOverlaps() and then use pintersect() and width() to find the width of the overlapping area. For my case, I have a spliced read as a GAlignments object and subject as a GRanges. I know that they overlap at one base: chr9:53621858. So I expect to get a return GAlignments object with 1 width:

My query looks like

> query
GAlignments object with 1 alignment and 1 metadata column:
      seqnames strand        cigar    qwidth     start       end     width
         <Rle>  <Rle>  <character> <integer> <integer> <integer> <integer>
  [1]     chr9      - 57M574N1D43M       100  53621227  53621901       675
          njunc |          XS
      <integer> | <character>
  [1]         1 |           -
  -------
  seqinfo: 22 sequences from an unspecified genome

The GRangesList object of this GAlignments read looks like:

> as(query, "GRangesList")
GRangesList object of length 1:
[[1]]
GRanges object with 2 ranges and 0 metadata columns:
      seqnames               ranges strand
         <Rle>            <IRanges>  <Rle>
  [1]     chr9 [53621227, 53621283]      -
  [2]     chr9 [53621858, 53621901]      -

 

My subject looks like

> subject
GRanges object with 1 range and 0 metadata columns:
             seqnames               ranges strand
                <Rle>            <IRanges>  <Rle>
  peak_47658     chr9 [53621329, 53621858]      *

I make sure they overlap with 1L

> findOverlaps(query, subject, ignore.strand=FALSE)
Hits object with 1 hit and 0 metadata columns:
      queryHits subjectHits
      <integer>   <integer>
  [1]         1           1
  -------
  queryLength: 1 / subjectLength: 1

 

But when I apply pinterest(), I got an error message:

> pintersect(query, subject)
Error in .Call2("cigar_narrow", cigar, width(threeranges$left), width(threeranges$right),  :
  in 'cigar[1]': CIGAR is empty after narrowing

If I convert the query read to a GRagnesList and can get more reasonable answer:

> width(pintersect(as(query, "GRangesList"), subject[47658]))
IntegerList of length 1
[[1]] 0 1

 

I apologize for the tedious work here. I try to write a robust function to do the task and, among several bam files,  this and only this particular read gives the trouble. I don't know if it is due to the deletion in the CIGAR and I don't know how the narrowing CIGAR thing works. If I want to bypass this error, is there any tip you guys can provide? 

 

Thanks a lot,

Chao-Jen

 

> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.3 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] org.Mm.eg.db_3.3.0
 [2] TxDb.Mmusculus.UCSC.mm10.knownGene_3.2.2
 [3] GenomicFeatures_1.24.5
 [4] AnnotationDbi_1.34.4
 [5] GenomicAlignments_1.8.4
 [6] SummarizedExperiment_1.2.3
 [7] Biobase_2.32.0
 [8] Rsamtools_1.24.0
 [9] Biostrings_2.40.2
[10] XVector_0.12.1
[11] GenomicRanges_1.24.3
[12] GenomeInfoDb_1.8.7
[13] IRanges_2.6.1
[14] S4Vectors_0.10.3
[15] BiocGenerics_0.18.0
[16] BiocInstaller_1.22.3

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.8        zlibbioc_1.18.0    BiocParallel_1.6.6 tools_3.3.1
 [5] DBI_0.5-1          digest_0.6.10      rtracklayer_1.32.2 bitops_1.0-6
 [9] RCurl_1.95-4.8     biomaRt_2.28.0     memoise_1.0.0      RSQLite_1.1
[13] XML_3.98-1.5
 

genomicalignments • 1.8k views
ADD COMMENT
0
Entering edit mode

The results if using R/devel & Bioconductor 3.4.

 

> sessionInfo()
R Under development (unstable) (2016-12-19 r71815)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.3 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] GenomicAlignments_1.10.0
 [2] Rsamtools_1.26.1
 [3] Biostrings_2.42.1
 [4] XVector_0.14.0
 [5] SummarizedExperiment_1.4.0
 [6] org.Mm.eg.db_3.4.0
 [7] TxDb.Mmusculus.UCSC.mm10.knownGene_3.4.0
 [8] GenomicFeatures_1.26.0
 [9] AnnotationDbi_1.37.0
[10] Biobase_2.35.0
[11] GenomicRanges_1.26.1
[12] GenomeInfoDb_1.10.1
[13] IRanges_2.8.1
[14] S4Vectors_0.12.1
[15] BiocGenerics_0.21.1
[16] BiocInstaller_1.24.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.8        zlibbioc_1.20.0    BiocParallel_1.8.1 lattice_0.20-34
 [5] tools_3.4.0        grid_3.4.0         DBI_0.5-1          digest_0.6.10
 [9] Matrix_1.2-7.1     rtracklayer_1.34.1 bitops_1.0-6       RCurl_1.95-4.8
[13] biomaRt_2.30.0     memoise_1.0.0      RSQLite_1.1-1      compiler_3.4.0
[17] XML_3.98-1.5
>

ADD REPLY
1
Entering edit mode
@michael-lawrence-3846
Last seen 2.3 years ago
United States

The issue is that there is an inconsistency between how pintersect() and findOverlaps() treat deletions. pintersect() considers D to not be covered by the read alignment, so in this case, there is no overlap, and we can't represent empty alignments inside of a GAlignments. findOverlaps() works by first converting the GAlignments to a GRangesList via grglist(), which _keeps_ deletions by default (this is often useful when analyzing splicing). 

What is your use case? I guess an argument could be made for an option to pintersect() so that D regions are considered covered. But if you are OK with dropping deletions, then you can just convert with grglist() before calling findOverlaps(), passing drop.D.ranges=TRUE.

 

ADD COMMENT
0
Entering edit mode

Thank you, Michael, for the helpful details. Converting with grglist() using drop.D.ranges=TRUE before calling findoverlaps() would definitely work for my case. Wouldn't have known this before. 

 

Thanks,

CJ

 

ADD REPLY

Login before adding your answer.

Traffic: 795 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6