Question: GenomicAlignment::pintersect() of two overlapping ranges - CIGAR is empty after narrowing
0
gravatar for Chao-Jen Wong
2.3 years ago by
USA/Seattle/Fred Hutchinson Cancer Research Center
Chao-Jen Wong30 wrote:

Hi guys, 

My task is to find the width of intersect areas of GAlignments reads and GRanges objects. First, I have made sure that there are overlapping for at least 1L by using findOverlaps() and then use pintersect() and width() to find the width of the overlapping area. For my case, I have a spliced read as a GAlignments object and subject as a GRanges. I know that they overlap at one base: chr9:53621858. So I expect to get a return GAlignments object with 1 width:

My query looks like

> query
GAlignments object with 1 alignment and 1 metadata column:
      seqnames strand        cigar    qwidth     start       end     width
         <Rle>  <Rle>  <character> <integer> <integer> <integer> <integer>
  [1]     chr9      - 57M574N1D43M       100  53621227  53621901       675
          njunc |          XS
      <integer> | <character>
  [1]         1 |           -
  -------
  seqinfo: 22 sequences from an unspecified genome

The GRangesList object of this GAlignments read looks like:

> as(query, "GRangesList")
GRangesList object of length 1:
[[1]]
GRanges object with 2 ranges and 0 metadata columns:
      seqnames               ranges strand
         <Rle>            <IRanges>  <Rle>
  [1]     chr9 [53621227, 53621283]      -
  [2]     chr9 [53621858, 53621901]      -

 

My subject looks like

> subject
GRanges object with 1 range and 0 metadata columns:
             seqnames               ranges strand
                <Rle>            <IRanges>  <Rle>
  peak_47658     chr9 [53621329, 53621858]      *

I make sure they overlap with 1L

> findOverlaps(query, subject, ignore.strand=FALSE)
Hits object with 1 hit and 0 metadata columns:
      queryHits subjectHits
      <integer>   <integer>
  [1]         1           1
  -------
  queryLength: 1 / subjectLength: 1

 

But when I apply pinterest(), I got an error message:

> pintersect(query, subject)
Error in .Call2("cigar_narrow", cigar, width(threeranges$left), width(threeranges$right),  :
  in 'cigar[1]': CIGAR is empty after narrowing

If I convert the query read to a GRagnesList and can get more reasonable answer:

> width(pintersect(as(query, "GRangesList"), subject[47658]))
IntegerList of length 1
[[1]] 0 1

 

I apologize for the tedious work here. I try to write a robust function to do the task and, among several bam files,  this and only this particular read gives the trouble. I don't know if it is due to the deletion in the CIGAR and I don't know how the narrowing CIGAR thing works. If I want to bypass this error, is there any tip you guys can provide? 

 

Thanks a lot,

Chao-Jen

 

> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.3 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] org.Mm.eg.db_3.3.0
 [2] TxDb.Mmusculus.UCSC.mm10.knownGene_3.2.2
 [3] GenomicFeatures_1.24.5
 [4] AnnotationDbi_1.34.4
 [5] GenomicAlignments_1.8.4
 [6] SummarizedExperiment_1.2.3
 [7] Biobase_2.32.0
 [8] Rsamtools_1.24.0
 [9] Biostrings_2.40.2
[10] XVector_0.12.1
[11] GenomicRanges_1.24.3
[12] GenomeInfoDb_1.8.7
[13] IRanges_2.6.1
[14] S4Vectors_0.10.3
[15] BiocGenerics_0.18.0
[16] BiocInstaller_1.22.3

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.8        zlibbioc_1.18.0    BiocParallel_1.6.6 tools_3.3.1
 [5] DBI_0.5-1          digest_0.6.10      rtracklayer_1.32.2 bitops_1.0-6
 [9] RCurl_1.95-4.8     biomaRt_2.28.0     memoise_1.0.0      RSQLite_1.1
[13] XML_3.98-1.5
 

genomicalignments • 402 views
ADD COMMENTlink modified 2.3 years ago by Michael Lawrence10k • written 2.3 years ago by Chao-Jen Wong30

The results if using R/devel & Bioconductor 3.4.

 

> sessionInfo()
R Under development (unstable) (2016-12-19 r71815)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.3 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] GenomicAlignments_1.10.0
 [2] Rsamtools_1.26.1
 [3] Biostrings_2.42.1
 [4] XVector_0.14.0
 [5] SummarizedExperiment_1.4.0
 [6] org.Mm.eg.db_3.4.0
 [7] TxDb.Mmusculus.UCSC.mm10.knownGene_3.4.0
 [8] GenomicFeatures_1.26.0
 [9] AnnotationDbi_1.37.0
[10] Biobase_2.35.0
[11] GenomicRanges_1.26.1
[12] GenomeInfoDb_1.10.1
[13] IRanges_2.8.1
[14] S4Vectors_0.12.1
[15] BiocGenerics_0.21.1
[16] BiocInstaller_1.24.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.8        zlibbioc_1.20.0    BiocParallel_1.8.1 lattice_0.20-34
 [5] tools_3.4.0        grid_3.4.0         DBI_0.5-1          digest_0.6.10
 [9] Matrix_1.2-7.1     rtracklayer_1.34.1 bitops_1.0-6       RCurl_1.95-4.8
[13] biomaRt_2.30.0     memoise_1.0.0      RSQLite_1.1-1      compiler_3.4.0
[17] XML_3.98-1.5
>

ADD REPLYlink written 2.3 years ago by Chao-Jen Wong30
Answer: GenomicAlignment::pintersect() of two overlapping ranges - CIGAR is empty after
1
gravatar for Michael Lawrence
2.3 years ago by
United States
Michael Lawrence10k wrote:

The issue is that there is an inconsistency between how pintersect() and findOverlaps() treat deletions. pintersect() considers D to not be covered by the read alignment, so in this case, there is no overlap, and we can't represent empty alignments inside of a GAlignments. findOverlaps() works by first converting the GAlignments to a GRangesList via grglist(), which _keeps_ deletions by default (this is often useful when analyzing splicing). 

What is your use case? I guess an argument could be made for an option to pintersect() so that D regions are considered covered. But if you are OK with dropping deletions, then you can just convert with grglist() before calling findOverlaps(), passing drop.D.ranges=TRUE.

 

ADD COMMENTlink written 2.3 years ago by Michael Lawrence10k

Thank you, Michael, for the helpful details. Converting with grglist() using drop.D.ranges=TRUE before calling findoverlaps() would definitely work for my case. Wouldn't have known this before. 

 

Thanks,

CJ

 

ADD REPLYlink written 2.3 years ago by Chao-Jen Wong30
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 146 users visited in the last hour