GAlignments Cigar to GRanges
1
1
Entering edit mode
@9ff02a84
Last seen 3 months ago
Spain

Hello,

Is there a more straightforward way of generating a GRanges o GRangesList object with the cigar ranges of genomic alignments in reference space?. I am using cigarRangesAlongReferenceSpace, but it returns an IRangesList. Would it make sense that cigarRangesAlongReferenceSpace return a GRangesList instead (because it is in reference space)? For example, junctions return a GRangesList object.

So far, what I am doing is convert the IRangesList to a GRangesList, but it is not straightforward. Is there another way?

Thanks


  library(GenomicAlignments)

# Dummy GAlignments
  gal <- GAlignments(
    seqnames=c('chr1','chr2'),
    pos=c(1000,2000),
    cigar = c('30M2000N12M','15M40N10M')
  )
  gal

GAlignments object with 2 alignments and 0 metadata columns:
      seqnames strand       cigar    qwidth     start       end     width     njunc
         <Rle>  <Rle> <character> <integer> <integer> <integer> <integer> <integer>
  [1]     chr1      * 30M2000N12M        42      1000      3041      2042         1
  [2]     chr2      *   15M40N10M        25      2000      2064        65         1
  -------
  seqinfo: 2 sequences from an unspecified genome; no seqlengths


# Extract the matches
  cirs <- cigarRangesAlongReferenceSpace(cigar(gal),pos = start(gal),ops = 'M',drop.empty.ranges = T)
  cirs

IRangesList object of length 2:
[[1]]
IRanges object with 2 ranges and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]      1000      1029        30
  [2]      3030      3041        12

[[2]]
IRanges object with 2 ranges and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]      2000      2014        15
  [2]      2055      2064        10

# Convert to GRanges
  ske <- cirs@partitioning # Or PartitioningByEnd(cirs)
  lens <- lengths(cirs)
  cgrs <- GRanges(
    seqnames=rep(seqnames(gal),lens),
    ranges=unlist(cirs),
    strand=rep(strand(gal),lens),
    mcols(gal)[rep(1:length(gal),lens),,drop=F])
  cgrs <- relist(cgrs,ske)
  cgrs

GRangesList object of length 2:
[[1]]
GRanges object with 2 ranges and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]     chr1 1000-1029      *
  [2]     chr1 3030-3041      *
  -------
  seqinfo: 2 sequences from an unspecified genome; no seqlengths

[[2]]
GRanges object with 2 ranges and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]     chr2 2000-2014      *
  [2]     chr2 2055-2064      *
  -------
  seqinfo: 2 sequences from an unspecified genome; no seqlengths

sessionInfo( )

R version 4.4.3 (2025-02-28)
Platform: x86_64-pc-linux-gnu
Running under: Debian GNU/Linux 12 (bookworm)

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.21.so;  LAPACK version 3.11.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Europe/Madrid
tzcode source: system (glibc)

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] GenomicAlignments_1.40.0    Rsamtools_2.20.0            Biostrings_2.72.0           XVector_0.44.0             
 [5] SummarizedExperiment_1.34.0 Biobase_2.64.0              MatrixGenerics_1.16.0       matrixStats_1.3.0          
 [9] GenomicRanges_1.56.0        GenomeInfoDb_1.40.1         IRanges_2.38.0              S4Vectors_0.42.0           
[13] BiocGenerics_0.50.0        

loaded via a namespace (and not attached):
 [1] crayon_1.5.2            httr_1.4.7              UCSC.utils_1.0.0        jsonlite_1.8.8          DelayedArray_0.30.1    
 [6] grid_4.4.3              abind_1.4-5             bitops_1.0-7            compiler_4.4.3          codetools_0.2-20       
[11] BiocParallel_1.38.0     rstudioapi_0.16.0       lattice_0.22-6          R6_2.5.1                SparseArray_1.4.8      
[16] parallel_4.4.3          GenomeInfoDbData_1.2.12 Matrix_1.7-0            tools_4.4.3             zlibbioc_1.50.0        
[21] S4Arrays_1.4.1
GenomicRanges cigarRangesAlongReferenceSpace GenomicAlignments Cigar • 1.2k views
ADD COMMENT
3
Entering edit mode
@herve-pages-1542
Last seen 11 hours ago
Seattle, WA, United States
> as(gal, "GRangesList")
GRangesList object of length 2:
[[1]]
GRanges object with 2 ranges and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]     chr1 1000-1029      *
  [2]     chr1 3030-3041      *
  -------
  seqinfo: 2 sequences from an unspecified genome; no seqlengths

[[2]]
GRanges object with 2 ranges and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]     chr2 2000-2014      *
  [2]     chr2 2055-2064      *
  -------
  seqinfo: 2 sequences from an unspecified genome; no seqlengths

This coercion uses cigarRangesAlongReferenceSpace() behind the scene.

You rarely need to use cigarRangesAlongReferenceSpace() or any of the other low-level utilities documented in ?'cigar-utils' _directly_. These are the low-level workhorses behind higher level functionalities like as(<GAlignments>, "GRangesList"), as(<GAlignments>, "GRanges"), grglist(<GAlignments>), rglist(<GAlignments>), granges(<GAlignments>), ranges(<GAlignments>), etc...

ADD COMMENT
0
Entering edit mode

Wow, so easy! Thanks for your quick reply.

ADD REPLY

Login before adding your answer.

Traffic: 724 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6