Hello,
Is there a more straightforward way of generating a GRanges o GRangesList object with the cigar ranges of genomic alignments in reference space?. I am using cigarRangesAlongReferenceSpace
, but it returns an IRangesList. Would it make sense that cigarRangesAlongReferenceSpace
return a GRangesList instead (because it is in reference space)? For example, junctions
return a GRangesList object.
So far, what I am doing is convert the IRangesList to a GRangesList, but it is not straightforward. Is there another way?
Thanks
library(GenomicAlignments)
# Dummy GAlignments
gal <- GAlignments(
seqnames=c('chr1','chr2'),
pos=c(1000,2000),
cigar = c('30M2000N12M','15M40N10M')
)
gal
GAlignments object with 2 alignments and 0 metadata columns:
seqnames strand cigar qwidth start end width njunc
<Rle> <Rle> <character> <integer> <integer> <integer> <integer> <integer>
[1] chr1 * 30M2000N12M 42 1000 3041 2042 1
[2] chr2 * 15M40N10M 25 2000 2064 65 1
-------
seqinfo: 2 sequences from an unspecified genome; no seqlengths
# Extract the matches
cirs <- cigarRangesAlongReferenceSpace(cigar(gal),pos = start(gal),ops = 'M',drop.empty.ranges = T)
cirs
IRangesList object of length 2:
[[1]]
IRanges object with 2 ranges and 0 metadata columns:
start end width
<integer> <integer> <integer>
[1] 1000 1029 30
[2] 3030 3041 12
[[2]]
IRanges object with 2 ranges and 0 metadata columns:
start end width
<integer> <integer> <integer>
[1] 2000 2014 15
[2] 2055 2064 10
# Convert to GRanges
ske <- cirs@partitioning # Or PartitioningByEnd(cirs)
lens <- lengths(cirs)
cgrs <- GRanges(
seqnames=rep(seqnames(gal),lens),
ranges=unlist(cirs),
strand=rep(strand(gal),lens),
mcols(gal)[rep(1:length(gal),lens),,drop=F])
cgrs <- relist(cgrs,ske)
cgrs
GRangesList object of length 2:
[[1]]
GRanges object with 2 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr1 1000-1029 *
[2] chr1 3030-3041 *
-------
seqinfo: 2 sequences from an unspecified genome; no seqlengths
[[2]]
GRanges object with 2 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr2 2000-2014 *
[2] chr2 2055-2064 *
-------
seqinfo: 2 sequences from an unspecified genome; no seqlengths
sessionInfo( )
R version 4.4.3 (2025-02-28)
Platform: x86_64-pc-linux-gnu
Running under: Debian GNU/Linux 12 (bookworm)
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.21.so; LAPACK version 3.11.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: Europe/Madrid
tzcode source: system (glibc)
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] GenomicAlignments_1.40.0 Rsamtools_2.20.0 Biostrings_2.72.0 XVector_0.44.0
[5] SummarizedExperiment_1.34.0 Biobase_2.64.0 MatrixGenerics_1.16.0 matrixStats_1.3.0
[9] GenomicRanges_1.56.0 GenomeInfoDb_1.40.1 IRanges_2.38.0 S4Vectors_0.42.0
[13] BiocGenerics_0.50.0
loaded via a namespace (and not attached):
[1] crayon_1.5.2 httr_1.4.7 UCSC.utils_1.0.0 jsonlite_1.8.8 DelayedArray_0.30.1
[6] grid_4.4.3 abind_1.4-5 bitops_1.0-7 compiler_4.4.3 codetools_0.2-20
[11] BiocParallel_1.38.0 rstudioapi_0.16.0 lattice_0.22-6 R6_2.5.1 SparseArray_1.4.8
[16] parallel_4.4.3 GenomeInfoDbData_1.2.12 Matrix_1.7-0 tools_4.4.3 zlibbioc_1.50.0
[21] S4Arrays_1.4.1
Wow, so easy! Thanks for your quick reply.