I need help with a method to rank each intron within its transcript, based on the GRanges start/end coordinates. Specifically, I'm looking for a way to generate a unique identifier for the 1st intron, 2nd intron, ..., last intron of each transcript and store it in the GRanges mcols(). Thanks!
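# Build a TxDb from the GFF annotation (s, the chrominfo object, is not shown in the post)
tx <- txdbmaker::makeTxDbFromGFF("genomic.gff", format = "auto", chrominfo = s, dataSource = "GCA_007735645.1", organism = "Venturia.effusa", taxonomyId = 50376, dbxrefTag = "GENEID")
# Introns grouped by transcript, with transcript names on the list elements
int <- GenomicFeatures::intronsByTranscript(tx, use.names = TRUE)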
tail(int)
GRangesList object of length 6:
$`rna-gnl|PRJNA551043|FKW77_001900-T1_mrna`
GRanges object with 1 range and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] CP042204.1 460547-460594 -
-------
seqinfo: 21 sequences from ASM773564v1 genome
$`rna-gnl|PRJNA551043|FKW77_001938-T1_mrna`
GRanges object with 6 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] CP042204.1 466517-467022 -
[2] CP042204.1 467072-467209 -
[3] CP042204.1 467341-468214 -
[4] CP042204.1 468703-468750 -
[5] CP042204.1 469072-469125 -
[6] CP042204.1 469246-469709 -
-------
seqinfo: 21 sequences from ASM773564v1 genome
$`rna-gnl|PRJNA551043|FKW77_001959-T1_mrna`
GRanges object with 2 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] CP042204.1 471047-471094 -
[2] CP042204.1 471781-471826 -
-------
seqinfo: 21 sequences from ASM773564v1 genome
...
<3 more elements>
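A minimal base-R sketch of the position-based numbering asked for above. It uses the intron counts of the three transcripts visible in the tail(int) output (names shortened); with the real object the counts would come from lengths(int). Because the transcripts carry different numbers of introns, sapply(..., seq_len) returns a list, which has to be flattened before it can be stored in mcols(); unlist(), do.call(c, ...) and base::sequence() all give the same vector. Note that this numbers introns by ascending coordinate, which matches the biological order only on the '+' strand.

```r
# Intron counts of the three transcripts in tail(int) (names shortened);
# with the real GRangesList this would be lengths(int)
n_introns <- c(FKW77_001900 = 1L, FKW77_001938 = 6L, FKW77_001959 = 2L)

# Ragged results cannot be simplified, so sapply() returns a list of 1..n vectors
rank_list <- sapply(n_introns, seq_len)
stopifnot(is.list(rank_list))

# Two ways to flatten that list into one integer vector
rank_unlist <- unlist(rank_list, use.names = FALSE)
rank_docall <- do.call(c, unname(rank_list))

# base::sequence() builds the same 1..n runs in a single vectorised call
rank_seq <- sequence(n_introns)

# One identifier per intron, numbered by ascending genomic position
tx_id <- rep(names(n_introns), n_introns)
intron_id <- paste0(tx_id, "_intron", rank_seq)
print(intron_id)
```

With the real data, the vector would be attached to the unlisted object (u <- unlist(int); mcols(u)$intron_id <- ..., built from names(u) instead of the shortened names), and relist(u, int) restores the per-transcript grouping. This assumes the introns inside each transcript are ordered by start, as they appear in the output above.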
sessionInfo()
R version 4.4.1 (2024-06-14 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 22631)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.utf8
[2] LC_CTYPE=English_United States.utf8
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8
time zone: America/New_York
tzcode source: internal
attached base packages:
[1] grid stats4 stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] BSgenome.Vinaequalis.NCBI.ASM976944v1_1.0.0
[2] BSgenome.Vnashicola.NCBI.ASM452265v2_1.0.0
[3] BSgenome.Veffusa.NCBI.ASM773564v1_1.0.0
[4] shape_1.4.6.1
[5] RMariaDB_1.3.2
[6] modeest_2.4.0
[7] seqLogo_1.70.0
[8] GenomicDistributions_1.12.0
[9] MotifDb_1.46.0
[10] rlist_0.4.6.2
[11] msaR_0.6.0
[12] deepredeff_0.1.1
[13] seqinr_4.2-36
[14] ape_5.8
[15] msa_1.36.1
[16] OrganismDbi_1.46.0
[17] BSgenomeForge_1.4.0
[18] BSgenome_1.72.0
[19] BiocIO_1.14.0
[20] karyoploteR_1.30.0
[21] regioneR_1.36.0
[22] biovizBase_1.52.0
[23] Gviz_1.48.0
[24] AnnotationForge_1.46.0
[25] AnnotationHub_3.12.0
[26] BiocFileCache_2.12.0
[27] dbplyr_2.5.0
[28] UniProt.ws_2.44.0
[29] RSQLite_2.3.7
[30] gridExtra_2.3
[31] ggbio_1.52.0
[32] Rsubread_2.18.0
[33] ShortRead_1.62.0
[34] BiocParallel_1.38.0
[35] Rhisat2_1.20.0
[36] trackViewer_1.40.0
[37] rtracklayer_1.64.0
[38] R.utils_2.12.3
[39] R.oo_1.26.0
[40] R.methodsS3_1.8.2
[41] RColorBrewer_1.1-3
[42] scatterplot3d_0.3-44
[43] ggplot2_3.5.1
[44] gplots_3.1.3.1
[45] Cairo_1.6-2
[46] motifStack_1.48.0
[47] openxlsx_4.2.6.1
[48] GenomicFeatures_1.56.0
[49] AnnotationDbi_1.66.0
[50] GenomicAlignments_1.40.0
[51] Rsamtools_2.20.0
[52] Biostrings_2.72.1
[53] XVector_0.44.0
[54] SummarizedExperiment_1.34.0
[55] Biobase_2.64.0
[56] MatrixGenerics_1.16.0
[57] matrixStats_1.3.0
[58] GenomicRanges_1.56.1
[59] GenomeInfoDb_1.40.1
[60] IRanges_2.38.1
[61] S4Vectors_0.42.1
[62] BiocGenerics_0.50.0
[63] devtools_2.4.5
[64] usethis_3.0.0
loaded via a namespace (and not attached):
[1] fs_1.6.4 ProtGenerics_1.36.0
[3] bitops_1.0-8 DirichletMultinomial_1.46.0
[5] TFBSTools_1.42.0 httr_1.4.7
[7] InteractionSet_1.32.0 profvis_0.3.8
[9] tools_4.4.1 backports_1.5.0
[11] utf8_1.2.4 R6_2.5.1
[13] lazyeval_0.2.2 rhdf5filters_1.16.0
[15] urlchecker_1.0.1 withr_3.0.1
[17] prettyunits_1.2.0 GGally_2.2.1
[19] cli_3.6.3 SGSeq_1.38.0
[21] grImport_0.9-7 readr_2.1.5
[23] txdbmaker_1.0.1 foreign_0.8-87
[25] dichromat_2.0-0.1 sessioninfo_1.2.2
[27] plotrix_3.8-4 rstudioapi_0.16.0
[29] generics_0.1.3 hwriter_1.3.2.1
[31] gtools_3.9.5 dplyr_1.1.4
[33] zip_2.3.1 GO.db_3.19.1
[35] Matrix_1.7-0 interp_1.1-6
[37] fansi_1.0.6 abind_1.4-5
[39] lifecycle_1.0.4 yaml_2.3.10
[41] rhdf5_2.48.0 SparseArray_1.4.8
[43] blob_1.2.4 promises_1.3.0
[45] crayon_1.5.3 pwalign_1.0.0
[47] miniUI_0.1.1.1 lattice_0.22-6
[49] annotate_1.82.0 KEGGREST_1.44.1
[51] pillar_1.9.0 knitr_1.48
[53] statip_0.2.3 rjson_0.2.21
[55] codetools_0.2-20 strawr_0.0.92
[57] glue_1.7.0 data.table_1.15.4
[59] remotes_2.5.0 vctrs_0.6.5
[61] png_0.1-8 gtable_0.3.5
[63] poweRlaw_0.80.0 cachem_1.1.0
[65] xfun_0.46 S4Arrays_1.4.1
[67] mime_0.12 pracma_2.4.4
[69] timeDate_4032.109 ellipsis_0.3.2
[71] nlme_3.1-165 bit64_4.0.5
[73] progress_1.2.3 filelock_1.0.3
[75] fBasics_4032.96 KernSmooth_2.23-24
[77] rpart_4.1.23 splitstackshape_1.4.8
[79] colorspace_2.1-1 DBI_1.2.3
[81] Hmisc_5.1-3 nnet_7.3-19
[83] ade4_1.7-22 tidyselect_1.2.1
[85] timeSeries_4032.109 bit_4.0.5
[87] compiler_4.4.1 curl_5.2.1
[89] httr2_1.0.2 graph_1.82.0
[91] rjsoncons_1.3.1 htmlTable_2.4.3
[93] bezier_1.1.2 xml2_1.3.6
[95] DelayedArray_0.30.1 checkmate_2.3.2
[97] scales_1.3.0 caTools_1.18.2
[99] spatial_7.3-17 RBGL_1.80.0
[101] rappdirs_0.3.3 stringr_1.5.1
[103] digest_0.6.36 rmarkdown_2.28
[105] htmltools_0.5.8.1 pkgconfig_2.0.3
[107] jpeg_0.1-10 base64enc_0.1-3
[109] stabledist_0.7-2 fastmap_1.2.0
[111] ensembldb_2.28.0 rlang_1.1.4
[113] htmlwidgets_1.6.4 UCSC.utils_1.0.0
[115] shiny_1.9.1 jsonlite_1.8.8
[117] VariantAnnotation_1.50.0 RCurl_1.98-1.16
[119] magrittr_2.0.3 Formula_1.2-5
[121] GenomeInfoDbData_1.2.12 Rhdf5lib_1.26.0
[123] munsell_0.5.1 Rcpp_1.0.13
[125] reticulate_1.38.0 bamsignals_1.36.0
[127] stringi_1.8.4 stable_1.1.6
[129] zlibbioc_1.50.0 MASS_7.3-61
[131] plyr_1.8.9 pkgbuild_1.4.4
[133] ggstats_0.6.0 parallel_4.4.1
[135] deldir_2.0-4 CNEr_1.40.0
[137] hms_1.1.3 igraph_2.0.3
[139] RUnit_0.4.33 rmutil_1.1.10
[141] reshape2_1.4.4 biomaRt_2.60.1
[143] pkgload_1.4.0 TFMPvalue_0.0.9
[145] BiocVersion_3.19.1 XML_3.99-0.17
[147] evaluate_0.24.0 latticeExtra_0.6-30
[149] BiocManager_1.30.23 tzdb_0.4.0
[151] httpuv_1.6.15 tidyr_1.3.1
[153] purrr_1.0.2 clue_0.3-65
[155] BiocBaseUtils_1.6.0 xtable_1.8-4
[157] restfulr_0.0.15 AnnotationFilter_1.28.0
[159] later_1.3.2 tibble_3.2.1
[161] memoise_2.0.1 cluster_2.1.6

There's an error in the last line. Note that sapply(lengths(z), seq_len) will return a list because the lengths vary. We need a vector, so we have to convert that list to a vector. There are (at least) two ways to do that, using either unlist or (my preference) do.call, which is (probably) more likely to do what you expect.

Hey, in revisiting this question, the following code works for the '+' strand, but I am having trouble assigning intron ranks by transcript for '-' strand introns. For negative-strand transcripts the ranks come out in reverse order (e.g. the output is 1-2-3-4-5 when it should be 5-4-3-2-1), because on the '-' strand transcription runs from the highest coordinate to the lowest, so the GRanges 'start'/'end' order has to be inverted for the intron ranking convention. Is there a subsetting operation that can address this?
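One possible answer, sketched in base R and not necessarily what was proposed in the thread: instead of numbering introns by list position, rank each intron's start within its transcript, ascending on '+' and descending on '-', so that intron 1 is always the first intron in the direction of transcription. The helper rank_introns() and the "txPlus" transcript are hypothetical; the '-' strand starts are those of FKW77_001938-T1 from the tail(int) output, which should come out as 6-5-4-3-2-1 (left to right) rather than 1-2-3-4-5. Ranking by coordinate also means the result does not depend on how the introns happen to be ordered inside the GRangesList.

```r
# Hypothetical helper: number introns 5' -> 3' in the direction of transcription.
# tx: transcript ID per intron; start: intron start; strand: "+" or "-"
# (anything other than "-" is treated as "+"). Assumes one strand per transcript.
rank_introns <- function(tx, start, strand) {
  # Negate minus-strand coordinates so an ascending rank follows transcription
  key <- ifelse(strand == "-", -start, start)
  # ave() ranks within each transcript and returns the ranks in input order
  as.integer(ave(key, tx, FUN = function(k) rank(k, ties.method = "first")))
}

# Example: the six '-' strand introns of FKW77_001938-T1 from tail(int),
# plus a hypothetical '+' strand transcript "txPlus"
tx_ids  <- c(rep("rna-gnl|PRJNA551043|FKW77_001938-T1_mrna", 6), rep("txPlus", 3))
starts  <- c(466517L, 467072L, 467341L, 468703L, 469072L, 469246L, 100L, 500L, 900L)
strands <- c(rep("-", 6), rep("+", 3))

r <- rank_introns(tx_ids, starts, strands)
intron_id <- paste0(tx_ids, "_intron", r)
print(data.frame(tx = tx_ids, start = starts, strand = strands, rank = r))

# With the real data (needs GenomicRanges; not run here):
# u <- unlist(int)
# mcols(u)$intron_rank <- rank_introns(names(u), start(u), as.character(strand(u)))
# int_ranked <- relist(u, int)
```

The commented lines show how the helper could be applied to the unlisted GRangesList and regrouped with relist(); they rely on the transcript names kept by use.names = TRUE, as in the original call.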