ISO a solution to rank-order introns by transcript
1
0
Entering edit mode
mat149 ▴ 80
@mat149-11450
Last seen 5 days ago
United States

I need help with a method/operation to rank each intron by transcript -- based on the granges start/end. Specifically, I'm looking for a way to generate a unique identifier corresponding to: 1st intron, 2nd intron, ... last intron for each transcript, and place them in a granges mcols() Thanks

tx<-txdbmaker::makeTxDbFromGFF("genomic.gff",format="auto",chrominfo=s,dataSource="GCA_007735645.1",organism="Venturia.effusa",taxonomyId=50376,dbxrefTag="GENEID")

int<-GenomicFeatures::intronsByTranscript(tx,use.names=TRUE)
tail(int)

GRangesList object of length 6:
$`rna-gnl|PRJNA551043|FKW77_001900-T1_mrna`
GRanges object with 1 range and 0 metadata columns:
        seqnames        ranges strand
           <Rle>     <IRanges>  <Rle>
  [1] CP042204.1 460547-460594      -
  -------
  seqinfo: 21 sequences from ASM773564v1 genome

$`rna-gnl|PRJNA551043|FKW77_001938-T1_mrna`
GRanges object with 6 ranges and 0 metadata columns:
        seqnames        ranges strand
           <Rle>     <IRanges>  <Rle>
  [1] CP042204.1 466517-467022      -
  [2] CP042204.1 467072-467209      -
  [3] CP042204.1 467341-468214      -
  [4] CP042204.1 468703-468750      -
  [5] CP042204.1 469072-469125      -
  [6] CP042204.1 469246-469709      -
  -------
  seqinfo: 21 sequences from ASM773564v1 genome

$`rna-gnl|PRJNA551043|FKW77_001959-T1_mrna`
GRanges object with 2 ranges and 0 metadata columns:
        seqnames        ranges strand
           <Rle>     <IRanges>  <Rle>
  [1] CP042204.1 471047-471094      -
  [2] CP042204.1 471781-471826      -
  -------
  seqinfo: 21 sequences from ASM773564v1 genome

...
<3 more elements>



sessionInfo()
R version 4.4.1 (2024-06-14 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 22631)

Matrix products: default


locale:
[1] LC_COLLATE=English_United States.utf8 
[2] LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

time zone: America/New_York
tzcode source: internal

attached base packages:
[1] grid      stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] BSgenome.Vinaequalis.NCBI.ASM976944v1_1.0.0
 [2] BSgenome.Vnashicola.NCBI.ASM452265v2_1.0.0 
 [3] BSgenome.Veffusa.NCBI.ASM773564v1_1.0.0    
 [4] shape_1.4.6.1                              
 [5] RMariaDB_1.3.2                             
 [6] modeest_2.4.0                              
 [7] seqLogo_1.70.0                             
 [8] GenomicDistributions_1.12.0                
 [9] MotifDb_1.46.0                             
[10] rlist_0.4.6.2                              
[11] msaR_0.6.0                                 
[12] deepredeff_0.1.1                           
[13] seqinr_4.2-36                              
[14] ape_5.8                                    
[15] msa_1.36.1                                 
[16] OrganismDbi_1.46.0                         
[17] BSgenomeForge_1.4.0                        
[18] BSgenome_1.72.0                            
[19] BiocIO_1.14.0                              
[20] karyoploteR_1.30.0                         
[21] regioneR_1.36.0                            
[22] biovizBase_1.52.0                          
[23] Gviz_1.48.0                                
[24] AnnotationForge_1.46.0                     
[25] AnnotationHub_3.12.0                       
[26] BiocFileCache_2.12.0                       
[27] dbplyr_2.5.0                               
[28] UniProt.ws_2.44.0                          
[29] RSQLite_2.3.7                              
[30] gridExtra_2.3                              
[31] ggbio_1.52.0                               
[32] Rsubread_2.18.0                            
[33] ShortRead_1.62.0                           
[34] BiocParallel_1.38.0                        
[35] Rhisat2_1.20.0                             
[36] trackViewer_1.40.0                         
[37] rtracklayer_1.64.0                         
[38] R.utils_2.12.3                             
[39] R.oo_1.26.0                                
[40] R.methodsS3_1.8.2                          
[41] RColorBrewer_1.1-3                         
[42] scatterplot3d_0.3-44                       
[43] ggplot2_3.5.1                              
[44] gplots_3.1.3.1                             
[45] Cairo_1.6-2                                
[46] motifStack_1.48.0                          
[47] openxlsx_4.2.6.1                           
[48] GenomicFeatures_1.56.0                     
[49] AnnotationDbi_1.66.0                       
[50] GenomicAlignments_1.40.0                   
[51] Rsamtools_2.20.0                           
[52] Biostrings_2.72.1                          
[53] XVector_0.44.0                             
[54] SummarizedExperiment_1.34.0                
[55] Biobase_2.64.0                             
[56] MatrixGenerics_1.16.0                      
[57] matrixStats_1.3.0                          
[58] GenomicRanges_1.56.1                       
[59] GenomeInfoDb_1.40.1                        
[60] IRanges_2.38.1                             
[61] S4Vectors_0.42.1                           
[62] BiocGenerics_0.50.0                        
[63] devtools_2.4.5                             
[64] usethis_3.0.0                              

loaded via a namespace (and not attached):
  [1] fs_1.6.4                    ProtGenerics_1.36.0        
  [3] bitops_1.0-8                DirichletMultinomial_1.46.0
  [5] TFBSTools_1.42.0            httr_1.4.7                 
  [7] InteractionSet_1.32.0       profvis_0.3.8              
  [9] tools_4.4.1                 backports_1.5.0            
 [11] utf8_1.2.4                  R6_2.5.1                   
 [13] lazyeval_0.2.2              rhdf5filters_1.16.0        
 [15] urlchecker_1.0.1            withr_3.0.1                
 [17] prettyunits_1.2.0           GGally_2.2.1               
 [19] cli_3.6.3                   SGSeq_1.38.0               
 [21] grImport_0.9-7              readr_2.1.5                
 [23] txdbmaker_1.0.1             foreign_0.8-87             
 [25] dichromat_2.0-0.1           sessioninfo_1.2.2          
 [27] plotrix_3.8-4               rstudioapi_0.16.0          
 [29] generics_0.1.3              hwriter_1.3.2.1            
 [31] gtools_3.9.5                dplyr_1.1.4                
 [33] zip_2.3.1                   GO.db_3.19.1               
 [35] Matrix_1.7-0                interp_1.1-6               
 [37] fansi_1.0.6                 abind_1.4-5                
 [39] lifecycle_1.0.4             yaml_2.3.10                
 [41] rhdf5_2.48.0                SparseArray_1.4.8          
 [43] blob_1.2.4                  promises_1.3.0             
 [45] crayon_1.5.3                pwalign_1.0.0              
 [47] miniUI_0.1.1.1              lattice_0.22-6             
 [49] annotate_1.82.0             KEGGREST_1.44.1            
 [51] pillar_1.9.0                knitr_1.48                 
 [53] statip_0.2.3                rjson_0.2.21               
 [55] codetools_0.2-20            strawr_0.0.92              
 [57] glue_1.7.0                  data.table_1.15.4          
 [59] remotes_2.5.0               vctrs_0.6.5                
 [61] png_0.1-8                   gtable_0.3.5               
 [63] poweRlaw_0.80.0             cachem_1.1.0               
 [65] xfun_0.46                   S4Arrays_1.4.1             
 [67] mime_0.12                   pracma_2.4.4               
 [69] timeDate_4032.109           ellipsis_0.3.2             
 [71] nlme_3.1-165                bit64_4.0.5                
 [73] progress_1.2.3              filelock_1.0.3             
 [75] fBasics_4032.96             KernSmooth_2.23-24         
 [77] rpart_4.1.23                splitstackshape_1.4.8      
 [79] colorspace_2.1-1            DBI_1.2.3                  
 [81] Hmisc_5.1-3                 nnet_7.3-19                
 [83] ade4_1.7-22                 tidyselect_1.2.1           
 [85] timeSeries_4032.109         bit_4.0.5                  
 [87] compiler_4.4.1              curl_5.2.1                 
 [89] httr2_1.0.2                 graph_1.82.0               
 [91] rjsoncons_1.3.1             htmlTable_2.4.3            
 [93] bezier_1.1.2                xml2_1.3.6                 
 [95] DelayedArray_0.30.1         checkmate_2.3.2            
 [97] scales_1.3.0                caTools_1.18.2             
 [99] spatial_7.3-17              RBGL_1.80.0                
[101] rappdirs_0.3.3              stringr_1.5.1              
[103] digest_0.6.36               rmarkdown_2.28             
[105] htmltools_0.5.8.1           pkgconfig_2.0.3            
[107] jpeg_0.1-10                 base64enc_0.1-3            
[109] stabledist_0.7-2            fastmap_1.2.0              
[111] ensembldb_2.28.0            rlang_1.1.4                
[113] htmlwidgets_1.6.4           UCSC.utils_1.0.0           
[115] shiny_1.9.1                 jsonlite_1.8.8             
[117] VariantAnnotation_1.50.0    RCurl_1.98-1.16            
[119] magrittr_2.0.3              Formula_1.2-5              
[121] GenomeInfoDbData_1.2.12     Rhdf5lib_1.26.0            
[123] munsell_0.5.1               Rcpp_1.0.13                
[125] reticulate_1.38.0           bamsignals_1.36.0          
[127] stringi_1.8.4               stable_1.1.6               
[129] zlibbioc_1.50.0             MASS_7.3-61                
[131] plyr_1.8.9                  pkgbuild_1.4.4             
[133] ggstats_0.6.0               parallel_4.4.1             
[135] deldir_2.0-4                CNEr_1.40.0                
[137] hms_1.1.3                   igraph_2.0.3               
[139] RUnit_0.4.33                rmutil_1.1.10              
[141] reshape2_1.4.4              biomaRt_2.60.1             
[143] pkgload_1.4.0               TFMPvalue_0.0.9            
[145] BiocVersion_3.19.1          XML_3.99-0.17              
[147] evaluate_0.24.0             latticeExtra_0.6-30        
[149] BiocManager_1.30.23         tzdb_0.4.0                 
[151] httpuv_1.6.15               tidyr_1.3.1                
[153] purrr_1.0.2                 clue_0.3-65                
[155] BiocBaseUtils_1.6.0         xtable_1.8-4               
[157] restfulr_0.0.15             AnnotationFilter_1.28.0    
[159] later_1.3.2                 tibble_3.2.1               
[161] memoise_2.0.1               cluster_2.1.6
GenomicRanges GenomicFeatures • 193 views
ADD COMMENT
2
Entering edit mode
@james-w-macdonald-5106
Last seen 1 day ago
United States

Something like

> z <- intronsByTranscript(tx, use.names = TRUE)
> z <- z[lengths(z) > 0L]
> znam <- sapply(strsplit(names(z), "\\|"), "[", 3)
> zz <- unlist(z)
> zz$intron <- paste(rep(znam, lengths(z)), sapply(lengths(z), seq_len), sep = "_")
> head(zz$intron, 30)
 [1] "FKW77_000178-T1_mrna_1"  
 [2] "FKW77_000228-T1_mrna_1"  
 [3] "FKW77_000243-T1_mrna_1"  
 [4] "FKW77_000249-T1_mrna_1:3"
 [5] "FKW77_000249-T1_mrna_1:2"
 [6] "FKW77_000249-T1_mrna_1:3"
 [7] "FKW77_000264-T1_mrna_1:2"
 [8] "FKW77_000264-T1_mrna_1:4"
 [9] "FKW77_000300-T1_mrna_1"  
[10] "FKW77_000300-T1_mrna_1:2"
[11] "FKW77_000300-T1_mrna_1:2"
[12] "FKW77_000331-T1_mrna_1:2"
[13] "FKW77_000331-T1_mrna_1"  
[14] "FKW77_000372-T1_mrna_1:4"
[15] "FKW77_000372-T1_mrna_1:3"
[16] "FKW77_000372-T1_mrna_1"  
[17] "FKW77_000372-T1_mrna_1:2"
[18] "FKW77_000383-T1_mrna_1"  
[19] "FKW77_000400-T1_mrna_1:5"
[20] "FKW77_000400-T1_mrna_1:5"
[21] "FKW77_000451-T1_mrna_1:2"
[22] "FKW77_000451-T1_mrna_1:2"
[23] "FKW77_000460-T1_mrna_1:7"
[24] "FKW77_000460-T1_mrna_1:8"
[25] "FKW77_000469-T1_mrna_1"  
[26] "FKW77_000584-T1_mrna_1:2"
[27] "FKW77_000584-T1_mrna_1:2"
[28] "FKW77_000584-T1_mrna_1:2"
[29] "FKW77_000584-T1_mrna_1:2"
[30] "FKW77_000602-T1_mrna_1"
1
Entering edit mode

There's an error in the last line. Note that sapply(lengths(z), seq_len) will return a list because the lengths vary. We need a vector, so have to convert that list to a vector. There are (at least) two ways to do that, using either unlist or (my preference) do.call, which is (probably) more likely to do what you expect.

> zz$intron <- paste(rep(znam, lengths(z)), do.call(c, sapply(lengths(z), seq_len)), sep = "_")
> head(zz$intron, 30)
 [1] "FKW77_000178-T1_mrna_1"
 [2] "FKW77_000228-T1_mrna_1"
 [3] "FKW77_000243-T1_mrna_1"
 [4] "FKW77_000249-T1_mrna_1"
 [5] "FKW77_000249-T1_mrna_2"
 [6] "FKW77_000249-T1_mrna_3"
 [7] "FKW77_000264-T1_mrna_1"
 [8] "FKW77_000264-T1_mrna_2"
 [9] "FKW77_000300-T1_mrna_1"
[10] "FKW77_000300-T1_mrna_2"
[11] "FKW77_000300-T1_mrna_3"
[12] "FKW77_000331-T1_mrna_1"
[13] "FKW77_000331-T1_mrna_2"
[14] "FKW77_000372-T1_mrna_1"
[15] "FKW77_000372-T1_mrna_2"
[16] "FKW77_000372-T1_mrna_3"
[17] "FKW77_000372-T1_mrna_4"
[18] "FKW77_000383-T1_mrna_1"
[19] "FKW77_000400-T1_mrna_1"
[20] "FKW77_000400-T1_mrna_2"
[21] "FKW77_000451-T1_mrna_1"
[22] "FKW77_000451-T1_mrna_2"
[23] "FKW77_000460-T1_mrna_1"
[24] "FKW77_000460-T1_mrna_2"
[25] "FKW77_000469-T1_mrna_1"
[26] "FKW77_000584-T1_mrna_1"
[27] "FKW77_000584-T1_mrna_2"
[28] "FKW77_000584-T1_mrna_3"
[29] "FKW77_000584-T1_mrna_4"
[30] "FKW77_000602-T1_mrna_1"

Login before adding your answer.

Traffic: 521 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6