Looking for a way to rank-order introns by transcript
I need help with a method to rank each intron within its transcript, based on the GRanges start/end. Specifically, I'm looking for a way to generate a unique identifier corresponding to the 1st intron, 2nd intron, ... last intron of each transcript, and store it in the GRanges mcols(). Thanks.



GRangesList object of length 6:
GRanges object with 1 range and 0 metadata columns:
        seqnames        ranges strand
           <Rle>     <IRanges>  <Rle>
  [1] CP042204.1 460547-460594      -
  seqinfo: 21 sequences from ASM773564v1 genome

GRanges object with 6 ranges and 0 metadata columns:
        seqnames        ranges strand
           <Rle>     <IRanges>  <Rle>
  [1] CP042204.1 466517-467022      -
  [2] CP042204.1 467072-467209      -
  [3] CP042204.1 467341-468214      -
  [4] CP042204.1 468703-468750      -
  [5] CP042204.1 469072-469125      -
  [6] CP042204.1 469246-469709      -
  seqinfo: 21 sequences from ASM773564v1 genome

GRanges object with 2 ranges and 0 metadata columns:
        seqnames        ranges strand
           <Rle>     <IRanges>  <Rle>
  [1] CP042204.1 471047-471094      -
  [2] CP042204.1 471781-471826      -
  seqinfo: 21 sequences from ASM773564v1 genome

<3 more elements>

Something like

> z <- intronsByTranscript(tx, use.names = TRUE)
> z <- z[lengths(z) > 0L]
> znam <- sapply(strsplit(names(z), "\\|"), "[", 3)
> zz <- unlist(z)
> zz$intron <- paste(rep(znam, lengths(z)), sapply(lengths(z), seq_len), sep = "_")
> head(zz$intron, 30)
 [1] "FKW77_000178-T1_mrna_1"  
 [2] "FKW77_000228-T1_mrna_1"  
 [3] "FKW77_000243-T1_mrna_1"  
 [4] "FKW77_000249-T1_mrna_1:3"
 [5] "FKW77_000249-T1_mrna_1:2"
 [6] "FKW77_000249-T1_mrna_1:3"
 [7] "FKW77_000264-T1_mrna_1:2"
 [8] "FKW77_000264-T1_mrna_1:4"
 [9] "FKW77_000300-T1_mrna_1"  
[10] "FKW77_000300-T1_mrna_1:2"
[11] "FKW77_000300-T1_mrna_1:2"
[12] "FKW77_000331-T1_mrna_1:2"
[13] "FKW77_000331-T1_mrna_1"  
[14] "FKW77_000372-T1_mrna_1:4"
[15] "FKW77_000372-T1_mrna_1:3"
[16] "FKW77_000372-T1_mrna_1"  
[17] "FKW77_000372-T1_mrna_1:2"
[18] "FKW77_000383-T1_mrna_1"  
[19] "FKW77_000400-T1_mrna_1:5"
[20] "FKW77_000400-T1_mrna_1:5"
[21] "FKW77_000451-T1_mrna_1:2"
[22] "FKW77_000451-T1_mrna_1:2"
[23] "FKW77_000460-T1_mrna_1:7"
[24] "FKW77_000460-T1_mrna_1:8"
[25] "FKW77_000469-T1_mrna_1"  
[26] "FKW77_000584-T1_mrna_1:2"
[27] "FKW77_000584-T1_mrna_1:2"
[28] "FKW77_000584-T1_mrna_1:2"
[29] "FKW77_000584-T1_mrna_1:2"
[30] "FKW77_000602-T1_mrna_1"
There's an error in the last line of the code above. Note that sapply(lengths(z), seq_len) returns a list because the lengths vary, and paste() then turns each list element into a single string such as "1:3". We need a plain vector, so the list has to be flattened first. There are (at least) two ways to do that, using either unlist or (my preference) do.call(c, ...), which is (probably) more likely to do what you expect.

> zz$intron <- paste(rep(znam, lengths(z)), do.call(c, sapply(lengths(z), seq_len)), sep = "_")
> head(zz$intron, 30)
 [1] "FKW77_000178-T1_mrna_1"
 [2] "FKW77_000228-T1_mrna_1"
 [3] "FKW77_000243-T1_mrna_1"
 [4] "FKW77_000249-T1_mrna_1"
 [5] "FKW77_000249-T1_mrna_2"
 [6] "FKW77_000249-T1_mrna_3"
 [7] "FKW77_000264-T1_mrna_1"
 [8] "FKW77_000264-T1_mrna_2"
 [9] "FKW77_000300-T1_mrna_1"
[10] "FKW77_000300-T1_mrna_2"
[11] "FKW77_000300-T1_mrna_3"
[12] "FKW77_000331-T1_mrna_1"
[13] "FKW77_000331-T1_mrna_2"
[14] "FKW77_000372-T1_mrna_1"
[15] "FKW77_000372-T1_mrna_2"
[16] "FKW77_000372-T1_mrna_3"
[17] "FKW77_000372-T1_mrna_4"
[18] "FKW77_000383-T1_mrna_1"
[19] "FKW77_000400-T1_mrna_1"
[20] "FKW77_000400-T1_mrna_2"
[21] "FKW77_000451-T1_mrna_1"
[22] "FKW77_000451-T1_mrna_2"
[23] "FKW77_000460-T1_mrna_1"
[24] "FKW77_000460-T1_mrna_2"
[25] "FKW77_000469-T1_mrna_1"
[26] "FKW77_000584-T1_mrna_1"
[27] "FKW77_000584-T1_mrna_2"
[28] "FKW77_000584-T1_mrna_3"
[29] "FKW77_000584-T1_mrna_4"
[30] "FKW77_000602-T1_mrna_1"
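
For anyone who wants to see the list-versus-vector issue in isolation, here is a minimal base R sketch on made-up input. The names tx_names and n_introns are hypothetical stand-ins for znam and lengths(z); nothing here touches the real annotation. It also shows sequence(), a base R function that should give the same numbering without building a list first.

```r
# Hypothetical toy input: three transcripts with 1, 3 and 2 introns.
tx_names  <- c("txA", "txB", "txC")   # stand-in for znam
n_introns <- c(1L, 3L, 2L)            # stand-in for lengths(z)

# lapply() always returns a list; paste() deparses each list element,
# which is where strings like "1:3" come from.
nested <- lapply(n_introns, seq_len)
bad <- paste(tx_names, nested, sep = "_")

# Three ways to get one flat vector of within-transcript ranks.
rank_unlist   <- unlist(nested)
rank_docall   <- do.call(c, nested)
rank_sequence <- sequence(n_introns)

# Repeat each transcript name once per intron, then append the rank.
intron_id <- paste(rep(tx_names, n_introns), rank_sequence, sep = "_")

print(bad)
print(intron_id)
```

The three flattening approaches agree on this toy input; which one to use on real data is mostly a matter of taste.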
Hey, in revisiting this question: the following code works for the '+' strand, but I am having trouble numbering introns by transcript on the '-' strand. Negative-strand intron ranks come out in the reverse orientation (e.g. the output is 1-2-3-4-5 when it should be 5-4-3-2-1), because the GRanges 'start' and 'end' should be swapped under the negative-strand intron ranking convention. Is there a subsetting operation that can address this?

z <- intronsByTranscript(tx, use.names = TRUE)
z <- z[lengths(z) > 0L]
znam <- sapply(strsplit(names(z), "\\|"), "[", 3)
zz <- unlist(z)
ind[,7]<-unlist(sapply(lengths(z), seq_len))

            seqnames  start    end width strand          GID IntronRankByTranscript
VEintron1 CP042185.1 102664 102719    56      + FKW77_000178                      1
VEintron2 CP042185.1 114417 114472    56      + FKW77_000228                      1
VEintron3 CP042185.1 117912 117961    50      + FKW77_000243                      1
VEintron4 CP042185.1 118970 119025    56      +         CYT1                      1
VEintron5 CP042185.1 119184 119241    58      +         CYT1                      2
VEintron6 CP042185.1 119905 119957    53      +         CYT1                      3

                seqnames  start    end width strand          GID IntronRankByTranscript
VEintron20729 CP042204.1 501893 502450   558      - FKW77_002098                      1
VEintron20730 CP042204.1 502552 502598    47      - FKW77_002098                      2
VEintron20731 CP042204.1 502620 502666    47      - FKW77_002098                      3
VEintron20732 CP042204.1 502765 502810    46      - FKW77_002098                      4
VEintron20733 CP042204.1 502858 503688   831      - FKW77_002098                      5
VEintron20734 CP042204.1 512952 513671   720      - FKW77_002145                      1
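
One possible approach, sketched in base R on a data frame laid out like the tables above: sort on start for '+' and on negated start for '-', then rank within each group. The rows below are copied from the tables purely as toy input. Grouping by GID is an assumption that only holds if each gene has a single transcript; with multiple isoforms you would group by a transcript ID column instead.

```r
# Toy rows copied from the tables above (one '+' gene, two '-' genes).
ind <- data.frame(
  start  = c(118970, 119184, 119905, 501893, 502552, 502620, 502765, 502858, 512952),
  end    = c(119025, 119241, 119957, 502450, 502598, 502666, 502810, 503688, 513671),
  strand = c("+", "+", "+", "-", "-", "-", "-", "-", "-"),
  GID    = c(rep("CYT1", 3), rep("FKW77_002098", 5), "FKW77_002145"),
  stringsAsFactors = FALSE
)

# Strand-aware sort key: ascending start on '+', descending start on '-'.
key <- ifelse(ind$strand == "-", -ind$start, ind$start)

# Rank the key within each group (assumption: GID identifies one transcript).
ind$IntronRankByTranscript <- ave(
  key, ind$GID,
  FUN = function(k) rank(k, ties.method = "first")
)

print(ind)
```

This does not depend on the row order of the input, so it should give 5-4-3-2-1 for FKW77_002098 whether or not the introns were sorted by position beforehand.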

