I'm trying to create a GenomicRanges list with mature miRNA coordinates relative to their hairpin precursor sequence. Essentially each hairpin will be its own chromosome with 1-2 mature miRNAs per "hairpin chromosome". I downloaded the human miRNA gff file from miRbase and did the following:
gff <- import('hsa.gff3')
hairpin <- gff[mcols(gff)$type == 'miRNA_primary_transcript',]
names(hairpin) <- mcols(hairpin)$ID
mature <- gff[mcols(gff)$type == 'miRNA',]
names(mature) <- mcols(mature)$Derives_from
mature_relative <- shift(mature, -start(hairpin[names(mature),]))
names(mature_relative) <- mcols(mature)$ID
This is what I end up with:
GRanges object with 2813 ranges and 8 metadata columns:
                 seqnames    ranges strand   |   source     type     score     phase             ID           Alias            Name Derives_from
                    <Rle> <IRanges>  <Rle>   | <factor> <factor> <numeric> <integer>    <character> <CharacterList>     <character>  <character>
    MIMAT0027618     chr1  [40, 62]      -   |     <NA>    miRNA      <NA>      <NA>   MIMAT0027618    MIMAT0027618 hsa-miR-6859-5p    MI0022705
    MIMAT0027619     chr1  [ 0, 22]      -   |     <NA>    miRNA      <NA>      <NA>   MIMAT0027619    MIMAT0027619 hsa-miR-6859-3p    MI0022705
    MIMAT0005890     chr1  [72, 92]      +   |     <NA>    miRNA      <NA>      <NA>   MIMAT0005890    MIMAT0005890    hsa-miR-1302    MI0006363
  MIMAT0027618_1     chr1  [40, 62]      -   |     <NA>    miRNA      <NA>      <NA> MIMAT0027618_1    MIMAT0027618 hsa-miR-6859-5p    MI0026420
  MIMAT0027619_1     chr1  [ 0, 22]      -   |     <NA>    miRNA      <NA>      <NA> MIMAT0027619_1    MIMAT0027619 hsa-miR-6859-3p    MI0026420
However, I next want to count reads overlapping their mature miRNA using the summarizeOverlaps function. The miRNA reads have been mapped to miRNA hairpins so that when I import the bam file, all of the seqnames are the miRNA hairpin ID (MI00....).
In order to overlap reads, I need to replace the 'chr' seqnames in the mature_relative Genomic Ranges object with the hairpin ID in the Derives_from column. However, when I try to do this I get the following error:
 seqnames(mature_relative) <- mcols(mature_relative)$Derives_from
Error in `seqnames<-`(`*tmp*`, value = c("MI0022705", "MI0022705", "MI0006363",  : 
  levels of supplied 'seqnames' must be identical to 'seqlevels(x)'
                    
                
                
Hi Michael,
When I try the second option, it doesn't add the hairpin ID as the seqnames. Could this be because there are multiple mature per hairpin?
gff <- import('hsa.gff3') > hairpin <- gff[mcols(gff)$type == 'miRNA_primary_transcript',] > names(hairpin) <- mcols(hairpin)$ID > mature <- gff[mcols(gff)$type == 'miRNA',] > names(mature) <- mcols(mature)$ID > > mature_relative1 <- mapToTranscripts(mature, hairpin) > mature_relative2 <- pmapToTranscripts(mature, hairpin[mature$Derives_from]) > mature_relative1 GRanges object with 2813 ranges and 2 metadata columns: seqnames ranges strand | xHits transcriptsHits <Rle> <IRanges> <Rle> | <integer> <integer> MIMAT0027618 MI0022705 [ 6, 28] - | 1 1 MIMAT0027619 MI0022705 [46, 68] - | 2 1 MIMAT0005890 MI0006363 [73, 93] + | 3 2 MIMAT0027618_1 MI0026420 [ 6, 28] - | 4 3 MIMAT0027619_1 MI0026420 [46, 68] - | 5 3 ... ... ... ... ... ... ... MIMAT0005829 MI0006277 [61, 83] - | 2809 1877 MIMAT0005829_1 MI0015971 [61, 83] - | 2810 1878 MIMAT0005829_2 MI0015972 [61, 83] + | 2811 1879 MIMAT0018119_1 MI0023561 [10, 32] + | 2812 1880 MIMAT0023714_1 MI0023563 [39, 62] + | 2813 1881 ------- seqinfo: 1881 sequences from an unspecified genome; no seqlengths > mature_relative2 GRanges object with 2813 ranges and 0 metadata columns: seqnames ranges strand <Rle> <IRanges> <Rle> MIMAT0027618 1 [ 6, 28] - MIMAT0027619 2 [46, 68] - MIMAT0005890 3 [73, 93] + MIMAT0027618_1 4 [ 6, 28] - MIMAT0027619_1 5 [46, 68] - ... ... ... ... MIMAT0005829 2809 [61, 83] - MIMAT0005829_1 2810 [61, 83] - MIMAT0005829_2 2811 [61, 83] + MIMAT0018119_1 2812 [10, 32] + MIMAT0023714_1 2813 [39, 62] + ------- seqinfo: 2813 sequences from an unspecified genome; no seqlengths >No it is because there is a bug. I fixed this in GenomicFeatures 1.22.6, will be out in a couple days. Temporary work around is: