I'm trying to create a GenomicRanges list with mature miRNA coordinates relative to their hairpin precursor sequence. Essentially each hairpin will be its own chromosome with 1-2 mature miRNAs per "hairpin chromosome". I downloaded the human miRNA gff file from miRbase and did the following:
gff <- import('hsa.gff3')
hairpin <- gff[mcols(gff)$type == 'miRNA_primary_transcript',]
names(hairpin) <- mcols(hairpin)$ID
mature <- gff[mcols(gff)$type == 'miRNA',]
names(mature) <- mcols(mature)$Derives_from
mature_relative <- shift(mature, -start(hairpin[names(mature),]))
names(mature_relative) <- mcols(mature)$ID
This is what I end up with:
GRanges object with 2813 ranges and 8 metadata columns:
seqnames ranges strand | source type score phase ID Alias Name Derives_from
<Rle> <IRanges> <Rle> | <factor> <factor> <numeric> <integer> <character> <CharacterList> <character> <character>
MIMAT0027618 chr1 [40, 62] - | <NA> miRNA <NA> <NA> MIMAT0027618 MIMAT0027618 hsa-miR-6859-5p MI0022705
MIMAT0027619 chr1 [ 0, 22] - | <NA> miRNA <NA> <NA> MIMAT0027619 MIMAT0027619 hsa-miR-6859-3p MI0022705
MIMAT0005890 chr1 [72, 92] + | <NA> miRNA <NA> <NA> MIMAT0005890 MIMAT0005890 hsa-miR-1302 MI0006363
MIMAT0027618_1 chr1 [40, 62] - | <NA> miRNA <NA> <NA> MIMAT0027618_1 MIMAT0027618 hsa-miR-6859-5p MI0026420
MIMAT0027619_1 chr1 [ 0, 22] - | <NA> miRNA <NA> <NA> MIMAT0027619_1 MIMAT0027619 hsa-miR-6859-3p MI0026420
However, I next want to count reads overlapping their mature miRNA using the summarizeOverlaps function. The miRNA reads have been mapped to miRNA hairpins so that when I import the bam file, all of the seqnames are the miRNA hairpin ID (MI00....).
In order to overlap reads, I need to replace the 'chr' seqnames in the mature_relative Genomic Ranges object with the hairpin ID in the Derives_from column. However, when I try to do this I get the following error:
seqnames(mature_relative) <- mcols(mature_relative)$Derives_from
Error in `seqnames<-`(`*tmp*`, value = c("MI0022705", "MI0022705", "MI0006363", :
levels of supplied 'seqnames' must be identical to 'seqlevels(x)'

Hi Michael,
When I try the second option, it doesn't add the hairpin ID as the seqnames. Could this be because there are multiple mature per hairpin?
gff <- import('hsa.gff3') > hairpin <- gff[mcols(gff)$type == 'miRNA_primary_transcript',] > names(hairpin) <- mcols(hairpin)$ID > mature <- gff[mcols(gff)$type == 'miRNA',] > names(mature) <- mcols(mature)$ID > > mature_relative1 <- mapToTranscripts(mature, hairpin) > mature_relative2 <- pmapToTranscripts(mature, hairpin[mature$Derives_from]) > mature_relative1 GRanges object with 2813 ranges and 2 metadata columns: seqnames ranges strand | xHits transcriptsHits <Rle> <IRanges> <Rle> | <integer> <integer> MIMAT0027618 MI0022705 [ 6, 28] - | 1 1 MIMAT0027619 MI0022705 [46, 68] - | 2 1 MIMAT0005890 MI0006363 [73, 93] + | 3 2 MIMAT0027618_1 MI0026420 [ 6, 28] - | 4 3 MIMAT0027619_1 MI0026420 [46, 68] - | 5 3 ... ... ... ... ... ... ... MIMAT0005829 MI0006277 [61, 83] - | 2809 1877 MIMAT0005829_1 MI0015971 [61, 83] - | 2810 1878 MIMAT0005829_2 MI0015972 [61, 83] + | 2811 1879 MIMAT0018119_1 MI0023561 [10, 32] + | 2812 1880 MIMAT0023714_1 MI0023563 [39, 62] + | 2813 1881 ------- seqinfo: 1881 sequences from an unspecified genome; no seqlengths > mature_relative2 GRanges object with 2813 ranges and 0 metadata columns: seqnames ranges strand <Rle> <IRanges> <Rle> MIMAT0027618 1 [ 6, 28] - MIMAT0027619 2 [46, 68] - MIMAT0005890 3 [73, 93] + MIMAT0027618_1 4 [ 6, 28] - MIMAT0027619_1 5 [46, 68] - ... ... ... ... MIMAT0005829 2809 [61, 83] - MIMAT0005829_1 2810 [61, 83] - MIMAT0005829_2 2811 [61, 83] + MIMAT0018119_1 2812 [10, 32] + MIMAT0023714_1 2813 [39, 62] + ------- seqinfo: 2813 sequences from an unspecified genome; no seqlengths >No it is because there is a bug. I fixed this in GenomicFeatures 1.22.6, will be out in a couple days. Temporary work around is: