issue renaming seqnames in genomic ranges
1
0
Entering edit mode
Jake ▴ 90
@jake-7236
Last seen 20 months ago
United States

I'm trying to create a GenomicRanges list with mature miRNA coordinates relative to their hairpin precursor sequence. Essentially each hairpin will be its own chromosome with 1-2 mature miRNAs per "hairpin chromosome". I downloaded the human miRNA gff file from miRbase and did the following:

gff <- import('hsa.gff3')
hairpin <- gff[mcols(gff)$type == 'miRNA_primary_transcript',]
names(hairpin) <- mcols(hairpin)$ID
mature <- gff[mcols(gff)$type == 'miRNA',]
names(mature) <- mcols(mature)$Derives_from
mature_relative <- shift(mature, -start(hairpin[names(mature),]))
names(mature_relative) <- mcols(mature)$ID​

This is what I end up with:

GRanges object with 2813 ranges and 8 metadata columns:
                 seqnames    ranges strand   |   source     type     score     phase             ID           Alias            Name Derives_from
                    <Rle> <IRanges>  <Rle>   | <factor> <factor> <numeric> <integer>    <character> <CharacterList>     <character>  <character>
    MIMAT0027618     chr1  [40, 62]      -   |     <NA>    miRNA      <NA>      <NA>   MIMAT0027618    MIMAT0027618 hsa-miR-6859-5p    MI0022705
    MIMAT0027619     chr1  [ 0, 22]      -   |     <NA>    miRNA      <NA>      <NA>   MIMAT0027619    MIMAT0027619 hsa-miR-6859-3p    MI0022705
    MIMAT0005890     chr1  [72, 92]      +   |     <NA>    miRNA      <NA>      <NA>   MIMAT0005890    MIMAT0005890    hsa-miR-1302    MI0006363
  MIMAT0027618_1     chr1  [40, 62]      -   |     <NA>    miRNA      <NA>      <NA> MIMAT0027618_1    MIMAT0027618 hsa-miR-6859-5p    MI0026420
  MIMAT0027619_1     chr1  [ 0, 22]      -   |     <NA>    miRNA      <NA>      <NA> MIMAT0027619_1    MIMAT0027619 hsa-miR-6859-3p    MI0026420​

However, I next want to count reads overlapping their mature miRNA using the summarizeOverlaps function. The miRNA reads have been mapped to miRNA hairpins so that when I import the bam file, all of the seqnames are the miRNA hairpin ID (MI00....). 

In order to overlap reads, I need to replace the 'chr' seqnames in the mature_relative Genomic Ranges object with the hairpin ID in the Derives_from column. However, when I try to do this I get the following error:

 seqnames(mature_relative) <- mcols(mature_relative)$Derives_from
Error in `seqnames<-`(`*tmp*`, value = c("MI0022705", "MI0022705", "MI0006363",  : 
  levels of supplied 'seqnames' must be identical to 'seqlevels(x)'​
genomicranges • 1.9k views
ADD COMMENT
1
Entering edit mode
@michael-lawrence-3846
Last seen 2.4 years ago
United States

The direct answer is that you would need to construct a new GRanges using the miRNA seqnames. But better would be to use the utility GenomicFeatures::mapToTranscripts(), which will map your mature miRNAs onto the space of the hairpin transcripts, setting the seqnames automatically (and no need to do any shifting):

mature_relative <- mapToTranscripts(mature, hairpin)

This mapping is based purely on overlaps, so it has the potential to return mappings to multiple hairpins, even if the annotation does not support the relationship. To guard against that, use this:

mature_relative <- pmapToTranscripts(mature, hairpin[mature$Derives_from])
ADD COMMENT
0
Entering edit mode

Hi Michael,

When I try the second option, it doesn't add the hairpin ID as the seqnames. Could this be because there are multiple mature per hairpin?

gff <- import('hsa.gff3')
> hairpin <- gff[mcols(gff)$type == 'miRNA_primary_transcript',]
> names(hairpin) <- mcols(hairpin)$ID
> mature <- gff[mcols(gff)$type == 'miRNA',]
> names(mature) <- mcols(mature)$ID
> 
> mature_relative1 <- mapToTranscripts(mature, hairpin)
> mature_relative2 <- pmapToTranscripts(mature, hairpin[mature$Derives_from])
> mature_relative1
GRanges object with 2813 ranges and 2 metadata columns:
                  seqnames    ranges strand   |     xHits transcriptsHits
                     <Rle> <IRanges>  <Rle>   | <integer>       <integer>
    MIMAT0027618 MI0022705  [ 6, 28]      -   |         1               1
    MIMAT0027619 MI0022705  [46, 68]      -   |         2               1
    MIMAT0005890 MI0006363  [73, 93]      +   |         3               2
  MIMAT0027618_1 MI0026420  [ 6, 28]      -   |         4               3
  MIMAT0027619_1 MI0026420  [46, 68]      -   |         5               3
             ...       ...       ...    ... ...       ...             ...
    MIMAT0005829 MI0006277  [61, 83]      -   |      2809            1877
  MIMAT0005829_1 MI0015971  [61, 83]      -   |      2810            1878
  MIMAT0005829_2 MI0015972  [61, 83]      +   |      2811            1879
  MIMAT0018119_1 MI0023561  [10, 32]      +   |      2812            1880
  MIMAT0023714_1 MI0023563  [39, 62]      +   |      2813            1881
  -------
  seqinfo: 1881 sequences from an unspecified genome; no seqlengths
> mature_relative2
GRanges object with 2813 ranges and 0 metadata columns:
                 seqnames    ranges strand
                    <Rle> <IRanges>  <Rle>
    MIMAT0027618        1  [ 6, 28]      -
    MIMAT0027619        2  [46, 68]      -
    MIMAT0005890        3  [73, 93]      +
  MIMAT0027618_1        4  [ 6, 28]      -
  MIMAT0027619_1        5  [46, 68]      -
             ...      ...       ...    ...
    MIMAT0005829     2809  [61, 83]      -
  MIMAT0005829_1     2810  [61, 83]      -
  MIMAT0005829_2     2811  [61, 83]      +
  MIMAT0018119_1     2812  [10, 32]      +
  MIMAT0023714_1     2813  [39, 62]      +
  -------
  seqinfo: 2813 sequences from an unspecified genome; no seqlengths
> 
ADD REPLY
0
Entering edit mode

No it is because there is a bug. I fixed this in GenomicFeatures 1.22.6, will be out in a couple days. Temporary work around is:

mature_relative <- pmapToTranscripts(mature, as(hairpin[mature$Derives_from], "List"))
ADD REPLY

Login before adding your answer.

Traffic: 580 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6