Mapping transcript to genomic positions via GenomicFeatures is not splice aware
0
0
Entering edit mode
nattzy94 ▴ 20
@nattzy94-23466
Last seen 2.0 years ago
Singapore

I have a bunch of transcript relative coordinates that I need to map to genomic coordinates. I am using a custom annotation file that I load using:

txdb <- makeTxDbFromGFF(file='../2022_10_04_orthoalltime_novel_dge/ortho_alltime_merged_temp.gtf',
                        dataSource="orthoalltime",
                        organism="Homo sapiens",
                        format='gtf')

I get all transcripts and exons in my annotation file:

tx_by_gene <- transcriptsBy(txdb, by="gene")
ex_by_tx <- exonsBy(txdb, "tx", use.names=TRUE)

I then get all exons belonging to my transcript of interest:

my_txid = 'MSTRG.26927.8'
my_transcripts <- tx_by_gene[[novel.small$my_gene_id[i]]]
my_tx_names <- mcols(my_transcripts)$tx_name
my_ex_by_tx <- ex_by_tx[intersect(my_tx_names, my_txid]

This is the output of my_ex_by_tx which is also the exon-level structure of 'MSTRG.26927.8'

GRangesList object of length 1:
$`19`
GRanges object with 18 ranges and 3 metadata columns:
       seqnames          ranges strand |   exon_id   exon_name exon_rank
          <Rle>       <IRanges>  <Rle> | <integer> <character> <integer>
   [1]        7 6109801-6109843      + |    228914        <NA>         1
   [2]        7 6111125-6111374      + |    228915        <NA>         2
   [3]        7 6115323-6115523      + |    228917        <NA>         3
   [4]        7 6135841-6135951      + |    228921        <NA>         4
   [5]        7 6139092-6139194      + |    228922        <NA>         5
   ...      ...             ...    ... .       ...         ...       ...
  [14]        7 6150412-6150506      + |    228932        <NA>        14
  [15]        7 6153756-6155195      + |    228933        <NA>        15
  [16]        7 6156754-6157055      + |    228934        <NA>        16
  [17]        7 6159450-6159493      + |    228936        <NA>        17
  [18]        7 6160555-6161564      + |    228937        <NA>        18
  -------
  seqinfo: 52 sequences (1 circular) from an unspecified genome; no seqlengths

If I try to map transcript/exon coordinates to genome coordinates:

coord <- GRanges(chromosome, IRanges(start = 1, end=43, names=my_txid), strand='+')
my_grange = mapFromTranscripts(coord, my_ex_by_tx)
my_grange

This successfully returns the first exon:

GRanges object with 1 range and 2 metadata columns:
                seqnames          ranges strand |     xHits transcriptsHits
                   <Rle>       <IRanges>  <Rle> | <integer>       <integer>
  MSTRG.26927.8        7 6109801-6109843      + |         1               1
  -------
  seqinfo: 52 sequences from an unspecified genome; no seqlengths

However, if I try to map position 44 onwards, the result does not include the second genomic region. Instead, it outputs the start of exon1 until the start of exon2 inclusive o the intron region:

coord <- GRanges(chromosome, IRanges(start = 1, end=44, names=my_txid), strand='+')
my_grange = mapFromTranscripts(coord, my_ex_by_tx)
my_grange
GRanges object with 1 range and 2 metadata columns:
                seqnames          ranges strand |     xHits transcriptsHits
                   <Rle>       <IRanges>  <Rle> | <integer>       <integer>
  MSTRG.26927.8        7 6109801-6111125      + |         1               1
  -------
  seqinfo: 52 sequences from an unspecified genome; no seqlengths

How do I get the genomic coordinates such that they are splice aware? The transcriptToGenome function from ensembldb returns the desired output but I cannot use it as my annotation database is not an ensdb object.

GenomicFeatures • 608 views
ADD COMMENT

Login before adding your answer.

Traffic: 648 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6