Introns at transcript expression level
2
0
Entering edit mode
cookie ▴ 20
@cookie-7508
Last seen 9.7 years ago
USA

Hi,

I am trying to get the average number of introns per transcripts in the human genome. I am new to this, but I concluded that I have to use the package GenomicFeatures or Iranges.

Thus far I have made this code chuck:

library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb = TxDb.Hsapiens.UCSC.hg19.knownGene)

introns=intronsByTranscript(TxDb.Hsapiens.UCSC.hg38.knownGene)
 

And that's it. I am confused about what to do next, but I am quite sure I missing something here. I tried to do something with the findOverlaps function, but it seems strange to overlap the introns with the txdb object again, as I extracted the 
the introns from transcripts. What is the intronsByTranscripts function really doing here?


What is the solution to this? Any help is appreciated.

Thanks!

genomicfeatures iranges • 1.9k views
ADD COMMENT
2
Entering edit mode
@james-w-macdonald-5106
Last seen 6 hours ago
United States

The intronsByTranscript() function is creating a GRangesList item:

 

> intr <- intronsByTranscript(TxDb.Hsapiens.UCSC.hg19.knownGene)
> intr
GRangesList object of length 82960:
$1
GRanges object with 2 ranges and 0 metadata columns:
      seqnames         ranges strand
         <Rle>      <IRanges>  <Rle>
  [1]     chr1 [12228, 12612]      +
  [2]     chr1 [12722, 13220]      +

$2
GRanges object with 2 ranges and 0 metadata columns:
      seqnames         ranges strand
  [1]     chr1 [12228, 12594]      +
  [2]     chr1 [12722, 13402]      +

$3
GRanges object with 2 ranges and 0 metadata columns:
      seqnames         ranges strand
  [1]     chr1 [12228, 12645]      +
  [2]     chr1 [12698, 13220]      +

...
<82957 more elements>
-------
seqinfo: 93 sequences (1 circular) from hg19 genome

To find out more about this, and what you can do with it, see ?GRangesList. But to answer your question, you want elementLengths():

> lens <- elementLengths(intr)
> summary(lens)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
   0.000    2.000    6.000    7.948   11.000 4910.000

 

 

ADD COMMENT
0
Entering edit mode

Thanks a lot!

ADD REPLY
1
Entering edit mode
@herve-pages-1542
Last seen 6 hours ago
Seattle, WA, United States

Why the average number of introns per transcript and not the average number of exons?

Note that in a perfect world the latter should be equal to the former plus 1. In practice it's not:

ex_by_tx <- exonsBy(txdb, use.names=TRUE)
int_by_tx <- intronsByTranscript(txdb, use.names=TRUE)

ex_by_tx and int_by_tx are parallel GRangesList objects:

length(ex_by_tx)
# [1] 82960
length(int_by_tx)
# [1] 82960
identical(names(ex_by_tx), names(int_by_tx))
# [1] TRUE

The number of introns in a transcript is at most the number of exons minus 1:

all(elementLengths(int_by_tx) <= elementLengths(ex_by_tx) - 1)
# [1] TRUE

However, it's not always equal to the number of exons minus 1:

table(elementLengths(int_by_tx) == elementLengths(ex_by_tx) - 1)
# FALSE  TRUE 
#    33 82927 

It's a rare situation though. After some investigation, this seems to happen for transcripts that UCSC reports as having adjacent exons. Like for example in transcript uc001sem.3:

ex_by_tx$uc001sem.3
# GRanges object with 3 ranges and 3 metadata columns:
#       seqnames               ranges strand |   exon_id   exon_name exon_rank
#          <Rle>            <IRanges>  <Rle> | <integer> <character> <integer>
#   [1]    chr12 [54366910, 54367707]      + |    162570        <NA>         1
#   [2]    chr12 [54368965, 54369667]      + |    162571        <NA>         2
#   [3]    chr12 [54369668, 54370203]      + |    162572        <NA>         3
#   -------
#   seqinfo: 93 sequences (1 circular) from hg19 genome

int_by_tx$uc001sem.3
# GRanges object with 1 range and 0 metadata columns:
#       seqnames               ranges strand
#          <Rle>            <IRanges>  <Rle>
#   [1]    chr12 [54367708, 54368964]      +
#   -------
#   seqinfo: 93 sequences (1 circular) from hg19 genome

Don't know what that means exactly from a biological point of view (one would need to ask to the UCSC people). Anyway, just thought I should mention it if you're going to count the number of introns per transcript. (Should probably add this to the man page for intronsByTranscript too.)

Cheers,

H.

 

ADD COMMENT
0
Entering edit mode

This is quite interesting. Is there a paper on this? I am interested how it is possible exon +1 is equal introns.

ADD REPLY

Login before adding your answer.

Traffic: 704 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6