Question: How to get introns with the ensembldb package?
6 months ago by
alessandro.pastore wrote:

I would like to generate a GRangesList of all gene introns with names. I can make the exon list, but I do not see an elegant way to get the introns. Any suggestions?

Thanks!

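library(AnnotationHub)
library(ensembldb)  # provides exons(), transcripts(), listColumns() and the filter classes

# Load the Ensembl 90 EnsDb for human from AnnotationHub
edb <- query(AnnotationHub(), c("Ensembl 90 EnsDb", "Homo sapiens"))[[1]]

# One row per exon/transcript combination, with transcript columns and gene name
exons.Grange <- exons(edb, columns = c(listColumns(edb, "tx"), "gene_name"))

# Note: duplicated() keeps only the repeat occurrences of each exon_id
exons.Grange <- exons.Grange[duplicated(exons.Grange$exon_id), ]

# Group the ranges by exon ID
exons.Grange <- split(exons.Grange, exons.Grange$exon_id)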
> exons.Grange
GRangesList object of length 221795:
$ENSE00000327880 
GRanges object with 5 ranges and 11 metadata columns:
                  seqnames               ranges strand |           tx_id     tx_biotype tx_seq_start tx_seq_end tx_cds_seq_start tx_cds_seq_end
                     <Rle>            <IRanges>  <Rle> |     <character>    <character>    <integer>  <integer>        <integer>      <integer>
  ENSE00000327880        1 [27732603, 27732657]      + | ENST00000419687 protein_coding     27725996   27761473         27726081       27760581
  ENSE00000327880        1 [27732603, 27732657]      + | ENST00000530324 protein_coding     27726028   27759764         27726081       27759657
  ENSE00000327880        1 [27732603, 27732657]      + | ENST00000234549 protein_coding     27726028   27760581         27726081       27760581
  ENSE00000327880        1 [27732603, 27732657]      + | ENST00000373949 protein_coding     27726028   27761964         27726081       27760581
  ENSE00000327880        1 [27732603, 27732657]      + | ENST00000010299 protein_coding     27726057   27760581         27726081       27760581
                          gene_id tx_support_level         tx_name   gene_name         exon_id
                      <character>        <integer>     <character> <character>     <character>
  ENSE00000327880 ENSG00000009780                2 ENST00000419687      FAM76A ENSE00000327880
  ENSE00000327880 ENSG00000009780                1 ENST00000530324      FAM76A ENSE00000327880
  ENSE00000327880 ENSG00000009780                1 ENST00000234549      FAM76A ENSE00000327880
  ENSE00000327880 ENSG00000009780                2 ENST00000373949      FAM76A ENSE00000327880
  ENSE00000327880 ENSG00000009780                1 ENST00000010299      FAM76A ENSE00000327880

$ENSE00000328922 
GRanges object with 2 ranges and 11 metadata columns:
                  seqnames                 ranges strand |           tx_id              tx_biotype tx_seq_start tx_seq_end tx_cds_seq_start
  ENSE00000328922        3 [131018506, 131018716]      - | ENST00000264992          protein_coding    131013875  131026802        131014057
  ENSE00000328922        3 [131018506, 131018716]      - | ENST00000507978 nonsense_mediated_decay    131013982  131026854        131017000
                  tx_cds_seq_end         gene_id tx_support_level         tx_name gene_name         exon_id
  ENSE00000328922      131025306 ENSG00000034533                1 ENST00000264992     ASTE1 ENSE00000328922
  ENSE00000328922      131025306 ENSG00000034533                2 ENST00000507978     ASTE1 ENSE00000328922

$ENSE00000329326 
GRanges object with 2 ranges and 11 metadata columns:
                  seqnames                 ranges strand |           tx_id     tx_biotype tx_seq_start tx_seq_end tx_cds_seq_start tx_cds_seq_end
  ENSE00000329326        8 [132583694, 132583779]      - | ENST00000250173 protein_coding    132572201  132675559        132578498      132675493
  ENSE00000329326        8 [132583694, 132583779]      - | ENST00000618342 protein_coding    132571953  132661667        132572306      132661667
                          gene_id tx_support_level         tx_name gene_name         exon_id
  ENSE00000329326 ENSG00000129295                1 ENST00000250173     LRRC6 ENSE00000329326
  ENSE00000329326 ENSG00000129295                5 ENST00000618342     LRRC6 ENSE00000329326

...
<221792 more elements>
-------
seqinfo: 388 sequences from GRCh38 genome

6 months ago by
alessandro.pastore wrote:
I can generate a GRangesList of introns, but the names are lost.

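# Transcript ranges for protein-coding genes
intron.Grange <- transcripts(edb, columns = c(listColumns(edb, "tx"), "gene_name"),
                             filter = list(GeneBiotypeFilter("protein_coding")))

# setdiff() returns bare ranges, so the transcript metadata columns are dropped here
intron.Grange <- setdiff(intron.Grange, exons.Grange)

# Assign arbitrary sequential IDs, since the database stores no intron IDs
intron.Grange$intron_id <- paste0("intron_id", seq_along(intron.Grange))

intron.Grange <- split(intron.Grange, intron.Grange$intron_id)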

I'd say your approach seems to be pretty OK. There is no intron ID stored in the database, so you can't get that from an EnsDb.

written 6 months ago by Johannes Rainer
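
For readers who want to keep transcript and gene names on the introns, one possible sketch is to compute the introns per transcript rather than across all ranges at once. The example below is not from the thread: it uses a small hypothetical exon table (transcripts `tx1` and `tx2`, gene `GENE_A`) built with GenomicRanges alone, so it runs without downloading an EnsDb. With a real database, the per-transcript exon list would presumably come from something like `exonsBy(edb, by = "tx")` instead of the toy `split()` call. The intron ID scheme is made up for illustration, since the database stores none.

```r
library(GenomicRanges)

# Toy exon annotation (hypothetical data): two transcripts of one gene.
exons_gr <- GRanges(
  seqnames = rep("1", 5),
  ranges = IRanges(start = c(100, 300, 500, 100, 500),
                   end   = c(199, 399, 599, 199, 599)),
  strand = rep("+", 5),
  tx_id = c("tx1", "tx1", "tx1", "tx2", "tx2"),
  gene_name = rep("GENE_A", 5)
)

# Group exons by transcript.
exons_by_tx <- split(exons_gr, exons_gr$tx_id)

# Introns are the gaps between the exons of each transcript:
# subtract each transcript's exons from the span it covers.
tx_span <- unlist(range(exons_by_tx))
introns_by_tx <- psetdiff(tx_span, exons_by_tx)
names(introns_by_tx) <- names(exons_by_tx)

# Flatten and re-attach identifiers as metadata columns.
introns <- unlist(introns_by_tx)
introns$tx_id <- names(introns)
introns$gene_name <- exons_gr$gene_name[match(introns$tx_id, exons_gr$tx_id)]

# Made-up intron IDs: numbered by genomic position within each transcript.
intron_number <- ave(seq_along(introns), introns$tx_id, FUN = seq_along)
introns$intron_id <- paste0(introns$tx_id, "_intron", intron_number)
names(introns) <- introns$intron_id

introns
```

Because the subtraction is done one transcript at a time, each intron still knows which transcript it came from, and any other transcript-level column can be looked up from `tx_id` in the same way as `gene_name`. Note that the numbering follows genomic coordinates, so for minus-strand transcripts it runs opposite to the direction of transcription.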

 

Thanks! I thought it would be nice to keep some kind of mcols information...

written 6 months ago by alessandro.pastore