Retrieve canonical transcript for each gene via ensembldb R package
@simonboutry-19561
Last seen 5.2 years ago

I'm trying to retrieve the canonical transcript for each gene using the ensembldb package. In the Ensembl database there is a "canonical transcript" field in the gene table, which points to the desired row in the transcript table, so it is easy to obtain the canonical transcript for each gene. But using ensembldb, I can't find that "canonical transcript" field or any other way to retrieve this information. Does someone know whether there is such an "easy way" to get the answer?

For the moment, I have written a script based on ensembldb that implements the procedure for computing the canonical transcript (as described on the Ensembl website). It works, but it is not as efficient as simply joining two tables.

Thanks in advance

annotation ensembldb • 2.6k views
Johannes Rainer ★ 2.0k
@johannes-rainer-6987
Last seen 7 weeks ago
Italy

At present I am not including this information in the ensembldb databases, so there is no way to get the canonical transcript for a gene using ensembldb. Sorry for that.

Has this changed?

Yes, indeed, it has. Depending on the Ensembl release from which an EnsDb was built, information on canonical transcripts will be present.

From Ensembl 105 onward, it is also possible to filter the EnsDb to only canonical transcripts:

library(AnnotationHub)
ah <- AnnotationHub()
## Get EnsDb for homo sapiens, Ensembl release 105
edb <- ah[["AH98047"]]

## Get all transcripts
length(transcripts(edb))
[1] 268255

## Get only canonical transcripts
length(transcripts(edb, filter = ~ tx_is_canonical == TRUE))
[1] 69329

For many releases an additional column (field) "canonical_transcript" is available for gene annotations:

> genes(edb)
GRanges object with 69329 ranges and 9 metadata columns:
                  seqnames            ranges strand |         gene_id
                     <Rle>         <IRanges>  <Rle> |     <character>
  ENSG00000223972        1       11869-14409      + | ENSG00000223972
  ENSG00000227232        1       14404-29570      - | ENSG00000227232
              ...      ...               ...    ... .             ...
  ENSG00000231514        Y 26626520-26627159      - | ENSG00000231514
  ENSG00000235857        Y 56855244-56855488      + | ENSG00000235857
                    gene_name           gene_biotype seq_coord_system
                  <character>            <character>      <character>
  ENSG00000223972     DDX11L1 transcribed_unproces..       chromosome
  ENSG00000227232      WASH7P unprocessed_pseudogene       chromosome
              ...         ...                    ...              ...
  ENSG00000231514      CCNQP2   processed_pseudogene       chromosome
  ENSG00000235857     CTBP2P1   processed_pseudogene       chromosome
                             description   gene_id_version canonical_transcript
                             <character>       <character>          <character>
  ENSG00000223972 DEAD/H-box helicase .. ENSG00000223972.5      ENST00000450305
  ENSG00000227232 WASP family homolog .. ENSG00000227232.5      ENST00000488147
              ...                    ...               ...                  ...
  ENSG00000231514 CCNQ pseudogene 2 [S.. ENSG00000231514.1      ENST00000435741
  ENSG00000235857 CTBP2 pseudogene 1 [.. ENSG00000235857.1      ENST00000431853
                       symbol                          entrezid
                  <character>                            <list>
  ENSG00000223972     DDX11L1 102725121,100287596,100287102,...
  ENSG00000227232      WASH7P                              <NA>
              ...         ...                               ...
  ENSG00000231514      CCNQP2                              <NA>
  ENSG00000235857     CTBP2P1                              <NA>
  -------
  seqinfo: 456 sequences (1 circular) from GRCh38 genome
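
To get something close to the "join two tables" result asked about in the question, the filter and the extra columns can be combined in one call. This is only a minimal sketch: it assumes the same EnsDb as above (Ensembl release 105, which has the tx_is_canonical field), and the variable name canonical_tx and the choice of columns are just for illustration.

```r
library(AnnotationHub)
library(ensembldb)

## Same EnsDb as above (Homo sapiens, Ensembl release 105); older releases
## may not provide the tx_is_canonical field.
ah <- AnnotationHub()
edb <- ah[["AH98047"]]

## Canonical transcripts only, returned as a plain data.frame together with
## the gene they belong to (columns are pulled from the gene table as needed).
canonical_tx <- transcripts(
    edb,
    columns = c("tx_id", "gene_id", "gene_name", "tx_biotype"),
    filter = ~ tx_is_canonical == TRUE,
    return.type = "data.frame")

head(canonical_tx)
```

The resulting data.frame can then be merged with other per-gene tables on gene_id.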

Cool thanks!
