Create Ensembl transcript to gene IDs table with R ensembldb package for tximport
user31888 ▴ 30

Following this tutorial, I generated transcript counts with Salmon, but now I'm stuck at the step of building a data frame that maps transcript IDs to gene IDs (tx2gene) using the ensembldb package (for mouse).

• Question 1:

Using the ensembldb package, how can one list the database versions that exist for a specific organism (say, mouse)?

• Question 2:

The tutorial mentions that the transcripts function can be used with return.type="DataFrame". How is this function used?

ensembldb tximport salmon • 4.3k views

Take a look at the ensembldb vignette. It’s very detailed 

Johannes Rainer ★ 2.0k

Re question 1:

To get an overview of all available EnsDb databases for mouse, use AnnotationHub rather than ensembldb:

> library(AnnotationHub)
> ah <- AnnotationHub()
snapshotDate(): 2018-04-30
> query(ah, "EnsDb.Mmusculus")
AnnotationHub with 6 records
# snapshotDate(): 2018-04-30
# $dataprovider: Ensembl
# $species: Mus Musculus
# $rdataclass: EnsDb
# additional mcols(): taxonomyid, genome, description,
#   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#   rdatapath, sourceurl, sourcetype
# retrieve records with, e.g., 'object[["AH53222"]]'

            title                            
  AH53222 | Ensembl 87 EnsDb for Mus Musculus
  AH53726 | Ensembl 88 EnsDb for Mus Musculus
  AH56691 | Ensembl 89 EnsDb for Mus Musculus
  AH57770 | Ensembl 90 EnsDb for Mus Musculus
  AH60788 | Ensembl 91 EnsDb for Mus Musculus
  AH60992 | Ensembl 92 EnsDb for Mus Musculus
>
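
If the AnnotationHub IDs and Ensembl versions are needed programmatically rather than read off the printed summary, something along these lines may work. This is only a sketch: it reuses the same query as above, and the variable names `mouse_edbs` and `versions` are made up for illustration. The records returned depend on the AnnotationHub snapshot in use.

```r
library(AnnotationHub)

ah <- AnnotationHub()

## Same query as above: all EnsDb resources for mouse
mouse_edbs <- query(ah, "EnsDb.Mmusculus")

## Collect the AnnotationHub IDs and record titles in a plain data.frame
versions <- data.frame(id = names(mouse_edbs),
                       title = mouse_edbs$title,
                       stringsAsFactors = FALSE)
versions
```

Any value in the `id` column can then be passed to `ah[[...]]` to fetch the corresponding database.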


Re question 2:

First we load one of the EnsDb databases, say the one for Ensembl 92:

> edb <- ah[["AH60992"]]
require("ensembldb")
downloading 1 resources
retrieving 1 resource
  |======================================================================| 100%

loading from cache
    '/Users/jo//.AnnotationHub/67738'
> ## Now get the transcripts table as a data frame
> txs <- transcripts(edb, return.type = "DataFrame")
> txs
DataFrame with 137146 rows and 9 columns
                    tx_id     tx_biotype tx_seq_start tx_seq_end
              <character>    <character>    <integer>  <integer>
1      ENSMUST00000082387        Mt_tRNA            1         68
2      ENSMUST00000179436 protein_coding           66       1479
3      ENSMUST00000082388        Mt_rRNA           70       1024
4      ENSMUST00000177695 protein_coding          394       1059
5      ENSMUST00000082389        Mt_tRNA         1025       1093
...                   ...            ...          ...        ...
137142 ENSMUST00000192752            TEC    195220051  195222766
137143 ENSMUST00000178897          miRNA    195228278  195228398
137144 ENSMUST00000184449       misc_RNA    195240910  195241007
137145 ENSMUST00000192733            TEC    195259299  195259848
137146 ENSMUST00000194529            TEC    195321283  195323493
       tx_cds_seq_start tx_cds_seq_end            gene_id tx_support_level
              <integer>      <integer>        <character>        <integer>
1                    NA             NA ENSMUSG00000064336               NA
2                    66           1479 ENSMUSG00000095742                5
3                    NA             NA ENSMUSG00000064337               NA
4                   394           1059 ENSMUSG00000094121                1
5                    NA             NA ENSMUSG00000064338               NA
...                 ...            ...                ...              ...
137142               NA             NA ENSMUSG00000102236               NA
137143               NA             NA ENSMUSG00000093823               NA
137144               NA             NA ENSMUSG00000099208               NA
137145               NA             NA ENSMUSG00000104297               NA
137146               NA             NA ENSMUSG00000102307               NA
                  tx_name
              <character>
1      ENSMUST00000082387
2      ENSMUST00000179436
3      ENSMUST00000082388
4      ENSMUST00000177695
5      ENSMUST00000082389
...                   ...
137142 ENSMUST00000192752
137143 ENSMUST00000178897
137144 ENSMUST00000184449
137145 ENSMUST00000192733
137146 ENSMUST00000194529
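
To go from this table to the tx2gene data frame asked about in the question, one option is to request only the two ID columns and hand the result to tximport. The following is a sketch under some assumptions: the `salmon/sample1` and `salmon/sample2` paths are hypothetical placeholders, and `ignoreTxVersion = TRUE` is only appropriate if the Salmon index was built from transcript IDs that carry a version suffix (e.g. `ENSMUST00000082387.1`), since the `tx_id` values shown above have none.

```r
library(AnnotationHub)
library(ensembldb)
library(tximport)

## Load the Ensembl 92 mouse EnsDb used above
ah <- AnnotationHub()
edb <- ah[["AH60992"]]

## Fetch only transcript and gene IDs as a base data.frame;
## tximport expects the transcript ID in the first column
## and the gene ID in the second.
tx2gene <- transcripts(edb, columns = c("tx_id", "gene_id"),
                       return.type = "data.frame")
tx2gene <- tx2gene[, c("tx_id", "gene_id")]

## Hypothetical Salmon output locations; adjust to your own layout
samples <- c("sample1", "sample2")
files <- file.path("salmon", samples, "quant.sf")
names(files) <- samples

## Only run the import if the quantification files are actually there
if (all(file.exists(files))) {
    txi <- tximport(files, type = "salmon", tx2gene = tx2gene,
                    ignoreTxVersion = TRUE)
}
```

Subsetting `tx2gene` explicitly after the call keeps the column order fixed even if additional columns are returned.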

Hope this helps.

cheers, jo


Awesome, thanks Jo!
