Question: Create Ensembl transcript to gene IDs table with R ensembldb package for tximport
user3188830 (United States) wrote, 17 months ago:

Following this tutorial, I generated transcript counts with Salmon, but now I'm stuck at the step of building a data frame that maps transcript IDs to gene IDs (tx2gene) using the ensembldb package (for mouse).

• Question 1:

Using the ensembldb package, how can one list the different database versions that exist for a specific organism (say, mouse)?

• Question 2:

The tutorial mentions that the transcripts function can be used with return.type = "DataFrame". How do I use this function?


Michael Love replied, 17 months ago:

Take a look at the ensembldb vignette. It's very detailed.
Answer: Create Ensembl transcript to gene IDs table with R ensembldb package for tximport
Johannes Rainer (Italy) wrote, 17 months ago:

Re question 1:

To get an overview of all EnsDb databases available for mouse, use AnnotationHub rather than ensembldb:

> library(AnnotationHub)
> ah <- AnnotationHub()
snapshotDate(): 2018-04-30
> query(ah, "EnsDb.Mmusculus")
AnnotationHub with 6 records
# snapshotDate(): 2018-04-30
# $dataprovider: Ensembl
# $species: Mus Musculus
# $rdataclass: EnsDb
# additional mcols(): taxonomyid, genome, description,
#   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#   rdatapath, sourceurl, sourcetype
# retrieve records with, e.g., 'object[["AH53222"]]'

            title                            
  AH53222 | Ensembl 87 EnsDb for Mus Musculus
  AH53726 | Ensembl 88 EnsDb for Mus Musculus
  AH56691 | Ensembl 89 EnsDb for Mus Musculus
  AH57770 | Ensembl 90 EnsDb for Mus Musculus
  AH60788 | Ensembl 91 EnsDb for Mus Musculus
  AH60992 | Ensembl 92 EnsDb for Mus Musculus
>
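If you want the Ensembl release numbers as data rather than reading them off the printout (for example, to pick the newest release automatically), one option is to parse them from the record titles. This is a minimal sketch, assuming the titles keep the "Ensembl <release> EnsDb for <species>" pattern shown above; the helper name ensdb_versions is hypothetical. The IDs and titles are hard-coded from the query() output above so the block runs with base R only; on a live hub you would take them from names(qr) and qr$title.

```r
# Hypothetical helper: turn AnnotationHub record IDs and titles into a
# table of Ensembl release numbers, sorted from oldest to newest.
# Assumes titles look like "Ensembl 92 EnsDb for Mus Musculus";
# titles that do not match get release = NA and sort last.
ensdb_versions <- function(ids, titles) {
  pattern <- "^Ensembl ([0-9]+) EnsDb.*$"
  release <- rep(NA_integer_, length(titles))
  hit <- grepl(pattern, titles)
  release[hit] <- as.integer(sub(pattern, "\\1", titles[hit]))
  out <- data.frame(ah_id = ids, release = release, title = titles,
                    stringsAsFactors = FALSE)
  out <- out[order(out$release), ]
  rownames(out) <- NULL
  out
}

# Values copied from the query() output above. With a live hub:
#   qr <- query(ah, "EnsDb.Mmusculus")
#   versions <- ensdb_versions(names(qr), qr$title)
ids <- c("AH53222", "AH53726", "AH56691", "AH57770", "AH60788", "AH60992")
titles <- paste("Ensembl", 87:92, "EnsDb for Mus Musculus")
versions <- ensdb_versions(ids, titles)
print(versions)

# ID of the newest release (last row with a parsed release number);
# this is what you would pass to ah[[...]]
latest <- versions$ah_id[max(which(!is.na(versions$release)))]
latest  # "AH60992"
```

Parsing the title is a convenience under the stated assumption; if AnnotationHub ever changes its title format, the release column simply comes back as NA instead of failing silently with a wrong number.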


Re question 2:

We first load one of the EnsDb databases, say the one for Ensembl 92:

> edb <- ah[["AH60992"]]
require("ensembldb")
downloading 1 resources
retrieving 1 resource
  |======================================================================| 100%

loading from cache
    '/Users/jo//.AnnotationHub/67738'
> ## Now get the transcripts table as a data frame
> txs <- transcripts(edb, return.type = "DataFrame")
> txs
DataFrame with 137146 rows and 9 columns
                    tx_id     tx_biotype tx_seq_start tx_seq_end
              <character>    <character>    <integer>  <integer>
1      ENSMUST00000082387        Mt_tRNA            1         68
2      ENSMUST00000179436 protein_coding           66       1479
3      ENSMUST00000082388        Mt_rRNA           70       1024
4      ENSMUST00000177695 protein_coding          394       1059
5      ENSMUST00000082389        Mt_tRNA         1025       1093
...                   ...            ...          ...        ...
137142 ENSMUST00000192752            TEC    195220051  195222766
137143 ENSMUST00000178897          miRNA    195228278  195228398
137144 ENSMUST00000184449       misc_RNA    195240910  195241007
137145 ENSMUST00000192733            TEC    195259299  195259848
137146 ENSMUST00000194529            TEC    195321283  195323493
       tx_cds_seq_start tx_cds_seq_end            gene_id tx_support_level
              <integer>      <integer>        <character>        <integer>
1                    NA             NA ENSMUSG00000064336               NA
2                    66           1479 ENSMUSG00000095742                5
3                    NA             NA ENSMUSG00000064337               NA
4                   394           1059 ENSMUSG00000094121                1
5                    NA             NA ENSMUSG00000064338               NA
...                 ...            ...                ...              ...
137142               NA             NA ENSMUSG00000102236               NA
137143               NA             NA ENSMUSG00000093823               NA
137144               NA             NA ENSMUSG00000099208               NA
137145               NA             NA ENSMUSG00000104297               NA
137146               NA             NA ENSMUSG00000102307               NA
                  tx_name
              <character>
1      ENSMUST00000082387
2      ENSMUST00000179436
3      ENSMUST00000082388
4      ENSMUST00000177695
5      ENSMUST00000082389
...                   ...
137142 ENSMUST00000192752
137143 ENSMUST00000178897
137144 ENSMUST00000184449
137145 ENSMUST00000192733
137146 ENSMUST00000194529
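The tx2gene table that tximport expects is simply the tx_id and gene_id columns of txs, with the transcript ID first and the gene ID second (tximport uses column order, not column names). A minimal sketch follows; the helpers make_tx2gene and missing_from_tx2gene are hypothetical. It also assumes Salmon was run against an Ensembl cDNA FASTA whose transcript names carry version suffixes (e.g. ".1"), while the EnsDb tx_id column does not, so the check strips everything from the first "." onward, which is our assumption of how tximport's ignoreTxVersion = TRUE behaves. To keep it runnable with base R only, it uses the first three rows of the txs output above plus hypothetical Salmon names.

```r
# Hypothetical helper: build the two-column tx2gene table for tximport
# (column 1 = transcript ID, column 2 = gene ID) from the result of
# transcripts(edb, return.type = "DataFrame"). Works on a base
# data.frame or an S4Vectors DataFrame, since both support `$`.
make_tx2gene <- function(txs) {
  data.frame(tx_id = as.character(txs$tx_id),
             gene_id = as.character(txs$gene_id),
             stringsAsFactors = FALSE)
}

# Hypothetical check: which Salmon transcript names have no tx2gene entry
# once the version suffix (everything from the first ".") is removed.
missing_from_tx2gene <- function(salmon_ids, tx2gene) {
  unversioned <- sub("\\..*", "", salmon_ids)
  salmon_ids[!unversioned %in% tx2gene$tx_id]
}

# First three rows of the txs output above (subset of its columns)
txs <- data.frame(
  tx_id      = c("ENSMUST00000082387", "ENSMUST00000179436", "ENSMUST00000082388"),
  tx_biotype = c("Mt_tRNA", "protein_coding", "Mt_rRNA"),
  gene_id    = c("ENSMUSG00000064336", "ENSMUSG00000095742", "ENSMUSG00000064337"),
  stringsAsFactors = FALSE)
tx2gene <- make_tx2gene(txs)
print(tx2gene)

# Hypothetical Salmon transcript names: two versioned IDs from the table
# and one made-up ID that is not in this toy tx2gene
salmon_ids <- c("ENSMUST00000082387.1", "ENSMUST00000179436.1",
                "ENSMUST00000000001.1")
missing_from_tx2gene(salmon_ids, tx2gene)

# In a real session (files = hypothetical named vector of quant.sf paths):
#   library(tximport)
#   txi <- tximport(files, type = "salmon", tx2gene = make_tx2gene(txs),
#                   ignoreTxVersion = TRUE)
```

Running the check against the Name column of one quant.sf file before calling tximport is a cheap way to catch a mismatch between the Ensembl release used for the Salmon index and the EnsDb release used for tx2gene.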

Hope this helps.

cheers, jo


user3188830 replied, 16 months ago:

Awesome, thanks Jo!