Question: ensembldb, Salmon, DESeq2, tximport: right tool?
aaron.dickey wrote (16 months ago):

I have 20 paired-end libraries pseudo-mapped to the rat transcriptome with Salmon, and I wish to create a tx2gene data.frame via the ensembldb package for downstream analysis in DESeq2, as recommended here:

Note: if you are using an Ensembl transcriptome, the easiest way to create the tx2gene data.frame is to use the ensembldb packages. The annotation packages can be found by version number, and use the pattern EnsDb.Hsapiens.vXX. The transcripts function can be used with return.type="DataFrame", in order to obtain something like the df object constructed in the code chunk above. See the ensembldb package vignette for more details. 

However, while biocLite("EnsDb.Hsapiens.v75") works fine, biocLite("EnsDb.Rnorvegicus.v89") returns: Warning message: package 'EnsDb.Rnorvegicus.v89' is not available (for R version 3.4.0)

Is this a case of trying to use the wrong tool, i.e. do these recommendations apply to human data but not to other species, or is it some other issue? Would BioMart help?

Thanks! Aaron 

Johannes Rainer wrote (16 months ago):

Hi Aaron,

The warning message just means that there is no EnsDb.Rnorvegicus.v89 package: the only rat EnsDb packages available are EnsDb.Rnorvegicus.v75 and EnsDb.Rnorvegicus.v79, which you can install with the biocLite function.

If you need more recent gene models, you can get the rat EnsDbs for Ensembl versions 87 and 88 from AnnotationHub (I am currently building the ones for Ensembl 89, but that takes some time):

library(AnnotationHub)
library(ensembldb)
ah <- AnnotationHub()

## To get the EnsDb for Rnorvegicus, Ensembl version 88:
edb <- query(ah, "EnsDb.Rnorvegicus.v88")[[1]]

## You can then use this edb for your queries, e.g. to get all transcripts:
transcripts(edb)
GRanges object with 41078 ranges and 6 metadata columns:
                     seqnames             ranges strand |              tx_id
                        <Rle>          <IRanges>  <Rle> |        <character>
  ENSRNOT00000044187        1   [396700, 409676]      + | ENSRNOT00000044187
  ENSRNOT00000072186        1   [396700, 409676]      + | ENSRNOT00000072186
  ENSRNOT00000093216        1   [396840, 409750]      + | ENSRNOT00000093216
                 ...      ...                ...    ... .                ...
  ENSRNOT00000085333        Y [2653008, 2654859]      + | ENSRNOT00000085333
  ENSRNOT00000092839        Y [3181118, 3181328]      + | ENSRNOT00000092839
  ENSRNOT00000086356        Y [3253610, 3254888]      + | ENSRNOT00000086356
                               tx_biotype tx_cds_seq_start tx_cds_seq_end
                              <character>        <integer>      <integer>
  ENSRNOT00000044187 processed_transcript             <NA>           <NA>
  ENSRNOT00000072186 processed_transcript             <NA>           <NA>
  ENSRNOT00000093216 processed_transcript             <NA>           <NA>
                 ...                  ...              ...            ...
  ENSRNOT00000085333              lincRNA             <NA>           <NA>
  ENSRNOT00000092839 processed_pseudogene             <NA>           <NA>
  ENSRNOT00000086356              lincRNA             <NA>           <NA>
                                gene_id            tx_name
                            <character>        <character>
  ENSRNOT00000044187 ENSRNOG00000046319 ENSRNOT00000044187
  ENSRNOT00000072186 ENSRNOG00000046319 ENSRNOT00000072186
  ENSRNOT00000093216 ENSRNOG00000046319 ENSRNOT00000093216
                 ...                ...                ...
  ENSRNOT00000085333 ENSRNOG00000052946 ENSRNOT00000085333
  ENSRNOT00000092839 ENSRNOG00000062169 ENSRNOT00000092839
  ENSRNOT00000086356 ENSRNOG00000058415 ENSRNOT00000086356
  seqinfo: 162 sequences from Rnor_6.0 genome
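For tximport, only two columns of this table matter: the transcript IDs (first column of tx2gene) and the gene IDs (second column). Below is a minimal base-R sketch of that reduction. It assumes a small hand-made data.frame, with IDs copied from the output above, as a stand-in for the real result of transcripts(edb, return.type = "DataFrame"). With a real EnsDb you would apply the same column selection to as.data.frame() of that result.

```r
# Stand-in for transcripts(edb, return.type = "DataFrame") (assumption: a real
# session would use as.data.frame() on that result). IDs copied from the output above.
tx <- data.frame(
  tx_id      = c("ENSRNOT00000044187", "ENSRNOT00000072186", "ENSRNOT00000093216"),
  tx_biotype = rep("processed_transcript", 3),
  gene_id    = rep("ENSRNOG00000046319", 3),
  stringsAsFactors = FALSE
)

# tximport looks up transcript IDs in the FIRST column of tx2gene,
# so keep tx_id first and gene_id second.
tx2gene <- tx[, c("tx_id", "gene_id")]
print(tx2gene)
```

Column order matters more than column names here. The error quoted later in this thread ("None of the transcripts ... are present in the first column of tx2gene") refers to the first column specifically.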



Hi, I have a question. The transcript names in kallisto's output carry a version suffix, e.g. ENSMUST00000178537.1, but the tx_name values returned by ensembldb have no .1. When I run tximport, I get this error:
Error in summarizeToGene(txi.kallisto, tx2gene) : 
  None of the transcripts in the quantification files are present
  in the first column of tx2gene. Check to see that you are using
  the same annotation for both.


Do you know how to solve this problem? Thanks a lot for your time.

Written 4 weeks ago by renhua.song1989.

You have to remove the transcript version numbers (i.e. the .1) from the transcript IDs. Just make sure that the Ensembl version of the EnsDb you are using matches the one that was used for kallisto.

A quick way to remove them is, e.g., top_table$tx_id <- sub("\\.[0-9]*$", "", top_table$tx_id)
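Here is a small base-R sketch of why the match fails and how the sub() call fixes it. The unversioned IDs are taken from the rat output earlier in the thread. The versioned quantification IDs and their suffixes (.1, .2) are hypothetical, chosen for illustration only.

```r
# Hypothetical IDs as a quantifier might report them (with version suffix).
quant_ids <- c("ENSRNOT00000044187.1", "ENSRNOT00000072186.2", "ENSRNOT00000093216.1")

# Unversioned IDs, as in the tx_id / tx_name columns of an EnsDb.
tx2gene <- data.frame(
  tx_id   = c("ENSRNOT00000044187", "ENSRNOT00000072186", "ENSRNOT00000093216"),
  gene_id = rep("ENSRNOG00000046319", 3),
  stringsAsFactors = FALSE
)

# Drop a trailing ".<digits>" version suffix.
stripped <- sub("\\.[0-9]*$", "", quant_ids)

before <- sum(quant_ids %in% tx2gene$tx_id)  # 0: nothing matches, hence the tximport error
after  <- sum(stripped %in% tx2gene$tx_id)   # 3: every transcript now maps to a gene
print(c(before = before, after = after))
```

The regex is anchored with $, so only a suffix at the very end of the ID is removed. IDs without a version are returned unchanged.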

Written 4 weeks ago by Johannes Rainer.

FYI: Johannes' solution is applied automatically by the tximport function when you set the argument ignoreTxVersion = TRUE (the default is FALSE).

Written 4 weeks ago by Guido Hooiveld.

Check the help page (?tximport): you can ignore version numbers with ignoreTxVersion = TRUE.

Written 4 weeks ago by Michael Love.
aaron.dickey wrote (16 months ago):

Thanks, Johannes,

That makes sense. Since the pseudo-mapping was done against the v89 transcriptome, it seems the same gene model should be used. Whether an earlier gene model would be appropriate is, I guess, more of a theoretical than a practical question. Thanks again!


I am currently generating the EnsDbs for Ensembl 89. It might take a few days, but then they should be available in the AnnotationHub of Bioconductor devel.

Written 16 months ago by Johannes Rainer.

I have just started a zebrafish project. Will you be creating EnsDbs for Ensembl v89 of Danio rerio as well? If not, may I ask what the best way to create one is? Thanks

Written 16 months ago by Nicholas Owen.

I create EnsDbs for all species in Ensembl, and this includes Danio rerio. Once I'm done, these EnsDbs will show up in the AnnotationHub of the Bioconductor devel version.

Note that the Danio rerio EnsDbs for Ensembl 87 and 88 are already in AnnotationHub.

Written 16 months ago by Johannes Rainer.