Question

ensembldb, Salmon, DESeq2, tximport: right tool?

aaron.dickey • 0

I have 20 paired-end libraries pseudomapped to the rat transcriptome with Salmon, and I want to create a tx2gene data.frame via the ensembldb package for downstream analysis in DESeq2, as recommended here:

Note: if you are using an Ensembl transcriptome, the easiest way to create the tx2gene data.frame is to use the ensembldb packages. The annotation packages can be found by version number, and use the pattern EnsDb.Hsapiens.vXX. The transcripts function can be used with return.type="DataFrame", in order to obtain something like the df object constructed in the code chunk above. See the ensembldb package vignette for more details.

However, while biocLite("EnsDb.Hsapiens.v75") works fine, biocLite("EnsDb.Rnorvegicus.v89") returns: Warning message: package 'EnsDb.Rnorvegicus.v89' is not available (for R version 3.4.0)

Is this a case of using the wrong tool, i.e., do these recommendations apply to human data but not to other species, or is it some other issue? Would BioMart help?

Thanks! Aaron

ensembldb deseq2

score 1 · Answer 1 · 2017-06-08

Hi Aaron,

The warning message just means that the EnsDb.Rnorvegicus.v89 package is not available. For the rat genome, the only EnsDb packages you can install with the biocLite function are EnsDb.Rnorvegicus.v75 and EnsDb.Rnorvegicus.v79.

If you need more recent gene models, you can get the Ensembl rat data for Ensembl versions 87 and 88 from AnnotationHub (I am currently building the ones for Ensembl 89, but that takes some time):

library(AnnotationHub)
ah <- AnnotationHub()

## To get the EnsDb for Rnorvegicus, Ensembl version 88:
edb <- query(ah, "EnsDb.Rnorvegicus.v88")[[1]]

## You can then use this edb for your queries
transcripts(edb)
GRanges object with 41078 ranges and 6 metadata columns:
                     seqnames             ranges strand |              tx_id
                        <Rle>          <IRanges>  <Rle> |        <character>
  ENSRNOT00000044187        1   [396700, 409676]      + | ENSRNOT00000044187
  ENSRNOT00000072186        1   [396700, 409676]      + | ENSRNOT00000072186
  ENSRNOT00000093216        1   [396840, 409750]      + | ENSRNOT00000093216
                 ...      ...                ...    ... .                ...
  ENSRNOT00000085333        Y [2653008, 2654859]      + | ENSRNOT00000085333
  ENSRNOT00000092839        Y [3181118, 3181328]      + | ENSRNOT00000092839
  ENSRNOT00000086356        Y [3253610, 3254888]      + | ENSRNOT00000086356
                               tx_biotype tx_cds_seq_start tx_cds_seq_end
                              <character>        <integer>      <integer>
  ENSRNOT00000044187 processed_transcript             <NA>           <NA>
  ENSRNOT00000072186 processed_transcript             <NA>           <NA>
  ENSRNOT00000093216 processed_transcript             <NA>           <NA>
                 ...                  ...              ...            ...
  ENSRNOT00000085333              lincRNA             <NA>           <NA>
  ENSRNOT00000092839 processed_pseudogene             <NA>           <NA>
  ENSRNOT00000086356              lincRNA             <NA>           <NA>
                                gene_id            tx_name
                            <character>        <character>
  ENSRNOT00000044187 ENSRNOG00000046319 ENSRNOT00000044187
  ENSRNOT00000072186 ENSRNOG00000046319 ENSRNOT00000072186
  ENSRNOT00000093216 ENSRNOG00000046319 ENSRNOT00000093216
                 ...                ...                ...
  ENSRNOT00000085333 ENSRNOG00000052946 ENSRNOT00000085333
  ENSRNOT00000092839 ENSRNOG00000062169 ENSRNOT00000092839
  ENSRNOT00000086356 ENSRNOG00000058415 ENSRNOT00000086356
  -------
  seqinfo: 162 sequences from Rnor_6.0 genome
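
For the tx2gene question itself, a minimal sketch of how the edb object could be turned into a two-column table and passed to tximport. The `salmon_out` directory layout is a hypothetical assumption and is not taken from the question. `ignoreTxVersion = TRUE` is included on the assumption that the Salmon index was built from an Ensembl FASTA whose transcript IDs carry a version suffix.

```r
library(AnnotationHub)
library(ensembldb)
library(tximport)

ah <- AnnotationHub()
edb <- query(ah, "EnsDb.Rnorvegicus.v88")[[1]]

# Transcript-to-gene table; tximport expects the transcript ID
# in the first column and the gene ID in the second.
tx <- transcripts(edb, columns = c("tx_id", "gene_id"),
                  return.type = "data.frame")
tx2gene <- tx[, c("tx_id", "gene_id")]

# Hypothetical layout (assumption): one Salmon output directory
# per sample below "salmon_out", each containing a quant.sf file.
samples <- list.files("salmon_out")
files <- file.path("salmon_out", samples, "quant.sf")
names(files) <- samples

# Only import if the quantification files are actually present.
if (length(files) > 0 && all(file.exists(files))) {
  # ignoreTxVersion = TRUE drops a trailing ".<version>" from the
  # Salmon transcript IDs before matching against tx2gene.
  txi <- tximport(files, type = "salmon", tx2gene = tx2gene,
                  ignoreTxVersion = TRUE)
}
```

The resulting txi list is what DESeq2's DESeqDataSetFromTximport function takes as input, together with a sample table and a design formula.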

score 0 · Answer 2 · 2017-06-08

aaron.dickey • 0

Thanks Johannes,

That makes sense. Since the pseudomapping was done against the v89 transcriptome, it seems the same gene models should be used. I guess whether an earlier gene model would be appropriate is more of a theoretical than a practical question. Thanks again!
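
One way to make the version-mismatch concern concrete is to check what fraction of the quantified transcript IDs are found in a tx2gene table built from an older release. The following is only a sketch in base R: `tx_coverage` is a hypothetical helper, and the toy IDs reuse two transcripts from the output above with made-up version suffixes plus one made-up ID that is absent from the table.

```r
# Fraction of quantified transcript IDs that are present in tx2gene,
# after removing a trailing ".<version>" suffix from each ID.
tx_coverage <- function(quant_ids, tx2gene) {
  stripped <- sub("\\.[0-9]+$", "", quant_ids)
  mean(stripped %in% tx2gene$tx_id)
}

# Toy tx2gene table (two transcripts taken from the output above).
tx2gene <- data.frame(
  tx_id = c("ENSRNOT00000044187", "ENSRNOT00000072186"),
  gene_id = c("ENSRNOG00000046319", "ENSRNOG00000046319"),
  stringsAsFactors = FALSE
)

# Toy quantification IDs: version suffixes and the last ID are made up.
quant_ids <- c("ENSRNOT00000044187.1",
               "ENSRNOT00000072186.1",
               "ENSRNOT99999999999.1")

tx_coverage(quant_ids, tx2gene)
```

With real data, quant_ids would be the Name column of a quant.sf file. A coverage well below 1 would suggest that the annotation release does not match the transcriptome used for quantification.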

I am currently generating the EnsDbs for Ensembl 89 - it might take some days but then they should be available in Bioc devel's AnnotationHub.

Johannes Rainer ★ 2.0k

I have just started a zebrafish project. Will you be creating an EnsDb for Ensembl v89 of Danio rerio as well? If not, may I ask what the best way to create one is? Thanks

Nicholas Owen • 0

I create EnsDbs for all species defined in Ensembl, which also includes Danio rerio. Once I'm done, these EnsDbs will show up in the AnnotationHub of the Bioc devel version.

Note that the Danio rerio EnsDbs for Ensembl 87 and 88 are already in AnnotationHub.

Johannes Rainer ★ 2.0k