ensembldb, Salmon, DESeq2, tximport: right tool?
@aarondickey-13210

I have 20 paired-end libraries pseudomapped to the rat transcriptome with Salmon, and I want to create a tx2gene data frame via the ensembldb package for downstream analysis in DESeq2, as recommended here:

Note: if you are using an Ensembl transcriptome, the easiest way to create the tx2gene data.frame is to use the ensembldb packages. The annotation packages can be found by version number, and use the pattern EnsDb.Hsapiens.vXX. The transcripts function can be used with return.type="DataFrame", in order to obtain something like the df object constructed in the code chunk above. See the ensembldb package vignette for more details. 

However, while biocLite("EnsDb.Hsapiens.v75") works fine, biocLite("EnsDb.Rnorvegicus.v89") returns: Warning message: package 'EnsDb.Rnorvegicus.v89' is not available (for R version 3.4.0)

Is this a case of trying to use the wrong tool, i.e., do these recommendations apply to human data but not to other species, or is there some other issue? Would BioMart help?

Thanks! Aaron 

Johannes Rainer ★ 2.0k
@johannes-rainer-6987

Hi Aaron,

The warning message just means that the EnsDb.Rnorvegicus.v89 package is not available. The only EnsDb packages for the rat genome that you can install with the biocLite function are EnsDb.Rnorvegicus.v75 and EnsDb.Rnorvegicus.v79.

If you need more recent gene models, you can get the Ensembl rat data for Ensembl versions 87 and 88 from AnnotationHub (I am currently building the ones for Ensembl 89, but that takes some time):

library(AnnotationHub)
ah <- AnnotationHub()

## To get the EnsDb for Rnorvegicus, Ensembl version 88:
edb <- query(ah, "EnsDb.Rnorvegicus.v88")[[1]]

## You can then use this edb for your queries
transcripts(edb)
GRanges object with 41078 ranges and 6 metadata columns:
                     seqnames             ranges strand |              tx_id
                        <Rle>          <IRanges>  <Rle> |        <character>
  ENSRNOT00000044187        1   [396700, 409676]      + | ENSRNOT00000044187
  ENSRNOT00000072186        1   [396700, 409676]      + | ENSRNOT00000072186
  ENSRNOT00000093216        1   [396840, 409750]      + | ENSRNOT00000093216
                 ...      ...                ...    ... .                ...
  ENSRNOT00000085333        Y [2653008, 2654859]      + | ENSRNOT00000085333
  ENSRNOT00000092839        Y [3181118, 3181328]      + | ENSRNOT00000092839
  ENSRNOT00000086356        Y [3253610, 3254888]      + | ENSRNOT00000086356
                               tx_biotype tx_cds_seq_start tx_cds_seq_end
                              <character>        <integer>      <integer>
  ENSRNOT00000044187 processed_transcript             <NA>           <NA>
  ENSRNOT00000072186 processed_transcript             <NA>           <NA>
  ENSRNOT00000093216 processed_transcript             <NA>           <NA>
                 ...                  ...              ...            ...
  ENSRNOT00000085333              lincRNA             <NA>           <NA>
  ENSRNOT00000092839 processed_pseudogene             <NA>           <NA>
  ENSRNOT00000086356              lincRNA             <NA>           <NA>
                                gene_id            tx_name
                            <character>        <character>
  ENSRNOT00000044187 ENSRNOG00000046319 ENSRNOT00000044187
  ENSRNOT00000072186 ENSRNOG00000046319 ENSRNOT00000072186
  ENSRNOT00000093216 ENSRNOG00000046319 ENSRNOT00000093216
                 ...                ...                ...
  ENSRNOT00000085333 ENSRNOG00000052946 ENSRNOT00000085333
  ENSRNOT00000092839 ENSRNOG00000062169 ENSRNOT00000092839
  ENSRNOT00000086356 ENSRNOG00000058415 ENSRNOT00000086356
  -------
  seqinfo: 162 sequences from Rnor_6.0 genome
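
For the tx2gene table that tximport expects (transcript ID in the first column, gene ID in the second), something like the following sketch might work. It assumes the edb object from above. The helper name make_tx2gene and the toy data frame are hypothetical and only there to show the resulting shape.

```r
# Hypothetical helper: reduce a transcript annotation table to the
# two-column layout tximport expects (transcript ID first, gene ID second).
make_tx2gene <- function(tx_df) {
  stopifnot(all(c("tx_id", "gene_id") %in% colnames(tx_df)))
  tx2gene <- unique(as.data.frame(tx_df[, c("tx_id", "gene_id")]))
  rownames(tx2gene) <- NULL
  tx2gene
}

# With the EnsDb from AnnotationHub (only runs if `edb` exists as above).
if (exists("edb") && requireNamespace("ensembldb", quietly = TRUE)) {
  tx_df <- ensembldb::transcripts(edb, columns = c("tx_id", "gene_id"),
                                  return.type = "data.frame")
  tx2gene <- make_tx2gene(tx_df)
}

# Toy stand-in built from IDs shown in the output above.
toy <- data.frame(
  tx_id = c("ENSRNOT00000044187", "ENSRNOT00000072186", "ENSRNOT00000085333"),
  tx_biotype = c("processed_transcript", "processed_transcript", "lincRNA"),
  gene_id = c("ENSRNOG00000046319", "ENSRNOG00000046319", "ENSRNOG00000052946"),
  stringsAsFactors = FALSE
)
toy_tx2gene <- make_tx2gene(toy)
```

The column order matters here because tximport matches the quantified transcript names against the first column of tx2gene.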

 


Hi, I have a question. The transcript names in kallisto's output look like ENSMUST00000178537.1, but there is no .1 in the tx_name column of ensembldb's output. When I run tximport, I get this error:
Error in summarizeToGene(txi.kallisto, tx2gene) : 
  None of the transcripts in the quantification files are present
  in the first column of tx2gene. Check to see that you are using
  the same annotation for both.

 

Do you know how to solve this problem? Thanks a lot for your time.


You have to remove the transcript version number from the transcript IDs (i.e. the .1). Just be sure that the Ensembl version of the EnsDb you are using and the version that was used for kallisto match.

A fast way to remove them is e.g. top_table$tx_id <- sub("\\.[0-9]*$", "", top_table$tx_id)
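
As a small sketch of that approach with made-up input (the quant_ids vector, the tx2gene table and its placeholder gene IDs are illustrative, not taken from a real run), one can strip the version suffix and count how many quantified transcripts match the annotation before and after. This variant uses + instead of * in the pattern so that only a dot followed by at least one digit is removed.

```r
# Illustrative transcript IDs as a quantifier might report them.
quant_ids <- c("ENSMUST00000178537.1", "ENSMUST00000000001.4", "ENSMUST00000000003")

# Illustrative tx2gene table with unversioned IDs (gene IDs are placeholders).
tx2gene <- data.frame(
  tx_id = c("ENSMUST00000178537", "ENSMUST00000000001"),
  gene_id = c("geneA", "geneB"),
  stringsAsFactors = FALSE
)

# Remove a trailing ".<digits>" version suffix; IDs without one are unchanged.
strip_version <- function(ids) sub("\\.[0-9]+$", "", ids)

# Count matches against the first column of tx2gene before and after stripping.
matched_before <- sum(quant_ids %in% tx2gene$tx_id)
matched_after <- sum(strip_version(quant_ids) %in% tx2gene$tx_id)
message(matched_before, " matched before stripping, ", matched_after, " after")
```

A count of zero before stripping corresponds to the "None of the transcripts in the quantification files are present" error above.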


FYI: Johannes' solution is automagically performed within the tximport function when specifying the argument ignoreTxVersion = TRUE (default = FALSE).
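
A hedged sketch of such a call is shown below. The file path and sample name are hypothetical, and tx2gene is assumed to be a two-column data frame (transcript ID, gene ID) built beforehand. The call only runs if tximport is installed and the file actually exists.

```r
# Hypothetical kallisto output file; replace with your own paths.
files <- c(sample1 = "kallisto/sample1/abundance.tsv")

# Only attempt the import when the package, the file and tx2gene are available.
can_run <- requireNamespace("tximport", quietly = TRUE) &&
  all(file.exists(files)) &&
  exists("tx2gene")

if (can_run) {
  # ignoreTxVersion = TRUE drops the ".1"-style suffix before matching
  # the quantified transcript IDs against the first column of tx2gene.
  txi <- tximport::tximport(files, type = "kallisto",
                            tx2gene = tx2gene, ignoreTxVersion = TRUE)
}
```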


Check the help page for ?tximport. You can ignore version numbers with the ignoreTxVersion argument.

@aarondickey-13210

Thanks Johannes,

That makes sense. Since the pseudomapping was done against the v89 transcriptome, it seems the same gene models should be used for the annotation. I guess it is more of a theoretical than a practical question whether an earlier gene model would be appropriate. Thanks again!


I am currently generating the EnsDbs for Ensembl 89 - it might take some days but then they should be available in Bioc devel's AnnotationHub.
 


I have just started a zebrafish project. Will you be creating an EnsDb for Ensembl v89 of Danio rerio as well? If not, may I ask what the best way to build one would be? Thanks


I create EnsDbs for all species defined in Ensembl, this includes also Danio rerio. Once I'm done, these EnsDbs will show up in the AnnotationHub of the Bioc devel version.

Note that the Danio rerio EnsDbs for Ensembl 87 and 88 are already in AnnotationHub.
