Question

Regarding transcripts to gene ID conversion in tximport

Nikolay Ivanov • 0

@nikolay-ivanov-23079

USA/New York City/Weill Cornell Medicine

I am performing transcript quantification with Salmon, with subsequent differential expression analysis with DESeq2.

Following the Salmon documentation (https://salmon.readthedocs.io/en/latest/salmon.html), I used a pre-built Salmon transcriptome index downloaded from refgenie (hg38/salmon_sa_index) at http://refgenomes.databio.org/ (see screenshot).

Now, my question is as follows: when I import transcript-level estimates with tximport, should I use the TxDb.Hsapiens.UCSC.hg38.knownGene package or the EnsDb.Hsapiens.v86 package to make the tx2gene argument?

The refgenie description of the hg38 genome reads "The GCA_000001405.15 GRCh38_no_alt_analysis_set from NCBI" (see screenshot), so I assume the transcriptome I used was based on the UCSC annotation and that I should use TxDb.Hsapiens.UCSC.hg38.knownGene. Is that correct?

[Screenshot: refgenie listing for hg38]

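library(EnsDb.Hsapiens.v86)
edb = EnsDb.Hsapiens.v86
# Pull transcript-to-gene pairs from the Ensembl annotation
tx = as.data.frame(transcripts(edb, columns = c("tx_name", "gene_id", "gene_name"), return.type = "DataFrame"))
# Select columns by name so the order is explicit: transcript ID first, then gene ID
tx2gene = tx[, c("tx_name", "gene_id")]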

#OR#

library(TxDb.Hsapiens.UCSC.hg38.knownGene)
txdb = TxDb.Hsapiens.UCSC.hg38.knownGene
# Map every transcript name to its gene ID
k = keys(txdb, keytype = "TXNAME")
tx2gene = select(txdb, keys = k, columns = "GENEID", keytype = "TXNAME")

# library(tximport)
# txi = tximport(files, type = "salmon", tx2gene = tx2gene, ignoreTxVersion = TRUE)

Thank you!
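
One way to sanity-check a candidate tx2gene table before passing it to tximport is to measure how many of the transcript IDs in quant.sf it actually covers. The following is a minimal base-R sketch on made-up IDs. `strip_version` and `match_fraction` are hypothetical helpers written for this illustration (they are not part of tximport), and the version stripping only approximates the idea behind ignoreTxVersion.

```r
# Toy stand-ins with made-up IDs. In a real run, the transcript IDs would come
# from the "Name" column of a Salmon quant.sf file, e.g.
#   quant_names <- read.delim(files[1])$Name
quant_names <- c("ENST00000000001.1", "ENST00000000002.3", "ENST00000000003.2")

tx2gene_toy <- data.frame(
  TXNAME = c("ENST00000000001", "ENST00000000002", "ENST00000000009"),
  GENEID = c("ENSG00000000001", "ENSG00000000001", "ENSG00000000002"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop a trailing ".<digits>" version suffix.
strip_version <- function(ids) sub("\\.[0-9]+$", "", ids)

# Hypothetical helper: fraction of quantified transcripts present in the
# first column of a tx2gene table, optionally ignoring version suffixes.
match_fraction <- function(ids, tx2gene, ignore_version = FALSE) {
  tx <- tx2gene[[1]]
  if (ignore_version) {
    ids <- strip_version(ids)
    tx <- strip_version(tx)
  }
  mean(ids %in% tx)
}

exact <- match_fraction(quant_names, tx2gene_toy)
stripped <- match_fraction(quant_names, tx2gene_toy, ignore_version = TRUE)
```

A fraction near zero for one candidate table and near one for the other would suggest which annotation the index was built from. A fraction that rises only after stripping versions points to a version-suffix mismatch rather than a wrong annotation source.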

salmon tximport DESeq2 • 2.1k views

score 1 · Answer 1 · 2021-02-24

Michael Love 41k

@mikelove

United States

"When I import transcript-level estimates with tximport, should I use the TxDb.Hsapiens.UCSC.hg38.knownGene package or the EnsDb.Hsapiens.v86 package to make the tx2gene argument?"

This is the purpose of the tximeta package: to help resolve this for standard reference transcriptomes for human and mouse.

Can you try:

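library(tximeta)
# coldata needs a column `files` (paths to quant.sf) and a column `names` (sample names)
coldata <- data.frame(files, names)
se <- tximeta(coldata)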

Then you can call summarizeToGene on the result, and it will build the correct transcript-to-gene table for you.
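
For completeness, the two steps above might be put together roughly as follows. This is a sketch under assumptions: the sample names and the `quants/<sample>/quant.sf` layout are hypothetical, and the tximeta calls are guarded so they run only when the package is installed and the files exist. Automatic matching also assumes the Salmon index is one that tximeta recognizes and that the annotation can be downloaded.

```r
# Hypothetical sample names and Salmon output paths; adjust to your layout.
samples <- c("sample1", "sample2")
files <- file.path("quants", samples, "quant.sf")

# tximeta expects a data frame with columns named `files` and `names`.
coldata <- data.frame(files = files, names = samples, stringsAsFactors = FALSE)

# Run the import only when tximeta is installed and the quant files exist.
if (requireNamespace("tximeta", quietly = TRUE) &&
    all(file.exists(coldata$files))) {
  # Transcript-level SummarizedExperiment with the matched annotation attached
  se <- tximeta::tximeta(coldata)
  # Gene-level summary built from the matched transcript-to-gene mapping
  gse <- tximeta::summarizeToGene(se)
}
```

The gene-level object can then be handed to DESeq2, for example with DESeqDataSet(gse, design = ~ condition), assuming coldata also carries a `condition` column (hypothetical here).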

Fantastic, thank you, it's very convenient! Per tximeta output, the matching transcriptome was Ensembl - Homo sapiens - release 97.

Reply by Nikolay Ivanov, 3.2 years ago