How to create tx2gene.gencode.v27.csv
mmokrejs • 0
@mmokrejs-16289

Hi,

I am unable to follow the tutorial at https://github.com/mikelove/tximport/blob/master/vignettes/tximport.Rmd

I am getting:

> library(readr)

Error: '/home/mmokrejs/R/x86_64-pc-linux-gnu-library/3.4/tximportData/extdata/tx2gene.gencode.v27.csv' does not exist.

Obviously I do not have the file but where does the tutorial show how to create it?

My aim is to create a similar conversion file, named tx2gene.gencode.v28.csv, for ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_28/gencode.v28.transcripts.fa.gz and then continue the analysis of the quant.sf files from Salmon.

I inferred from "Constructing tx2gene for Salmon txImport Quantification using Gencode Mouse Transcript Annotation" that I am better off using the Gencode-based transcriptome than the redundant one from ENSEMBL. Indeed, I remember seeing warnings about duplicate entries during the salmon index step.

Thank you in advance for improving the tutorial in the vignette. Ideally, please also improve these two, which I read but could not understand or follow:

https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html
http://master.bioconductor.org/packages/release/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html
@mikelove

I think you are reading the current tximportData vignette from the website, but you have an older version of the package on your computer.

If you are using an older version of R and Bioconductor, you should use this to make sure you follow the correct vignette paired with your package versions:

vignette("tximportData")

Hi Michael,

Thank you for your kind answer, and sorry for the delay in my response; I thought I would receive an email when someone replied.

Indeed, I have tximport 1.2.0 installed on the server:

$ R
R version 3.5.1 (2018-07-02) -- "Feather Spray"
Copyright (C) 2018 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> source("https://bioconductor.org/biocLite.R")
Bioconductor version 3.4 (BiocInstaller 1.24.0), ?biocLite for help
> biocLite("tximport")
BioC_mirror: https://bioconductor.org
Using Bioconductor 3.4 (BiocInstaller 1.24.0), R 3.5.1 (2018-07-02).
Installing package(s) 'tximport'
trying URL 'https://bioconductor.org/packages/3.4/bioc/src/contrib/tximport_1.2.0.tar.gz'
Content type 'application/x-gzip' length 22342 bytes (21 KB)
==================================================
downloaded 21 KB
* installing *source* package 'tximport' ...
** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
* DONE (tximport)
Updating HTML index of packages in '.Library'
Making 'packages.html' ... done
>

Neither 'update.packages()' nor 'old.packages()' does anything. I removed /usr/lib64/R/ and will reinstall from scratch. Nevertheless, you did not answer my question: how do I get the 'tx2gene.gencode.v28.csv' file?

The current vignette uses a confusing trick, 'names(files) <- paste0("sample", 1:6)'. Why doesn't it show how to use the sample names from the samples.txt file? What is the point here? With more comments around it I would understand what this line does, but as it stands one is left scratching one's head over what this example usage of tximport really shows.


To create a tx2gene table you need a TxDb object; you then run the keys() and select() commands on it, using the code in the tximport vignette. You can make the TxDb from the Gencode GTF file using makeTxDbFromGFF(), which is a function in the GenomicFeatures package.

You can name ‘files’ however you like using names(files) <- ...

You can put any character vector in place of ... in the above line of code.
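
As a minimal sketch of that idea: the sample names can be taken straight from a sample table instead of being generated with paste0(). The `samples` data frame, its `run` column, and the `salmon_output` directory below are made-up stand-ins for whatever your own samples.txt and quantification folder contain, not names from the vignette.

```r
# Hypothetical sample table; in practice it might be loaded with
# read.table("samples.txt", header = TRUE)
samples <- data.frame(
  run       = c("runA", "runB", "runC"),
  condition = c("control", "control", "treated"),
  stringsAsFactors = FALSE
)

# Hypothetical directory holding one Salmon output folder per run
dir <- "salmon_output"

# One quant.sf path per sample
files <- file.path(dir, samples$run, "quant.sf")

# Any character vector of the same length can serve as names;
# here the run identifiers from the table are reused
names(files) <- samples$run
files
```

The names assigned here are what tximport would later use as column names of the imported matrices, which is why a meaningful label per sample is convenient.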


So here is what a friend helped me to stitch together:

wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_28/gencode.v28.annotation.gtf.gz
wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_28/gencode.v28.annotation.gff3.gz
wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_28/gencode.v28.transcripts.fa.gz

> library(tximportData)
> library(GenomicFeatures)
> txdb <- makeTxDbFromGFF(file="gencode.v28.annotation.gff3.gz")
> txdb
> saveDb(x=txdb, file = "gencode.v28.annotation.TxDb")

> k <- keys(txdb, keytype = "TXNAME")
> tx2gene <- select(txdb, k, "GENEID", "TXNAME")
> dim(tx2gene)
> length(k)
> write.table(tx2gene, "tx2gene.gencode.v28.csv", sep = "\t", row.names = FALSE)


Somehow, the code below does not work properly, because the double quotes (") are escaped with a backslash (\) during import.

> library(readr)
> dim(tx2gene)


Therefore, I cannot use the tx2gene.gencode.v28.csv but I can always create tx2gene from scratch. ;)
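
One possible explanation, offered as an assumption rather than a diagnosis: the table above was written with sep = "\t", so the file is tab-separated despite its .csv name, and a reader expecting commas would treat each line as a single quoted field. The sketch below writes and re-reads a genuinely comma-separated file with base R. The two-row table and the temporary file name are illustrative only.

```r
# Illustrative two-row tx2gene table (not the full Gencode v28 mapping)
tx2gene <- data.frame(
  TXNAME = c("ENST00000456328.2", "ENST00000450305.2"),
  GENEID = c("ENSG00000223972.5", "ENSG00000223972.5"),
  stringsAsFactors = FALSE
)

# Write a real comma-separated file (temporary path, for illustration)
csv_file <- file.path(tempdir(), "tx2gene.example.csv")
write.csv(tx2gene, csv_file, row.names = FALSE)

# Read it back with a comma-aware reader and check the shape
tx2gene_back <- read.csv(csv_file, stringsAsFactors = FALSE)
dim(tx2gene_back)
```

Keeping the writer and the reader in agreement on the separator is the point here; writing with tabs and reading with a tab-aware reader would work equally well.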


If you happen to be using Salmon, we have a new package that will do all of this automatically for you! Gencode 28 will be loaded automatically. Would you like to try it out?

BiocManager::install("tximeta")
library(tximeta)
coldata <- data.frame(files, names, condition)
se <- tximeta(coldata, type="salmon")

Hi @MichaelLove, would it also work with kallisto, or only with Salmon and Sailfish?

Thank you for your vignettes, they are really useful for newcomers to the topic! :D


For kallisto you should build a tx2gene table, but this should be fairly easy if you know the reference you used as the index. Follow the instructions in the tximport vignette: download the v28 GTF file from Gencode, then bring this into R and use select() to create a data.frame.
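
For readers who only want to see what the resulting data.frame looks like, here is a rough base-R sketch that pulls transcript_id and gene_id out of the GTF attribute column directly. This is a swapped-in shortcut, not the TxDb and select() route described above, and the three in-memory GTF lines and the `get_attr` helper are hypothetical illustrations rather than part of any package.

```r
# Three illustrative GTF records (tab-separated, 9 columns), built in memory
gtf_lines <- c(
  paste("chr1", "HAVANA", "gene", "11869", "14409", ".", "+", ".",
        'gene_id "ENSG00000223972.5"; gene_name "DDX11L1";', sep = "\t"),
  paste("chr1", "HAVANA", "transcript", "11869", "14409", ".", "+", ".",
        'gene_id "ENSG00000223972.5"; transcript_id "ENST00000456328.2";',
        sep = "\t"),
  paste("chr1", "HAVANA", "transcript", "12010", "13670", ".", "+", ".",
        'gene_id "ENSG00000223972.5"; transcript_id "ENST00000450305.2";',
        sep = "\t")
)

# Parse as a 9-column table; quote = "" keeps the double quotes in column 9
gtf <- read.delim(text = gtf_lines, header = FALSE, comment.char = "#",
                  quote = "", stringsAsFactors = FALSE)

# Keep transcript records only
tx <- gtf[gtf$V3 == "transcript", ]

# Hypothetical helper: extract the quoted value that follows a given key
get_attr <- function(x, key) {
  sub(paste0('.*', key, ' "([^"]+)".*'), "\\1", x)
}

# Transcript ID first, gene ID second, as tximport expects
tx2gene <- data.frame(
  TXNAME = get_attr(tx$V9, "transcript_id"),
  GENEID = get_attr(tx$V9, "gene_id"),
  stringsAsFactors = FALSE
)
tx2gene
```

On a real annotation file the TxDb approach is the more robust choice, since it does not depend on the exact layout of the attribute string.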


My bad, I forgot to mention my main interest: is this possible for other organisms? My focus at the moment is zebrafish. Thank you for your fast response. Best!