Question: How to create tx2gene.gencode.v27.csv
mmokrejs wrote, 5 months ago:

Hi,

   I am unable to follow the tutorial at https://github.com/mikelove/tximport/blob/master/vignettes/tximport.Rmd

I am getting:

> library(readr)
tx2gene <- read_csv(file.path(dir, "tx2gene.gencode.v27.csv"))
Error: '/home/mmokrejs/R/x86_64-pc-linux-gnu-library/3.4/tximportData/extdata/tx2gene.gencode.v27.csv' does not exist.

 

Obviously I do not have the file, but where does the tutorial show how to create it?

 

My aim is to create a similar conversion file, named tx2gene.gencode.v28.csv, for ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_28/gencode.v28.transcripts.fa.gz and then continue the analysis of the quant.sf files from Salmon.

I inferred from "Constructing tx2gene for Salmon txImport Quantification using Gencode Mouse Transcript Annotation" that I had better use the Gencode-based transcriptome rather than the redundant one from ENSEMBL. Indeed, I remember seeing warnings about duplicate entries during the `salmon index` step.

 

Thank you for improving the tutorial in the vignette. Ideally also these two, which I read but failed to understand or follow:

https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html
http://master.bioconductor.org/packages/release/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html
ADD COMMENTlink modified 5 months ago by Michael Love20k • written 5 months ago by mmokrejs0
Michael Love (United States) wrote, 5 months ago:

I think you are reading the current tximportData vignette from the website, but you have an older version of the package on your computer.

If you are using an older version of R and Bioconductor, you should use this to make sure you follow the correct vignette paired with your package versions:

vignette("tximportData")
ADD COMMENTlink written 5 months ago by Michael Love20k

Hi Michael,

Thank you for your kind answer, and I am sorry for the delay in my response; I thought I would receive an email if someone replied.

Indeed, I have tximport-1.2.0 installed on the server:

$ R
R version 3.5.1 (2018-07-02) -- "Feather Spray"
Copyright (C) 2018 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> source("https://bioconductor.org/biocLite.R")
Bioconductor version 3.4 (BiocInstaller 1.24.0), ?biocLite for help
> biocLite("tximport")
BioC_mirror: https://bioconductor.org
Using Bioconductor 3.4 (BiocInstaller 1.24.0), R 3.5.1 (2018-07-02).
Installing package(s) 'tximport'
trying URL 'https://bioconductor.org/packages/3.4/bioc/src/contrib/tximport_1.2.0.tar.gz'
Content type 'application/x-gzip' length 22342 bytes (21 KB)
==================================================
downloaded 21 KB
* installing *source* package 'tximport' ...
** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
* DONE (tximport)
The downloaded source packages are in '/tmp/Rtmp7iXnYY/downloaded_packages'
Updating HTML index of packages in '.Library'
Making 'packages.html' ... done
>

Neither 'update.packages()' nor 'old.packages()' does anything. I dropped /usr/lib64/R/ and will reinstall from scratch. Nevertheless, you did not answer how I get the 'tx2gene.gencode.v28.csv' file.

The current vignette uses a confusing trick, 'names(files) <- paste0("sample", 1:6)'. Why doesn't it show how to use the sample names from the samples.txt file? What is the point here? With more comments around it I would understand what this line does, but as it stands one is left scratching one's head over what this example usage of tximport really shows.

ADD REPLYlink modified 3 months ago • written 3 months ago by mmokrejs0

To create a tx2gene table you need a TxDb object; you then run the keys() and select() commands using the code in the tximport vignette. You can make the TxDb from the Gencode GTF file using makeTxDbFromGFF(), a function in the GenomicFeatures package.

You can name ‘files’ however you like using names(files) <- ...

You can put any character vector in place of ... in the above line of code.
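As an illustration of the naming step, here is a minimal sketch that takes the sample names from a sample table instead of paste0("sample", 1:6). The table contents, the run identifiers, and the "salmon" directory layout are assumptions made up for this example; in practice the table would come from something like read.table("samples.txt", header = TRUE).

```r
# Hypothetical sample table (stand-in for the contents of samples.txt).
samples <- data.frame(
  run = c("runA", "runB", "runC"),
  condition = c("control", "control", "treated"),
  stringsAsFactors = FALSE
)

# Assumed layout: one Salmon output folder per run under "salmon/".
quant_dir <- "salmon"
files <- file.path(quant_dir, samples$run, "quant.sf")

# Name each path after its sample; any character vector of the
# same length as 'files' can be used here.
names(files) <- samples$run
print(files)
```

The named vector can then be passed to tximport(), which is expected to carry these names over to the columns of its output matrices.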

 

ADD REPLYlink modified 3 months ago • written 3 months ago by Michael Love20k

So here is what a friend helped me to stitch together:

 

wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_28/gencode.v28.annotation.gtf.gz
wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_28/gencode.v28.annotation.gff3.gz
wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_28/gencode.v28.transcripts.fa.gz

> library(tximportData)
> library(GenomicFeatures)
> txdb <- makeTxDbFromGFF(file="gencode.v28.annotation.gff3.gz")
> txdb
> saveDb(x=txdb, file = "gencode.v28.annotation.TxDb")

> k <- keys(txdb, keytype = "TXNAME")
> tx2gene <- select(txdb, k, "GENEID", "TXNAME")
> head(k)
> head(tx2gene)
> dim(tx2gene)
> length(k)
> write.table(tx2gene, "tx2gene.gencode.v28.csv", sep = "\t", row.names = FALSE)

Somehow, the import below does not work properly: the " characters end up escaped with a \ sign.

> library(readr)
> tx2gene <- read_csv(file.path("tx2gene.gencode.v28.csv"))
> head(tx2gene)
> dim(tx2gene)

Therefore, I cannot use tx2gene.gencode.v28.csv, but I can always create tx2gene from scratch. ;)
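A likely explanation, offered as an assumption rather than something confirmed in this thread: the table was written with sep = "\t" (tab-separated) but read back with read_csv(), which expects commas, so each line is parsed as a single column containing the quoted fields. The sketch below shows a round trip with matching separators using base R only. The two-row table and the temporary file name are stand-ins invented for the example.

```r
# Small stand-in for the tx2gene table returned by select(); example IDs only.
tx2gene <- data.frame(
  TXNAME = c("ENST00000456328.2", "ENST00000450305.2"),
  GENEID = c("ENSG00000223972.5", "ENSG00000223972.5"),
  stringsAsFactors = FALSE
)

# Hypothetical output path in a temporary directory.
out <- file.path(tempdir(), "tx2gene.example.csv")

# write.csv() uses a comma separator, which is what a CSV reader expects.
write.csv(tx2gene, out, row.names = FALSE)

# Read the file back with the matching reader and check the round trip.
tx2gene_back <- read.csv(out, stringsAsFactors = FALSE)
stopifnot(isTRUE(all.equal(tx2gene, tx2gene_back)))
print(tx2gene_back)
```

Alternatively, a file that was already written with tabs should be readable with a tab-aware reader such as readr::read_tsv() or read.delim().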

ADD REPLYlink modified 19 days ago • written 19 days ago by mmokrejs0

If you happen to be using Salmon, we have a new package that will do all of this automatically for you! Gencode 28 will be loaded automatically. Would you like to try it out?

BiocManager::install("tximeta")
library(tximeta)
coldata <- data.frame(files, names, condition)
se <- tximeta(coldata, type="salmon")
ADD REPLYlink written 19 days ago by Michael Love20k

Hi @MichaelLove, would it also work with kallisto, or only with Salmon and Sailfish?

Thank you for your vignettes, they are really useful for newcomers to the topic! :D

ADD REPLYlink written 12 days ago by MasMarius0

For kallisto you should build a tx2gene table, but this should be fairly easy if you know the reference you used as the index. Follow the instructions in the tximport vignette: download the v28 GTF file from Gencode, then bring this into R and use select() to create a data.frame.

ADD REPLYlink written 11 days ago by Michael Love20k

My bad, I forgot to mention my main interest: is this possible in other organisms? My focus at the moment is zebrafish. Thank you for your fast response. Best!

 

ADD REPLYlink modified 11 days ago • written 11 days ago by MasMarius0