Hi,
I am unable to follow the tutorial at https://github.com/mikelove/tximport/blob/master/vignettes/tximport.Rmd
I am getting:
> library(readr)
> tx2gene <- read_csv(file.path(dir, "tx2gene.gencode.v27.csv"))
Error: '/home/mmokrejs/R/x86_64-pc-linux-gnu-library/3.4/tximportData/extdata/tx2gene.gencode.v27.csv' does not exist.
Obviously I do not have the file, but where does the tutorial show how to create it?
My aim is to create a similar conversion file, named tx2gene.gencode.v28.csv, for ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_28/gencode.v28.transcripts.fa.gz and then continue the analysis with the quant.sf files from Salmon.
I inferred from the thread "Constructing tx2gene for Salmon txImport Quantification using Gencode Mouse Transcript Annotation" that I had better use the Gencode-based transcriptome rather than the redundant one from ENSEMBL. Indeed, I remember seeing warnings about duplicate entries during the `salmon index` step.
Thank you for improving the tutorial in the vignette. Ideally, please also improve these two, which I read but failed to understand and follow:
https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html
http://master.bioconductor.org/packages/release/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html
Hi Michael,
thank you for your kind answer, and I am sorry for the delay in my response; I thought I would receive an email if someone replied.
Indeed, I have tximport-1.2.0 installed on the server:
Neither 'update.packages()' nor 'old.packages()' does anything. I removed /usr/lib64/R/ and will reinstall from scratch. Nevertheless, you did not answer my question: how do I get the 'tx2gene.gencode.v28.csv' file?
The current vignette uses a confusing trick, 'names(files) <- paste0("sample", 1:6)'. Why doesn't it show how to use the sample names from the samples.txt file? What is the point here? If there were more comments around it, I would understand what this line does, but currently one is left scratching one's head to figure out what this example usage of tximport really shows.
To create a tx2gene table you need a TxDb object, and then you run the keys() and select() commands using the code in the tximport vignette. You can make the TxDb from the Gencode GTF file using makeTxDbFromGFF(), which is a function in the GenomicFeatures package.
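As a rough sketch of the steps just described, the function below wraps makeTxDbFromGFF(), keys() and select() into one call. The function name `make_tx2gene` and the argument `gtf_path` are made up for illustration, and the GTF file name in the usage comment is an assumption about what was downloaded from Gencode; this is not code taken from the thread.

```r
# Sketch: build a transcript-to-gene table from a Gencode GTF file.
# `make_tx2gene` and `gtf_path` are illustrative names, not part of any package.
make_tx2gene <- function(gtf_path) {
  # Build a TxDb object from the GTF annotation (needs the GenomicFeatures package)
  txdb <- GenomicFeatures::makeTxDbFromGFF(gtf_path, format = "gtf")
  # All transcript names stored in the TxDb
  tx_names <- AnnotationDbi::keys(txdb, keytype = "TXNAME")
  # Look up the gene ID for each transcript; returns columns TXNAME and GENEID
  AnnotationDbi::select(txdb, keys = tx_names,
                        columns = "GENEID", keytype = "TXNAME")
}

# Usage (assumed file name; adjust to the file you actually downloaded):
# tx2gene <- make_tx2gene("gencode.v28.annotation.gtf.gz")
# head(tx2gene)
```

The transcript column comes first in the result, which is the order tximport expects for its tx2gene argument.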
You can name ‘files’ however you like using names(files) <- ...
You can put any character vector in place of ... in the above line of code.
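For example, the names can come from a sample table instead of paste0(). The snippet below is a minimal sketch: the `samples` data.frame, its `run` column, the run IDs and the `quant_dir` directory are placeholders standing in for whatever read.table() returns for your own samples.txt and wherever your Salmon output lives.

```r
# Hypothetical sample table; in practice something like
#   samples <- read.table("samples.txt", header = TRUE)
samples <- data.frame(run = c("ERR188297", "ERR188088", "ERR188329"),
                      stringsAsFactors = FALSE)

# Placeholder directory holding one Salmon output folder per sample
quant_dir <- "salmon"

# One quant.sf path per sample
files <- file.path(quant_dir, samples$run, "quant.sf")

# Name each path after its sample; tximport carries these names over
# as the column names of the imported matrices
names(files) <- samples$run

print(files)
```

The only purpose of names(files) is to label the columns of the result, so any unique character vector of the right length works.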
So here is what a friend helped me to stitch together:
Somehow, the code below does not work properly because the " characters are being escaped with a \ sign during import.
Therefore, I cannot use the tx2gene.gencode.v28.csv but I can always create tx2gene from scratch. ;)
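The exact cause of the escaping is not shown here, so the following is only a guess at a workaround: if the table is a plain two-column data.frame of IDs, it can be written without any quoting and read back with base R. The IDs and the temporary file path below are illustrative.

```r
# Illustrative two-column table; real tables come from select() on a TxDb
tx2gene <- data.frame(
  TXNAME = c("ENST00000456328.2", "ENST00000450305.2"),
  GENEID = c("ENSG00000223972.5", "ENSG00000223972.5"),
  stringsAsFactors = FALSE
)

# Temporary file used only for this demonstration
csv_path <- tempfile(fileext = ".csv")

# quote = FALSE writes no quote characters at all, so nothing can be escaped;
# row.names = FALSE avoids an extra unnamed first column
write.csv(tx2gene, csv_path, quote = FALSE, row.names = FALSE)

# Read the table back as character columns
tx2gene_back <- read.csv(csv_path, stringsAsFactors = FALSE)
```

This is safe only as long as the IDs contain no commas or quote characters.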
If you happen to be using Salmon, we have a new package that will do all of this automatically for you! Gencode 28 will be loaded automatically. Would you like to try it out?
Hi @MichaelLove, would it also work with Kallisto or only Salmon and Sailfish?
Thank you for your vignettes, they are really useful for newcomers to the topic! :D
For kallisto you should build a tx2gene table, but this should be fairly easy if you know the reference you used as the index. Follow the instructions in the tximport vignette: download the v28 GTF file from Gencode, then bring this into R and use select() to create a data.frame.
My bad, I forgot to mention my main interest: is this possible for other organisms? My focus at the moment is zebrafish. Thank you for your fast response. Best!