This question is in reference to the recently updated tximport vignette: https://bioconductor.org/packages/3.7/bioc/vignettes/tximport/inst/doc/tximport.html
I saw a related question from a few months back but wasn't able to follow the solution, so I am posting again. Thanks in advance.
I ran Salmon on the FASTQ files from a recent mouse RNA-seq experiment, using the Transcript Sequences (CHR) FASTA file from the current Gencode mouse release (https://www.gencodegenes.org/releases/current.html / ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_27/gencode.v27.transcripts.fa.gz). Everything seemed to go smoothly, so I am now making a tx2gene table to summarize the transcript-level estimates to genes with tximport.
I note in your example that you import a constructed tx2gene .csv table from a recent gencode release.
library(readr)
tx2gene <- read_csv(file.path(dir, "tx2gene.gencode.v27.csv"))
head(tx2gene)
I'm trying to get to this point with the gencode.vM16.annotation.gtf file I downloaded from the most recent release, but am not completely following how to go about this. Would you mind spelling this out for me a bit more? This is what I've done and want to make sure I haven't missed anything:
> TxDb <- makeTxDbFromGFF(file = "/Path/to/file/gencode.vM16.annotation.gtf")
> k <- keys(TxDb, keytype = "TXNAME")
> tx2gene <- select(TxDb, k, "GENEID", "TXNAME")
> head(tx2gene)
1 ENSMUST00000193812.1 ENSMUSG00000102693.1
2 ENSMUST00000082908.1 ENSMUSG00000064842.1
3 ENSMUST00000192857.1 ENSMUSG00000102851.1
4 ENSMUST00000161581.1 ENSMUSG00000089699.1
5 ENSMUST00000192183.1 ENSMUSG00000103147.1
6 ENSMUST00000193244.1 ENSMUSG00000102348.
I wasn't sure whether this accomplishes the same thing as converting the GTF to a CSV and then reading that in, because tximport fails with the error shown further down. I set up the file paths like this:
> sample_list <- read.delim("/Path/to/sample/list/20180214_fastq_ID_list copy.txt", sep = "\t", header = FALSE)
> files <- file.path("/Path/to/Quantsoutputfolder/SalmonQuants", sample_list[, "V1"], "quant.sf")
When I then ran tximport, I got this error:
> txi <- tximport(files, type = "salmon", tx2gene = tx2gene)
reading in files with read_tsv
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
Error in summarizeToGene(txi, tx2gene, ignoreTxVersion, countsFromAbundance) :
None of the transcripts in the quantification files are present
in the first column of tx2gene. Check to see that you are using
the same annotation for both.
Since I know I'm using the same annotation, I figure that I've done something wrong somewhere along the way in making the tx2gene table. I'm not sure of the exact steps from the vignette, or whether I should be using the ensembldb package instead. If I were to use ensembldb, would loading the most recent EnsDb.Mmusculus.v79 and building a data frame that includes the gene IDs accomplish the same goal?
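One way to narrow this down is to compare the transcript names in a quant.sf file directly against the first column of tx2gene. The sketch below is only an illustration on toy values: `quant_names` is a hypothetical stand-in for the Name column of one quant.sf, and the pipe-delimited suffix is an assumed example of how names could differ, not something confirmed for these files. The two tx2gene rows are copied from the output above.

```r
# Hypothetical stand-in for the Name column of one quant.sf file.
# The "|"-delimited suffix is an assumption made for illustration.
quant_names <- c(
  "ENSMUST00000193812.1|ENSMUSG00000102693.1|extra|fields|",
  "ENSMUST00000082908.1|ENSMUSG00000064842.1|extra|fields|"
)

# Two rows taken from the tx2gene output shown earlier in the post.
tx2gene_toy <- data.frame(
  TXNAME = c("ENSMUST00000193812.1", "ENSMUST00000082908.1"),
  GENEID = c("ENSMUSG00000102693.1", "ENSMUSG00000064842.1"),
  stringsAsFactors = FALSE
)

# Fraction of quantified names found verbatim in the first tx2gene column.
frac_raw <- mean(quant_names %in% tx2gene_toy$TXNAME)

# Same check after keeping only the text before the first "|".
trimmed_names <- sub("\\|.*$", "", quant_names)
frac_trimmed <- mean(trimmed_names %in% tx2gene_toy$TXNAME)

print(c(raw = frac_raw, trimmed = frac_trimmed))
```

With real data, `quant_names` would instead come from reading one of the entries in `files`, for example with `read.delim(files[1])$Name`. A fraction near zero on the raw names would point to a naming mismatch rather than a genuinely different annotation.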
Thanks a lot, really appreciate the help!
As an aside, when I first tried to run tximport I got the following error. I installed rjson with biocLite("rjson") and then ran the samples again. I only have biological replicates in my data files, so I am a little confused by the expectation that I have inferential replicates.
reading in files with read_tsv
1 Error in readInfRepFish(x, type) :
importing inferential replicates for Salmon or Sailfish requires package `rjson`.
to skip this step, set dropInfReps=TRUE
Also, as another aside: from my reading, Gencode and Ensembl are supposed to be the same annotation, but when I ran Salmon on my data in parallel using the most recent Ensembl mouse transcriptome, I got a much lower number of mappings. Does anyone have an idea why that might be?