I am trying to use tximport on my read counts from salmon to collapse the Ensembl transcript ID counts to gene ID counts. This is what I've tried:
library(GenomicFeatures)  # provides makeTxDbFromGFF()
txdb <- makeTxDbFromGFF("/path/gencode.v24.primary_assembly.annotation.gtf")
k <- keys(txdb, keytype = "GENEID")
df <- select(txdb, keys = k, keytype = "GENEID", columns = "TXNAME")
tx2gene <- df[, 2:1]  # reorder so that the columns are TXNAME, GENEID
head(tx2gene)
             TXNAME             GENEID
1 ENST00000612152.4 ENSG00000000003.14
2 ENST00000373020.8 ENSG00000000003.14
3 ENST00000614008.4 ENSG00000000003.14
4 ENST00000496771.5 ENSG00000000003.14
5 ENST00000494424.1 ENSG00000000003.14
6 ENST00000373031.4  ENSG00000000005.5
The salmon quant.sf files don't have the version suffix (the part after the decimal point) in the Ensembl IDs, so I just cut it off manually:
a <- gsub("\\..*", "", tx2gene[, 1])
b <- gsub("\\..*", "", tx2gene[, 2])
c <- cbind(a, b)
colnames(c) = colnames(tx2gene)
tx2gene <- as.data.frame(c)
head(tx2gene)
           TXNAME          GENEID
1 ENST00000612152 ENSG00000000003
2 ENST00000373020 ENSG00000000003
3 ENST00000614008 ENSG00000000003
4 ENST00000496771 ENSG00000000003
5 ENST00000494424 ENSG00000000003
6 ENST00000373031 ENSG00000000005
library(tximport)
library(readr)
dir <- "/path_to_salmon_directory"
samples <- read.table("/path/file_names.txt", header = FALSE)
files <- file.path(dir, "salmon", samples$V1, "quant.sf")
names(files) <- paste0("sample", 1:9)
txi.salmon <- tximport(files, type = "salmon", tx2gene = tx2gene, reader = read_tsv)
reading in files
1 2 3 4 5 6 7 8 9
Error in summarizeToGene(txi, tx2gene, ignoreTxVersion, countsFromAbundance) :
  None of the transcripts in the quantification files are present in the first column of tx2gene. Check to see that you are using the same annotation for both.
I have used all(file.exists(files)) to make sure the file paths are correct. Also, if I use read.table to import a single quant.sf file, I get:
quantfile=read.table(files[1])
head(quantfile)
               V1     V2              V3       V4       V5
1           dName Length EffectiveLength      TPM NumReads
2 ENST00000373020   2206         2063.46  31.9872  1495.67
3 ENST00000494424    820         677.463        0        0
4 ENST00000496771   1025         882.463 0.973862  19.4741
5 ENST00000612152   3796         3653.46  0.73197  60.5985
6 ENST00000614008    900         757.463  0.11619  1.99431
I can intersect the first column of the quant.sf file with the first column of tx2gene and I get nearly 200k matches, so I don't understand why tximport says there are no matches. I have also tried using the EnsDb.Hsapiens.v79 package to create tx2gene, and I get the same error. Any help appreciated!
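The overlap check described above can be sketched as follows. This is only an illustration under stated assumptions: the temporary file and the toy tx2gene table are stand-ins built from a few rows shown in the post, and the variable names quant, in_map, n_matched and unmatched are hypothetical. The sketch reads the file with header = TRUE so that the header row becomes column names (salmon normally writes Name as the first one), then counts how many transcript IDs are found in tx2gene.

```r
# Toy quant.sf-like file so the sketch is self-contained
# (rows copied from the read.table output in the post).
quant_path <- tempfile(fileext = ".sf")
writeLines(c(
  "Name\tLength\tEffectiveLength\tTPM\tNumReads",
  "ENST00000373020\t2206\t2063.46\t31.9872\t1495.67",
  "ENST00000494424\t820\t677.463\t0\t0",
  "ENST00000496771\t1025\t882.463\t0.973862\t19.4741"
), quant_path)

# Toy tx2gene table (IDs copied from the post, versions already removed).
tx2gene <- data.frame(
  TXNAME = c("ENST00000612152", "ENST00000373020",
             "ENST00000494424", "ENST00000373031"),
  GENEID = c("ENSG00000000003", "ENSG00000000003",
             "ENSG00000000003", "ENSG00000000005"),
  stringsAsFactors = FALSE
)

# header = TRUE turns the first row into column names instead of data.
quant <- read.delim(quant_path, header = TRUE, stringsAsFactors = FALSE)
print(names(quant))  # check that the ID column really is called "Name"

# Which transcript IDs in the quantification file are present in tx2gene?
in_map    <- quant$Name %in% tx2gene$TXNAME
n_matched <- sum(in_map)
unmatched <- quant$Name[!in_map]

cat(n_matched, "of", nrow(quant), "transcripts found in tx2gene\n")
print(head(unmatched))
```

With real data, quant_path would be replaced by files[1] and the toy table by the actual tx2gene. Looking at the unmatched IDs and at the printed column names is usually quicker than looking only at the size of the intersection.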
Is it because you are using tx2gene instead of tx3gene?
Oh, sorry about that. I changed the name while I was playing with it to try to fix the problem. But no, that is not the cause. I have double-checked to make sure I am using the tx2gene table without the decimal. Good catch though!
Can you run save.image(file="all_objects.rda") and email the resulting file to the address given by maintainer("tximport")?
I get the same error when using Gencode fasta files with salmon, because the quant.sf output becomes:
Editing Name and keeping only the first part solves the problem.
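A minimal sketch of that edit, assuming the Name values are pipe-delimited with the transcript ID as the first field (as in GENCODE transcript FASTA headers). The helper name strip_name and its drop_version argument are hypothetical, and the example strings reuse IDs from the question with a placeholder in place of the remaining fields.

```r
# Hypothetical helper: keep only the first "|"-delimited field of each ID,
# and optionally drop a trailing numeric version such as ".4".
strip_name <- function(ids, drop_version = FALSE) {
  ids <- sub("\\|.*$", "", ids)          # remove everything from the first "|"
  if (drop_version) {
    ids <- sub("\\.[0-9]+$", "", ids)    # remove a trailing ".<digits>"
  }
  ids
}

# Illustrative input: IDs taken from the question, other fields are placeholders.
raw_names <- c(
  "ENST00000612152.4|ENSG00000000003.14|placeholder",
  "ENST00000373020.8|ENSG00000000003.14|placeholder"
)

clean       <- strip_name(raw_names)
clean_nover <- strip_name(raw_names, drop_version = TRUE)

print(clean)
print(clean_nover)
```

The same function could be applied to the Name column of each quant.sf file before it is written back out, or to the transcript column of tx2gene so that both sides use the same form of ID. Recent tximport versions also document an ignoreAfterBar argument intended for this situation, which may make manual editing unnecessary.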
See the 'ignoreTxVersion' argument to tximport() which may help in your case.
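As a rough illustration of that suggestion, the call might look like the sketch below. ignoreTxVersion is an argument of tximport(); the wrapper name import_without_versions is hypothetical, and files and tx2gene are assumed to be built as in the question. According to the tximport documentation, the argument splits the transcript IDs on "." to remove version information before matching against tx2gene, so it seems safest for tx2gene to hold version-free IDs as well.

```r
# Hypothetical wrapper (the name is illustrative, not part of tximport).
# Wrapping the call in a function means nothing is imported until it is called.
import_without_versions <- function(files, tx2gene) {
  # Fail early if any quant.sf path is wrong.
  stopifnot(all(file.exists(files)))
  tximport::tximport(
    files,
    type = "salmon",
    tx2gene = tx2gene,
    ignoreTxVersion = TRUE  # drop ".<version>" from transcript IDs before matching
  )
}
```

It would be called as txi.salmon <- import_without_versions(files, tx2gene) once the file paths exist.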
How do you modify the Name? Any script? I tried ignoreTxVersion = TRUE but it didn't work.
I'll need a lot more details about what you're trying to do and what didn't work. You could make a new post and include also what code you are trying to use, etc.