Question: Trouble with Tximport
2.4 years ago, smithng1215 wrote:

I am trying to use tximport on my Salmon quantifications to summarize the Ensembl transcript-level counts to gene-level counts. This is what I've tried:


library(GenomicFeatures)
txdb <- makeTxDbFromGFF("/path/gencode.v24.primary_assembly.annotation.gtf")
k <- keys(txdb, keytype = "GENEID")
df <- select(txdb, keys = k, keytype = "GENEID", columns = "TXNAME")
tx2gene <- df[, 2:1]
head(tx2gene)
             TXNAME             GENEID
1 ENST00000612152.4 ENSG00000000003.14
2 ENST00000373020.8 ENSG00000000003.14
3 ENST00000614008.4 ENSG00000000003.14
4 ENST00000496771.5 ENSG00000000003.14
5 ENST00000494424.1 ENSG00000000003.14
6 ENST00000373031.4  ENSG00000000005.5

The Salmon quant.sf files don't have the version suffix (the decimal) on the Ensembl IDs, so I just cut it off manually:

# Strip the version suffix (".N") from both ID columns
tx2gene <- data.frame(
  TXNAME = gsub("\\..*", "", tx2gene$TXNAME),
  GENEID = gsub("\\..*", "", tx2gene$GENEID)
)
head(tx2gene)
           TXNAME          GENEID
1 ENST00000612152 ENSG00000000003
2 ENST00000373020 ENSG00000000003
3 ENST00000614008 ENSG00000000003
4 ENST00000496771 ENSG00000000003
5 ENST00000494424 ENSG00000000003
6 ENST00000373031 ENSG00000000005

library(tximport)
library(readr)
dir <- "/path_to_salmon_directory"
samples <- read.table("/path/file_names.txt", header=FALSE)
files <- file.path(dir,"salmon", samples$V1, "quant.sf")
names(files) <- paste0("sample", 1:9)
txi.salmon <- tximport(files, type = "salmon", tx2gene = tx2gene, reader = read_tsv)

reading in files
1 2 3 4 5 6 7 8 9 
Error in summarizeToGene(txi, tx2gene, ignoreTxVersion, countsFromAbundance) : 
  
  None of the transcripts in the quantification files are present
  in the first column of tx2gene. Check to see that you are using
  the same annotation for both.

I have used all(file.exists(files)) to make sure the file paths are correct. Also, this is what I get if I use read.table to import a single quant.sf file:

quantfile=read.table(files[1])
head(quantfile)
               V1     V2              V3       V4       V5
1           dName Length EffectiveLength      TPM NumReads
2 ENST00000373020   2206         2063.46  31.9872  1495.67
3 ENST00000494424    820         677.463        0        0
4 ENST00000496771   1025         882.463 0.973862  19.4741
5 ENST00000612152   3796         3653.46  0.73197  60.5985
6 ENST00000614008    900         757.463  0.11619  1.99431

I can intersect the first column of the quant.sf file with the first column of tx2gene and I get nearly 200k matches, so I don't understand why tximport says there are none. I have also tried using the EnsDb.Hsapiens.v79 package to create tx2gene, and I get the same error. Any help appreciated!
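The overlap check described here can be sketched in base R as follows. This is a minimal illustration on small made-up tables; the object names quant and tx2gene_demo are hypothetical stand-ins for a quant.sf table and the version-stripped tx2gene.

```r
# Hypothetical stand-ins for a quant.sf table and a version-stripped tx2gene
quant <- data.frame(
  Name = c("ENST00000373020", "ENST00000494424", "ENST00000999999"),
  NumReads = c(1495.67, 0, 3),
  stringsAsFactors = FALSE
)
tx2gene_demo <- data.frame(
  TXNAME = c("ENST00000612152", "ENST00000373020", "ENST00000494424"),
  GENEID = c("ENSG00000000003", "ENSG00000000003", "ENSG00000000003"),
  stringsAsFactors = FALSE
)

# Transcript IDs present in both tables
shared <- intersect(quant$Name, tx2gene_demo$TXNAME)
length(shared)

# Fraction of quantified transcripts that tx2gene can map to a gene
mean(quant$Name %in% tx2gene_demo$TXNAME)
```

A good overlap only shows that the IDs agree. It says nothing about the header row: with read.table(..., header = FALSE), as in the output above, the header becomes an ordinary data row, so its column names are never checked.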

(modified 2.4 years ago by Michael Love)

Is it because you are using tx2gene instead of tx3gene?

(reply by James W. MacDonald, 2.4 years ago)

Oh, sorry about that; I changed the name while I was playing with it trying to fix it. But no, that's not it. I have double-checked to make sure I am using the tx2gene table without the version decimal. Good catch, though!

(reply by smithng1215, 2.4 years ago)

Can you save your workspace with save.image(file="all_objects.rda") and email the file to maintainer("tximport")?

(reply by Michael Love, 2.4 years ago)

I get the same error when using the GENCODE FASTA files with Salmon, because the Name column of quant.sf then contains the full pipe-delimited FASTA header:

Name    Length    EffectiveLength    TPM    NumReads
ENST00000456328.2|ENSG00000223972.5|OTTHUMG00000000961.2|OTTHUMT00000362751.1|DDX11L1-002|DDX11L1|1657|processed_transcript|    1657    1456.57    0    0

Editing the Name column to keep only the first field (the ENST ID) solves the problem.
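For the GENCODE case, one way to make that edit in R is to drop everything from the first "|" in the Name column and write the table back out. This is a sketch on a made-up one-row table (the object names quant_gencode and out are hypothetical); with real data you would read each quant.sf with header = TRUE first. If re-indexing is an option, recent Salmon releases also offer a --gencode flag for salmon index that splits transcript names at the first '|' (check salmon index --help for your version).

```r
# One-row stand-in for a quant.sf built from a GENCODE transcript FASTA
# (values copied from the example above; the object name is hypothetical)
quant_gencode <- data.frame(
  Name = paste0("ENST00000456328.2|ENSG00000223972.5|OTTHUMG00000000961.2|",
                "OTTHUMT00000362751.1|DDX11L1-002|DDX11L1|1657|processed_transcript|"),
  Length = 1657,
  EffectiveLength = 1456.57,
  TPM = 0,
  NumReads = 0,
  stringsAsFactors = FALSE
)

# Drop everything from the first "|" onward, keeping the versioned ENST ID
quant_gencode$Name <- sub("\\|.*", "", quant_gencode$Name)
quant_gencode$Name

# Write the edited table back out as tab-separated text, header unchanged
out <- tempfile(fileext = ".sf")
write.table(quant_gencode, out, sep = "\t", quote = FALSE, row.names = FALSE)
```

The edited name keeps its .2 version, so it lines up with the versioned TXNAME values that a tx2gene built from the matching GENCODE GTF contains.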


(reply by Peter, 23 months ago)

See the 'ignoreTxVersion' argument to tximport(), which may help in your case.
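The idea behind ignoreTxVersion is to compare transcript IDs without their version suffixes. A rough base-R illustration of that idea, assuming "version" means everything from the first "." onward (the same regex used in the question), is to strip both sides before matching. This is not tximport's internal code, and the vectors quant_ids and tx_ids and the helper strip_version are hypothetical.

```r
# Made-up IDs: versioned on the quant side, unversioned in tx2gene
quant_ids <- c("ENST00000373020.8", "ENST00000494424.1", "ENST00000612152.4")
tx_ids    <- c("ENST00000612152", "ENST00000373020", "ENST00000494424")

# Hypothetical helper: remove everything from the first "." onward
strip_version <- function(x) sub("\\..*", "", x)

# Without stripping nothing matches; after stripping every ID does
sum(quant_ids %in% tx_ids)
sum(strip_version(quant_ids) %in% strip_version(tx_ids))
```

Stripping both sides is harmless when only one side carries versions, because unversioned Ensembl IDs contain no ".".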

(reply by Michael Love, 23 months ago)

How do you modify the Name column? Is there a script for it? I tried ignoreTxVersion = TRUE, but it didn't work.

(reply by garyhokawai, 21 months ago)

Hi,

I'll need a lot more detail about what you're trying to do and what didn't work. You could make a new post and also include the code you are trying to use, etc.


(reply by Michael Love, 21 months ago)
2.4 years ago, Michael Love (United States) wrote:

It seems the problem is that you may have accidentally added a 'd' to the Salmon header. Because tximport supports import across different versions of Salmon, we can't assume which column is which without the names, so it's important that the name of that column is actually "Name" and not "dName". Can you confirm this fixes it on your end?
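Because the columns are located by name, a quick sanity check is to read the first line of each quant.sf and confirm that the expected column names are present. The sketch below writes a small file with the mistaken "dName" header to a temporary path to demonstrate; the helper name check_salmon_header is hypothetical, and the expected column set is an assumption taken from the quant.sf header shown in this thread.

```r
# Write a tiny quant.sf-like file with the mistaken header, for illustration
path <- tempfile(fileext = ".sf")
writeLines(c("dName\tLength\tEffectiveLength\tTPM\tNumReads",
             "ENST00000373020\t2206\t2063.46\t31.9872\t1495.67"), path)

# Hypothetical helper: return the expected column names missing from the header
check_salmon_header <- function(file,
                                expected = c("Name", "Length", "EffectiveLength",
                                             "TPM", "NumReads")) {
  header <- strsplit(readLines(file, n = 1), "\t", fixed = TRUE)[[1]]
  setdiff(expected, header)
}

check_salmon_header(path)
```

With the files vector from the question, sapply(files, function(f) length(check_salmon_header(f)) == 0) would flag any file whose header is off before tximport is called.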


Yes, that worked. Thank you very much for your help! I had a feeling it was going to be something small like that.

(reply by smithng1215, 2.4 years ago)