Question: quant.sf files and tximport, transcripts not recognized
3 months ago by
Merlin 10
Vancouver
Merlin 10 wrote:

Hello Folks,

I generated a quant.sf file with Salmon, and the next step is to import the transcript abundance data with tximport. I generated file.csv (the tx2gene table) using the same annotation file that I used for Salmon:

> head(tx2gene)
             TXNAME            GENEID
1 ENST00000456328.2 ENSG00000223972.4
2 ENST00000515242.2 ENSG00000223972.4
3 ENST00000518655.2 ENSG00000223972.4
4 ENST00000450305.2 ENSG00000223972.4
5 ENST00000473358.1 ENSG00000243485.2
6 ENST00000469289.1 ENSG00000243485.2


Here is the output from a quant.sf file,

cat quant.sf | head -n 3
ENST00000456328.2|ENSG00000223972.5|OTTHUMG00000000961.2|OTTHUMT00000362751.1|DDX11L1-202|DDX11L1|1657|processed_transcript|    1657    1513.346        0.000000        0.000
ENST00000450305.2|ENSG00000223972.5|OTTHUMG00000000961.2|OTTHUMT00000002844.2|DDX11L1-201|DDX11L1|632|transcribed_unprocessed_pseudogene|       632     488.811 17.921214       1.000


When I run the last step of the script, I get this:

    txi <- tximport(files, type="salmon", tx2gene=tx2gene)

1 2 3 4 5 6
Error in summarizeToGene(txi, tx2gene, varReduce, ignoreTxVersion, ignoreAfterBar,  :

None of the transcripts in the quantification files are present
in the first column of tx2gene. Check to see that you are using
the same annotation for both.

Example IDs (file): [ENST00000456328.2|ENSG00000223972.5|OTTHUMG00000000961.2|OTTHUMT00000362751.1|DDX11L1-202|DDX11L1|1657|processed_transcript|, ENST00000450305.2|ENSG00000223972.5|OTTHUMG00000000961.2|OTTHUMT00000002844.2|DDX11L1-201|DDX11L1|632|transcribed_unprocessed_pseudogene|, ENST00000488147.1|ENSG00000227232.5|OTTHUMG00000000958.1|OTTHUMT00000002839.1|WASH7P-201|WASH7P|1351|unprocessed_pseudogene|, ...]

Example IDs (tx2gene): [ENST00000456328.2, ENST00000515242.2, ENST00000518655.2, ...]

This can sometimes (not always) be fixed using 'ignoreTxVersion' or 'ignoreAfterBar'.


I know that other people have faced this problem, but I couldn't find a solution for my case. Do you have any suggestions about what I should change?

I also have another question: why is file.csv needed? In the end it only has the same gene IDs as my quant.sf file.

Thank you

salmon tximport
modified 3 months ago by Michael Love24k • written 3 months ago by Merlin 10
Answer: quant.sf files and tximport, transcripts not recognized
3 months ago by
Michael Love24k
United States
Michael Love24k wrote:

Take a closer look at the message that is printed, it has some useful information for you.

I'm not sure whether read_tsv is the problem, since I don't have a tsv file, or whether something more is required for the summarizeToGene function.

It also says this:

 None of the transcripts in the quantification files are present
in the first column of tx2gene. Check to see that you are using
the same annotation for both


But I used the same annotation...

Reading here: "We can avoid gene-level summarization by setting txOut=TRUE, giving the original transcript level estimates as a list of matrices."

I changed my command line to

txi.salmon <- tximport(files, type="salmon", tx2gene=tx2gene, txOut=TRUE)

and I don't get the error anymore, but I don't know if the output I get is correct for use with DESeq2.

Can you tell me, please?

Thank you
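On the question of why a tx2gene table is needed at all: it is what lets transcript-level estimates be collapsed into one value per gene, which is the step that txOut=TRUE skips. The sketch below illustrates the idea in base R only. It is not tximport's actual implementation, which does more than add up counts. The read counts (0 and 1) are taken from the last column of the two quant.sf lines shown in the question, assuming that column is Salmon's NumReads; the sample name `sample1` is made up for illustration.

```r
# Two rows of the tx2gene table from the question; both transcripts
# belong to the same gene.
tx2gene <- data.frame(
  TXNAME = c("ENST00000456328.2", "ENST00000450305.2"),
  GENEID = c("ENSG00000223972.4", "ENSG00000223972.4"),
  stringsAsFactors = FALSE
)

# Transcript-level counts for one hypothetical sample. The values are the
# last column of the two quant.sf lines in the question (assumed NumReads).
counts <- matrix(
  c(0, 1),
  ncol = 1,
  dimnames = list(c("ENST00000456328.2", "ENST00000450305.2"), "sample1")
)

# Look up the gene for every transcript, then sum the rows gene by gene.
gene_ids <- tx2gene$GENEID[match(rownames(counts), tx2gene$TXNAME)]
gene_counts <- rowsum(counts, group = gene_ids)

print(gene_counts)
```

With txOut=TRUE the result stays at the transcript level (two rows here); with gene-level summarization the two transcripts collapse into a single row for their gene.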


hi Merlin,

Over the past couple of interactions, I feel like you're not taking the time to double check your work and read relevant messages.

It says above very clearly that the transcript IDs in the file look like "ENST00000456328.2|..." while the transcript IDs in the tx2gene table look like "ENST00000456328.2".

The difference is that there are a bunch of extra characters in the quantification files. The IDs need to be the same for the matching of transcripts to genes to work.

Furthermore, we have built a solution for this already, to "ignore after bar", by setting ignoreAfterBar=TRUE.
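To make the mismatch concrete, here is a minimal base-R sketch using the IDs printed in the error message and in head(tx2gene). The `sub()` call only illustrates the idea of dropping everything from the first "|" onward; it is an assumption about the effect of the option, not a copy of tximport's internal code.

```r
# IDs as they appear in the quant.sf files (copied from the error message).
file_ids <- c(
  "ENST00000456328.2|ENSG00000223972.5|OTTHUMG00000000961.2|OTTHUMT00000362751.1|DDX11L1-202|DDX11L1|1657|processed_transcript|",
  "ENST00000450305.2|ENSG00000223972.5|OTTHUMG00000000961.2|OTTHUMT00000002844.2|DDX11L1-201|DDX11L1|632|transcribed_unprocessed_pseudogene|"
)

# First column of the tx2gene table (copied from head(tx2gene)).
tx2gene_ids <- c(
  "ENST00000456328.2", "ENST00000515242.2", "ENST00000518655.2",
  "ENST00000450305.2", "ENST00000473358.1", "ENST00000469289.1"
)

# The raw IDs never match, which is what triggers the error.
print(any(file_ids %in% tx2gene_ids))

# Illustration only: keep the part of each ID before the first "|".
stripped_ids <- sub("\\|.*", "", file_ids)
print(stripped_ids)

# After stripping, both IDs are found in the tx2gene table.
print(stripped_ids %in% tx2gene_ids)
```

In the original call this corresponds to adding the argument to the existing command: `txi <- tximport(files, type="salmon", tx2gene=tx2gene, ignoreAfterBar=TRUE)`, where `files` and `tx2gene` are the poster's own objects.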

And the message that the software prints to the console even goes on to tell you that you should try this solution and that it may solve your problem.

Please take the time to try to solve these problems on your end before immediately posting for further help from maintainers that are already busy.

Thank you for your answer, Michael. Yes, I have been checking my work for at least three days, and I had also tried the two options mentioned in the output, but it didn't work because I didn't write the complete argument with =TRUE. I'm slowly learning everything.

I'm sorry for taking your time. If you consider this a low-level question, please don't answer; that's my level.

In the end it works, and I appreciate it.

Thank you