Question: quant.sf files and tximport, transcripts not recognized
Merlin (Vancouver) wrote, 3 months ago:

Hello Folks,

I generated quant.sf files with Salmon, and the next step is to import the transcript abundances with tximport. I generated file.csv (the tx2gene table) using the same annotation file used for Salmon:

> head(tx2gene)

             TXNAME            GENEID
1 ENST00000456328.2 ENSG00000223972.4
2 ENST00000515242.2 ENSG00000223972.4
3 ENST00000518655.2 ENSG00000223972.4
4 ENST00000450305.2 ENSG00000223972.4
5 ENST00000473358.1 ENSG00000243485.2
6 ENST00000469289.1 ENSG00000243485.2

Here are the first lines of a quant.sf file:

cat quant.sf | head -n 3
Name    Length  EffectiveLength TPM     NumReads
ENST00000456328.2|ENSG00000223972.5|OTTHUMG00000000961.2|OTTHUMT00000362751.1|DDX11L1-202|DDX11L1|1657|processed_transcript|    1657    1513.346        0.000000        0.000
ENST00000450305.2|ENSG00000223972.5|OTTHUMG00000000961.2|OTTHUMT00000002844.2|DDX11L1-201|DDX11L1|632|transcribed_unprocessed_pseudogene|       632     488.811 17.921214       1.000

When I run the last step of the script, I get this:

    txi <- tximport(files, type="salmon", tx2gene=tx2gene)

> reading in files with read_tsv
    1 2 3 4 5 6 
    Error in summarizeToGene(txi, tx2gene, varReduce, ignoreTxVersion, ignoreAfterBar,  : 

      None of the transcripts in the quantification files are present
      in the first column of tx2gene. Check to see that you are using
      the same annotation for both.

    Example IDs (file): [ENST00000456328.2|ENSG00000223972.5|OTTHUMG00000000961.2|OTTHUMT00000362751.1|DDX11L1-202|DDX11L1|1657|processed_transcript|, ENST00000450305.2|ENSG00000223972.5|OTTHUMG00000000961.2|OTTHUMT00000002844.2|DDX11L1-201|DDX11L1|632|transcribed_unprocessed_pseudogene|, ENST00000488147.1|ENSG00000227232.5|OTTHUMG00000000958.1|OTTHUMT00000002839.1|WASH7P-201|WASH7P|1351|unprocessed_pseudogene|, ...]

    Example IDs (tx2gene): [ENST00000456328.2, ENST00000515242.2, ENST00000518655.2, ...]

      This can sometimes (not always) be fixed using 'ignoreTxVersion' or 'ignoreAfterBar'.

I know other people have faced this problem, but I couldn't find a solution for my case. Do you have any suggestions about what I should change?

I also have another question: why is file.csv needed? In the end it only has the same gene IDs as my quant.sf file.

Thank you

salmon tximport • 125 views
Answer: quant.sf files and tximport, transcripts not recognized
Michael Love (United States) wrote, 3 months ago:

Take a closer look at the message that is printed; it has some useful information for you.


I'm not sure whether read_tsv is the problem, since I don't have a tsv file, or whether something more is required related to the summarizeToGene function.

It says this as well:

 None of the transcripts in the quantification files are present
  in the first column of tx2gene. Check to see that you are using
  the same annotation for both

But I used the same annotation...

I read here: "We can avoid gene-level summarization by setting txOut=TRUE, giving the original transcript level estimates as a list of matrices."

I changed my command to:

txi.salmon <- tximport(files, type="salmon", tx2gene=tx2gene, txOut=TRUE)

and I don't get the error anymore, but I don't know whether the output I get is correct to take into DESeq2.

Can you tell me that please?

Thank you

Reply written 3 months ago by Merlin

hi Merlin,

Over the past couple of interactions, I feel like you're not taking the time to double check your work and read relevant messages.

It says above very clearly that the transcript IDs in the file look like "ENST00000456328.2|..." while the transcript IDs in the tx2gene table look like "ENST00000456328.2".

The difference is that there are a bunch of extra characters in the quantification files. The IDs need to be the same for the matching of transcripts to genes to work.

Furthermore, we have built a solution for this already, to "ignore after bar", by setting ignoreAfterBar=TRUE.

And the message that the software prints to the console even goes on to tell you that you should try this solution and that it may solve your problem.
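
To make the mismatch concrete, here is a minimal base R sketch using the IDs copied from the error message and the tx2gene table above. The variable names (quant_ids, tx2gene_ids, clean_ids) are made up for illustration, and the sub() call is only an approximation of what "ignore after bar" means, not necessarily how tximport implements it internally.

```r
# Transcript names as they appear in quant.sf (copied from the error message).
quant_ids <- c(
  "ENST00000456328.2|ENSG00000223972.5|OTTHUMG00000000961.2|OTTHUMT00000362751.1|DDX11L1-202|DDX11L1|1657|processed_transcript|",
  "ENST00000450305.2|ENSG00000223972.5|OTTHUMG00000000961.2|OTTHUMT00000002844.2|DDX11L1-201|DDX11L1|632|transcribed_unprocessed_pseudogene|"
)

# First column of the tx2gene table shown in the question (hypothetical vector name).
tx2gene_ids <- c("ENST00000456328.2", "ENST00000515242.2",
                 "ENST00000518655.2", "ENST00000450305.2")

# Compared as-is, none of the quant.sf names are found in tx2gene.
matched_raw <- quant_ids %in% tx2gene_ids

# Drop the first "|" and everything after it, then compare again.
clean_ids <- sub("\\|.*", "", quant_ids)
matched_clean <- clean_ids %in% tx2gene_ids

print(matched_raw)
print(matched_clean)

# With tximport itself, the corresponding call uses the argument named in the
# message (not run here, because it needs the real quant.sf files):
# txi <- tximport(files, type = "salmon", tx2gene = tx2gene, ignoreAfterBar = TRUE)
```

Once everything after the first bar is removed, both example IDs are found in the first column of tx2gene, which is the condition the error message is checking for.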

Please take the time to try to solve these problems on your end before immediately posting for further help from maintainers that are already busy.

Reply written 3 months ago by Michael Love

Thank you for your answer, Michael. Yes, I have been checking my work for at least three days, and I had also tried the two options indicated in the output, but it didn’t work because I didn’t write the complete argument with =TRUE. Slowly I’m learning everything.

I’m sorry for taking your time. If you consider this a low-level question, please don’t answer; that’s my level.

In the end it works, and I appreciate it.

Thank you

Reply written 3 months ago by Merlin