tximport command error
eliztran • 0
@eliztran-20727
Last seen 4.9 years ago

I'm pretty new to RNA-seq, and I have been following this tutorial to get my RNA-seq data into a DESeq2-compatible form (https://bioconductor.github.io/BiocWorkshops/rna-seq-data-analysis-with-deseq2.html). I generated my .sf file with Salmon. However, I get this error when I call tximport:

txi command:

txi = tximport(files, type="salmon", tx2gene=tx2gene, ignoreTxVersion= TRUE, ignoreAfterBar=TRUE) 

error message:

"Error in summarizeToGene(txi, tx2gene, varReduce, ignoreTxVersion, ignoreAfterBar, : None of the transcripts in the quantification files are present in the first column of tx2gene. Check to see that you are using the same annotation for both. Example IDs (file): [ebi, ...] Example IDs (tx2gene): [ENST00000456328.2, ENST00000450305.2, ENST00000473358.1, ...] This can sometimes (not always) be fixed using 'ignoreTxVersion' or 'ignoreAfterBar'."

I've looked at previously answered questions, and the most common suggestion was to put ignoreTxVersion=TRUE or ignoreAfterBar = TRUE. I have tried both, but I am still getting the same error. Not too sure what to do. Any suggestions?

tx2gene txi error • 2.4k views

You need to show the results from head(tx2gene) as well as readr::read_tsv(files[1]). That may already show you what the problem is.


Sorry about that!

head(tx2gene) shows:

# A tibble: 6 x 2
  TXNAME            GENEID           
  <chr>             <chr>            
1 ENST00000456328.2 ENSG00000223972.5
2 ENST00000450305.2 ENSG00000223972.5
3 ENST00000473358.1 ENSG00000243485.5
4 ENST00000469289.1 ENSG00000243485.5
5 ENST00000607096.1 ENSG00000284332.1
6 ENST00000606857.1 ENSG00000268020.3

and readr::read_tsv(files[1]) shows

Parsed with column specification:
cols(
  Name = col_character(),
  Length = col_double(),
  EffectiveLength = col_double(),
  TPM = col_double(),
  NumReads = col_double()
)
# A tibble: 1 x 5
  Name          Length EffectiveLength   TPM NumReads
  <chr>          <dbl>           <dbl> <dbl>    <dbl>
1 ebi.ac.uk 1203996941      1203996928     0        0

Your transcript names in the file are not ENST...
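To make the mismatch concrete, here is a minimal base-R sketch using only the IDs posted above. The helpers `strip_id` and `match_rate` are hypothetical names made up for this illustration, and the stripping rule (cut at the first "." or "|") is only an assumed approximation of what ignoreTxVersion and ignoreAfterBar aim to do, not tximport's actual code.

```r
# Toy IDs copied from the outputs posted above
file_ids    <- c("ebi.ac.uk")
tx2gene_ids <- c("ENST00000456328.2", "ENST00000450305.2", "ENST00000473358.1")

# Hypothetical helper: drop everything from the first "." or "|" onward
# (an assumed approximation of ignoreTxVersion / ignoreAfterBar)
strip_id <- function(ids) sub("[.|].*", "", ids)

# Hypothetical helper: fraction of IDs in `a` that also occur in `b`
match_rate <- function(a, b) mean(a %in% b)

# Match rate before and after stripping, for the IDs from this thread
raw_rate      <- match_rate(file_ids, tx2gene_ids)
stripped_rate <- match_rate(strip_id(file_ids), strip_id(tx2gene_ids))

# Positive control: a version-only mismatch is rescued by stripping
control_raw      <- match_rate("ENST00000456328.1", tx2gene_ids)
control_stripped <- match_rate(strip_id("ENST00000456328.1"), strip_id(tx2gene_ids))

print(c(raw = raw_rate, stripped = stripped_rate,
        control_raw = control_raw, control_stripped = control_stripped))
```

Under these assumptions the version-only mismatch in the control is rescued by stripping, but 'ebi.ac.uk' matches nothing either way, which fits with the two options not helping here.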


Sorry, I'm a bit confused. How would I go about fixing this?


Is there someone you can collaborate with, who is familiar with RNA-seq pipelines? Of course there are many tutorials online but it seems you are stuck at an early stage and you’d benefit from having someone looking over your shoulder.


Everyone in my lab is currently out, so I'm trying to work around that, but thank you for taking the time to respond! I appreciate it!

@james-w-macdonald-5106
Last seen 8 hours ago
United States

Well, your first file has one row, and the transcript in that file is called 'ebi.ac.uk'! Which obviously makes no sense. Here is an example of what you should see (note that this is some mouse data):

Parsed with column specification:
cols(
  Name = col_character(),
  Length = col_double(),
  EffectiveLength = col_double(),
  TPM = col_double(),
  NumReads = col_double()
)
# A tibble: 107,188 x 5
   Name           Length EffectiveLength      TPM  NumReads
   <chr>           <dbl>           <dbl>    <dbl>     <dbl>
 1 NM_001001130.2   2218           1969.  3.73      65     
 2 NM_001001144.3   4226           4571. 31.2     1264.    
 3 NM_001001152.2   3488           2945.  1.07      27.9   
 4 NM_001001160.3   6688           6846.  0.222     13.4   
 5 NM_001001176.2   2602           2187.  3.72      72.1   
 6 NM_001001177.2   1900            169   0          0     
 7 NM_001001178.1   3992           3994.  0.0740     2.62  
 8 NM_001001179.3   4698           5362.  0.00203    0.0963
 9 NM_001001180.2   3909           4113.  0.456     16.6   
10 NM_001001181.3   1082            785. 92.3      642.    
# … with 107,178 more rows

And my tx2gene has things like NM_001001130.2 in the first column and whatever Gene ID that corresponds to in the second column.

So it looks like something went sideways when you ran salmon, because you should have tens of thousands of rows, not one.
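A quick way to catch this before calling tximport is to sanity-check each quant.sf file. The following is a minimal base-R sketch; `check_quant_sf` is a hypothetical helper written for this illustration, and the `min_rows = 1000` threshold is an arbitrary assumption you would adjust for your organism and annotation. The demo writes a temporary file that mimics the one-row output posted above rather than reading any real Salmon output.

```r
# Hypothetical helper: basic sanity checks on a Salmon quant.sf file.
# min_rows is an arbitrary, assumed threshold; adjust it for your annotation.
check_quant_sf <- function(path, min_rows = 1000) {
  quant <- read.delim(path, stringsAsFactors = FALSE)
  expected <- c("Name", "Length", "EffectiveLength", "TPM", "NumReads")
  list(
    has_columns = all(expected %in% colnames(quant)),  # standard columns present?
    n_rows      = nrow(quant),                         # number of transcripts quantified
    enough_rows = nrow(quant) >= min_rows,             # plausibly a full transcriptome?
    any_reads   = sum(quant$NumReads) > 0              # did any reads get assigned?
  )
}

# Demo on a temporary file mimicking the one-row file from this thread
tmp <- tempfile(fileext = ".sf")
writeLines(c("Name\tLength\tEffectiveLength\tTPM\tNumReads",
             "ebi.ac.uk\t1203996941\t1203996928\t0\t0"), tmp)
res <- check_quant_sf(tmp)
str(res)
```

With real data you could run something like `lapply(files, check_quant_sf)` over the same `files` vector you pass to tximport. A file that fails these checks points back to the Salmon indexing or quantification step rather than to tximport.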


Thank you! I'll go back and see what happened there.
