Question

tximport command error

eliztran • 0

I'm pretty new to RNA-seq, and I have been following this tutorial to get my RNA-seq data into a DESeq2-compatible form (https://bioconductor.github.io/BiocWorkshops/rna-seq-data-analysis-with-deseq2.html). I generated my .sf file with Salmon. However, I get this error when I call the tximport command:

txi command:

txi = tximport(files, type="salmon", tx2gene=tx2gene, ignoreTxVersion= TRUE, ignoreAfterBar=TRUE)

error message:

"Error in summarizeToGene(txi, tx2gene, varReduce, ignoreTxVersion, ignoreAfterBar, : None of the transcripts in the quantification files are present in the first column of tx2gene. Check to see that you are using the same annotation for both. Example IDs (file): [ebi, ...] Example IDs (tx2gene): [ENST00000456328.2, ENST00000450305.2, ENST00000473358.1, ...] This can sometimes (not always) be fixed using 'ignoreTxVersion' or 'ignoreAfterBar'."

I've looked at previously answered questions, and the most common suggestion was to put ignoreTxVersion=TRUE or ignoreAfterBar = TRUE. I have tried both, but I am still getting the same error. Not too sure what to do. Any suggestions?

tx2gene txi error • 3.8k views

ADD COMMENT • link updated 6.7 years ago by James W. MacDonald 68k • written 6.7 years ago by eliztran • 0

You need to show the results from head(tx2gene) as well as readr::read_tsv(files[1]), which may already point out to you what the problem is.

ADD REPLY • link 6.7 years ago James W. MacDonald 68k

Sorry about that!

head(tx2gene) shows:

# A tibble: 6 x 2
  TXNAME            GENEID           
  <chr>             <chr>            
1 ENST00000456328.2 ENSG00000223972.5
2 ENST00000450305.2 ENSG00000223972.5
3 ENST00000473358.1 ENSG00000243485.5
4 ENST00000469289.1 ENSG00000243485.5
5 ENST00000607096.1 ENSG00000284332.1
6 ENST00000606857.1 ENSG00000268020.3

and readr::read_tsv(files[1]) shows

Parsed with column specification:
cols(
  Name = col_character(),
  Length = col_double(),
  EffectiveLength = col_double(),
  TPM = col_double(),
  NumReads = col_double()
)
# A tibble: 1 x 5
  Name          Length EffectiveLength   TPM NumReads
  <chr>          <dbl>           <dbl> <dbl>    <dbl>
1 ebi.ac.uk 1203996941      1203996928     0        0

ADD REPLY • link updated 6.7 years ago by James W. MacDonald 68k • written 6.7 years ago by eliztran • 0

Your transcript names in the file are not ENST...

ADD REPLY • link 6.7 years ago Michael Love 43k

Sorry, I'm a bit confused. How would I go about fixing this?

ADD REPLY • link 6.7 years ago eliztran • 0

Is there someone you can collaborate with, who is familiar with RNA-seq pipelines? Of course there are many tutorials online but it seems you are stuck at an early stage and you’d benefit from having someone looking over your shoulder.

ADD REPLY • link 6.7 years ago Michael Love 43k

Everyone in my lab is currently out, so I'm trying to work around that, but thank you for taking the time to respond! I appreciate it!

ADD REPLY • link 6.7 years ago eliztran • 0

score 0 · Answer 1 · 2019-05-07

Well, your first file has one row, and the transcript in that file is called 'ebi.ac.uk'! Which obviously makes no sense. Here is an example of what you should see (note that this is some mouse data):

Parsed with column specification:
cols(
  Name = col_character(),
  Length = col_double(),
  EffectiveLength = col_double(),
  TPM = col_double(),
  NumReads = col_double()
)
# A tibble: 107,188 x 5
   Name           Length EffectiveLength      TPM  NumReads
   <chr>           <dbl>           <dbl>    <dbl>     <dbl>
 1 NM_001001130.2   2218           1969.  3.73      65     
 2 NM_001001144.3   4226           4571. 31.2     1264.    
 3 NM_001001152.2   3488           2945.  1.07      27.9   
 4 NM_001001160.3   6688           6846.  0.222     13.4   
 5 NM_001001176.2   2602           2187.  3.72      72.1   
 6 NM_001001177.2   1900            169   0          0     
 7 NM_001001178.1   3992           3994.  0.0740     2.62  
 8 NM_001001179.3   4698           5362.  0.00203    0.0963
 9 NM_001001180.2   3909           4113.  0.456     16.6   
10 NM_001001181.3   1082            785. 92.3      642.    
# … with 107,178 more rows

And my tx2gene has things like NM_001001130.2 in the first column and whatever Gene ID that corresponds to in the second column.
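If it helps, here is a rough sketch of how you could check that match yourself before calling tximport. The helper check_tx_ids() is hypothetical (it is not part of tximport), the toy file and table are made-up stand-ins for your own quant.sf and tx2gene, and the version stripping is only an approximation of what ignoreTxVersion is meant to do, not tximport's actual code.

```r
# Toy stand-ins for a Salmon quant.sf and a tx2gene table (made-up values).
quant_file <- tempfile(fileext = ".sf")
writeLines(c(
  "Name\tLength\tEffectiveLength\tTPM\tNumReads",
  "ENST00000456328.2\t1657\t1400.0\t1.5\t20",
  "ENST00000450305.2\t632\t400.0\t0.0\t0"
), quant_file)

tx2gene <- data.frame(
  TXNAME = c("ENST00000456328.2", "ENST00000450305.2", "ENST00000473358.1"),
  GENEID = c("ENSG00000223972.5", "ENSG00000223972.5", "ENSG00000243485.5"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: report how many rows the quantification file has and
# how many of its transcript IDs appear in the first column of tx2gene.
check_tx_ids <- function(quant_file, tx2gene, strip_version = FALSE) {
  quant <- read.delim(quant_file, stringsAsFactors = FALSE)
  file_ids <- quant$Name
  map_ids <- tx2gene[[1]]
  if (strip_version) {
    # Assumption: drop everything from the first "." onward,
    # e.g. "ENST00000456328.2" becomes "ENST00000456328".
    file_ids <- sub("\\..*", "", file_ids)
    map_ids <- sub("\\..*", "", map_ids)
  }
  list(n_rows = nrow(quant), n_matched = sum(file_ids %in% map_ids))
}

res <- check_tx_ids(quant_file, tx2gene)
print(res)
```

A healthy file should give an n_rows in the tens of thousands and an n_matched close to it. Note that stripping at the first "." turns 'ebi.ac.uk' into 'ebi', which is the example ID shown in the error message above, so no amount of version stripping can make that ID match an ENST identifier.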

So it looks like something went sideways when you ran salmon, because you should have tens of thousands of rows, not one.
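A quick way to catch this for every sample at once is to count the rows in each quantification file before handing them to tximport. This is only a sketch: count_quant_rows() is a hypothetical helper, the min_rows threshold is an arbitrary assumption you should adjust for your annotation, and the two temporary files stand in for your real files vector.

```r
# Hypothetical helper: count data rows (excluding the header line) in each
# quantification file and flag files with fewer than min_rows rows.
count_quant_rows <- function(files, min_rows = 1000) {
  n <- vapply(files, function(f) length(readLines(f)) - 1L, integer(1))
  data.frame(
    file = files,
    n_rows = unname(n),
    suspicious = unname(n) < min_rows,
    stringsAsFactors = FALSE
  )
}

# Toy demonstration with made-up values: one broken file with a single row
# and one file with three rows.
header <- "Name\tLength\tEffectiveLength\tTPM\tNumReads"
bad <- tempfile(fileext = ".sf")
good <- tempfile(fileext = ".sf")
writeLines(c(header, "ebi.ac.uk\t1203996941\t1203996928\t0\t0"), bad)
writeLines(c(
  header,
  "tx1\t2218\t1969.0\t3.73\t65",
  "tx2\t4226\t4571.0\t31.2\t1264",
  "tx3\t3488\t2945.0\t1.07\t27.9"
), good)

# min_rows is set very low here only because the toy files are tiny.
report <- count_quant_rows(c(bad, good), min_rows = 2)
print(report)
```

Any file flagged as suspicious points back at the Salmon step (for example, the reference used to build the index) rather than at tximport.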