Question

tximport with versioned identifiers

Ed Siefker ▴ 230

@ed-siefker-5136

Last seen 2.0 years ago

United States

I'm using tximport to load quantifications from salmon. The salmon index was built from
ftp://ftp.ensembl.org/pub/release-90/fasta/mus_musculus/cdna/Mus_musculus.GRCm38.cdna.all.fa.gz

quant.sf looks like:

                 Name Length EffectiveLength TPM NumReads
1 ENSMUST00000177564.1     16               7   0        0
2 ENSMUST00000196221.1      9               4   0        0
3 ENSMUST00000179664.1     11               5   0        0

Built tx2gene like this:

> tx2gene <- transcripts(EnsDb.Mmusculus.v79, columns=c("gene_name"), return.type="data.frame")[c(2,1)]

> tx2gene[sample(nrow(tx2gene), 4),]
                   tx_id gene_name
62224 ENSMUST00000124947    Mpv17l
25172 ENSMUST00000058295     Erbb2
65157 ENSMUST00000133203    Neurl4
86485 ENSMUST00000147800   Slc26a9

tximport doesn't recognize the versioned identifiers in quant.sf:

> txi.salmon <- tximport(quant_files, type="salmon", tx2gene=tx2gene)
reading in files with read_tsv
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
Error in summarizeToGene(txi, tx2gene, ignoreTxVersion, countsFromAbundance) :

  None of the transcripts in the quantification files are present
  in the first column of tx2gene. Check to see that you are using
  the same annotation for both.
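A quick way to confirm that the version suffix is what breaks the match is to compare the format of the two ID sets directly. The sketch below uses only the IDs shown in the post; the names `quant_ids` and `tx2gene_ids` are illustrative stand-ins for the `Name` column of quant.sf and the first column of `tx2gene`.

```r
# Transcript IDs copied from the quant.sf excerpt (versioned)
quant_ids <- c("ENSMUST00000177564.1", "ENSMUST00000196221.1",
               "ENSMUST00000179664.1")

# Transcript IDs copied from the tx2gene sample (unversioned)
tx2gene_ids <- c("ENSMUST00000124947", "ENSMUST00000058295",
                 "ENSMUST00000133203", "ENSMUST00000147800")

# Number of quant.sf IDs found verbatim in tx2gene
n_exact <- sum(quant_ids %in% tx2gene_ids)

# Does each set carry a trailing ".<digits>" version suffix?
quant_versioned   <- all(grepl("\\.[0-9]+$", quant_ids))
tx2gene_versioned <- any(grepl("\\.[0-9]+$", tx2gene_ids))

print(c(n_exact = n_exact,
        quant_versioned = quant_versioned,
        tx2gene_versioned = tx2gene_versioned))
```

With samples this small, zero overlap is expected anyway; the informative part is the format check. On real data the same comparison would be run on the full columns.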

What can I do? The only thing I can think of is going back to square one: running my transcriptome through sed 's/^\(>[^[:space:]]*\)\.[0-9][0-9]*[[:space:]]/\1 /' to strip the version suffixes (including multi-digit ones), then running salmon again. But I doubt that's the intended workflow.

tximport ensembldb • 2.0k views

ADD COMMENT • link 8.1 years ago Ed Siefker ▴ 230

Ed Siefker ▴ 230

Never mind: ignoreTxVersion. Got it.

ADD COMMENT • link 8.1 years ago Ed Siefker ▴ 230

score 3 · Accepted Answer · 2017-10-25

Michael Love 43k

@mikelove

Last seen 10 hours ago

United States

Check out the help page ?tximport

You can use ignoreTxVersion=TRUE to chop off the transcript version from the IDs.
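As a rough illustration of what that option amounts to, the base-R sketch below drops the version suffix before matching. It is not tximport's own code. The toy `tx2gene` table and its gene names (`gene_A`, `gene_B`, `gene_C`) are placeholders invented for the example; only the transcript IDs come from the question.

```r
# Versioned transcript IDs as they appear in quant.sf
quant_ids <- c("ENSMUST00000177564.1", "ENSMUST00000196221.1",
               "ENSMUST00000179664.1")

# Toy tx2gene table: unversioned IDs, placeholder gene names (hypothetical)
toy_tx2gene <- data.frame(
  tx_id     = c("ENSMUST00000177564", "ENSMUST00000196221",
                "ENSMUST00000179664"),
  gene_name = c("gene_A", "gene_B", "gene_C"),
  stringsAsFactors = FALSE
)

# Without stripping, nothing matches
matched_before <- sum(quant_ids %in% toy_tx2gene$tx_id)

# Remove the trailing ".<digits>" version suffix, then match again
stripped_ids  <- sub("\\.[0-9]+$", "", quant_ids)
matched_after <- sum(stripped_ids %in% toy_tx2gene$tx_id)

# Look up the gene for each transcript via the stripped ID
genes <- toy_tx2gene$gene_name[match(stripped_ids, toy_tx2gene$tx_id)]

print(data.frame(quant_id = quant_ids, stripped = stripped_ids, gene = genes))
```

In the call from the question, this would be `tximport(quant_files, type="salmon", tx2gene=tx2gene, ignoreTxVersion=TRUE)`, with no need to rebuild the salmon index.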

ADD COMMENT • link 8.1 years ago Michael Love 43k