I'm using tximport to load salmon quantifications. The salmon index was built from
ftp://ftp.ensembl.org/pub/release-90/fasta/mus_musculus/cdna/Mus_musculus.GRCm38.cdna.all.fa.gz
quant.sf looks like:
  Name                 Length EffectiveLength TPM NumReads
1 ENSMUST00000177564.1     16               7   0        0
2 ENSMUST00000196221.1      9               4   0        0
3 ENSMUST00000179664.1     11               5   0        0
Built tx2gene like this:
> tx2gene <- transcripts(EnsDb.Mmusculus.v79, columns=c("gene_name"), return.type="data.frame")[c(2,1)]
> tx2gene[sample(nrow(tx2gene), 4),]
                   tx_id gene_name
62224 ENSMUST00000124947    Mpv17l
25172 ENSMUST00000058295     Erbb2
65157 ENSMUST00000133203    Neurl4
86485 ENSMUST00000147800   Slc26a9
tximport doesn't recognize the versioned identifiers in quant.sf:
> txi.salmon <- tximport(quant_files, type="salmon", tx2gene=tx2gene)
reading in files with read_tsv
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
Error in summarizeToGene(txi, tx2gene, ignoreTxVersion, countsFromAbundance) :
  None of the transcripts in the quantification files are present
  in the first column of tx2gene. Check to see that you are using
  the same annotation for both.
What can I do? The only thing I can think of is going back to square one: running my transcriptome through sed 's/^\(>[^[:space:]]*\)\.[0-9][0-9]*[[:space:]]/\1 /' (so that multi-digit versions such as .10 are stripped too) and then running salmon again. But I doubt that's the intended workflow.
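One alternative to rerunning salmon might be to strip the version suffix on the R side. Below is a minimal base-R sketch of that idea. The helper name strip_tx_version is hypothetical, and the two-digit version ".10" on the third ID is made up for illustration; the other IDs are taken from the listings above.

```r
# Hypothetical helper: drop a trailing ".<digits>" version suffix from transcript IDs.
strip_tx_version <- function(ids) {
  sub("\\.[0-9]+$", "", ids)
}

# Versioned IDs as they appear in the Name column of quant.sf.
# The ".10" version on the last ID is invented to show the multi-digit case.
quant_ids <- c("ENSMUST00000177564.1",
               "ENSMUST00000196221.1",
               "ENSMUST00000124947.10")

# Unversioned IDs as they appear in the tx_id column of tx2gene.
tx2gene_ids <- c("ENSMUST00000124947", "ENSMUST00000058295")

# Number of quant.sf IDs found in tx2gene before stripping the version.
n_before <- sum(quant_ids %in% tx2gene_ids)

# Number found after stripping the version.
stripped_ids <- strip_tx_version(quant_ids)
n_after <- sum(stripped_ids %in% tx2gene_ids)

print(c(before = n_before, after = n_after))
```

The summarizeToGene call in the error above also takes an ignoreTxVersion argument, and tximport() exposes an argument of the same name. Assuming the installed tximport version supports it, passing ignoreTxVersion=TRUE to the tximport() call may achieve the same effect without editing any files or rebuilding the index.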