I am importing Salmon output with tximport to get gene-level expression levels using the suggested approach:
txi = tximport(quant_files, type = "salmon", tx2gene = tx2gene, countsFromAbundance = "lengthScaledTPM")
Generally, this works as expected. However, I sometimes notice a large discrepancy between the TPMs (txi$abundance) and the counts/lengthScaledTPMs (txi$counts).
For example, I am looking at one sample where the TPMs for the top two genes are 676,935 and 54,165 (this is clearly a problematic library), but the top two counts/lengthScaledTPMs are 661,979 and 3,917. The top 10 genes account for 83% of the total for TPMs, but 99.7% for counts. 97% of the genes end up with counts between 0 and 0.01. For comparison, in the original Salmon estimates, the top 10 account for 33% of the total. I am confident in the tximport results, but what would cause such behavior? Can I trust any of the values?
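
To make the comparison above concrete, here is a minimal base-R sketch of how the length-scaled counts and the top-N shares could be computed. It rests on several assumptions: the toy matrices are hypothetical stand-ins for txi$abundance, txi$length and txi$counts (they are not the values from this post), and length_scaled_tpm() and top_share() are illustrative helper names, not tximport functions. The scaling step follows my understanding of what countsFromAbundance = "lengthScaledTPM" does (TPM weighted by the average gene length across samples, then rescaled to the original library size); check it against the tximport documentation before relying on it.

```r
# Toy data: 4 genes x 2 samples (hypothetical values, not from the post).
dn <- list(paste0("gene", 1:4), c("s1", "s2"))
abundance  <- matrix(c(700, 200,   90,   10,
                       400, 300,  200,  100), nrow = 4, dimnames = dn)
eff_length <- matrix(c(5000, 200, 1000, 1500,
                       5000, 200, 1000, 1500), nrow = 4, dimnames = dn)
est_counts <- matrix(c(900,  50,   40,   10,
                       500, 200,  200,  100), nrow = 4, dimnames = dn)

# Assumed sketch of the "lengthScaledTPM" scaling:
# weight each gene's TPM by its average length across samples,
# then rescale every sample so its column sum matches the estimated counts.
length_scaled_tpm <- function(abundance, eff_length, est_counts) {
  scaled   <- abundance * rowMeans(eff_length)
  lib_size <- colSums(est_counts)
  sweep(scaled, 2, lib_size / colSums(scaled), `*`)
}

# Fraction of a sample's total signal carried by its n largest genes.
top_share <- function(x, n = 10) {
  x <- sort(x, decreasing = TRUE)
  sum(head(x, n)) / sum(x)
}

ls_counts <- length_scaled_tpm(abundance, eff_length, est_counts)

# Compare how concentrated each measure is in the top gene of sample s1.
share_tpm    <- top_share(abundance[, "s1"], n = 1)
share_counts <- top_share(ls_counts[, "s1"], n = 1)
print(c(tpm = share_tpm, counts = share_counts))
```

In this toy case the dominant gene is also the longest one, so weighting by length concentrates the counts in it even more than the TPMs. With real data, the same top_share() call could be applied to a column of txi$abundance and of txi$counts to reproduce the percentages quoted above.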