Question: difference among tximport scaledTPM, lengthScaledTPM and the original TPM output by salmon/kallisto
tangming2005 (United States) wrote, 2.3 years ago:

Hi,

I am testing salmon and kallisto for RNA-seq. Both tools output estimated counts and TPM. I have read around and put my notes here: https://github.com/crazyhottommy/RNA-seq-analysis/blob/master/salmon_kalliso_STAR_compare.md#counts-versus-tpmrpkmfpkm

My questions are:

1. from the help of tximport function:

countsFromAbundance:

 

character, either "no" (default), "scaledTPM", or "lengthScaledTPM", for whether to generate estimated counts using abundance estimates scaled up to library size (scaledTPM) or additionally scaled using the average transcript length over samples and the library size (lengthScaledTPM). if using scaledTPM or lengthScaledTPM, then the counts are no longer correlated with average transcript length, and so the length offset matrix should not be used.

To my understanding, TPM is a unit that is scaled by (effective) feature length first and then by sequencing depth. So what are scaledTPM and lengthScaledTPM? Does tximport use the estimated counts to get the TPM?

2. What is the difference between the TPM output by salmon/kallisto and the TPM returned by the tximport function?

3. How does tximport mathematically convert counts to TPM, if it uses the estimated counts to get the TPM?

Thanks very much!

Ming Tang

Michael Love (United States) wrote, 2.3 years ago:

Hi Ming Tang,

First, in case it's easier to just read the code that produces these counts, you can look over the few lines of code here:

https://github.com/Bioconductor-mirror/tximport/blob/master/R/tximport.R#L371-L378

1) scaledTPM is TPMs scaled up to library size, while lengthScaledTPM first multiplies TPM by feature length (the average transcript length over samples, as in the help text you quoted) and then scales up to library size. Both are quantities on the same scale as the original counts, except that they are no longer correlated with feature length across samples.

2) No difference. tximport is simply importing the TPMs and providing them back to the user as a matrix (txOut=TRUE), or summarizing these values among isoforms of a gene (txOut=FALSE).

3) Counts are never converted to TPMs. The default is to import the estimated counts and estimated TPMs from the quantification files, and then summarize these to the gene level.
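
For anyone who prefers a worked example to the linked source, here is a minimal base-R sketch of the two scalings described in (1). It is an illustration under assumptions, not the tximport source: the toy matrices and the helper name `counts_from_abundance` are made up, library size is taken to be each sample's column sum of estimated counts, and feature length is taken to be the transcript length averaged over samples, as in the help text quoted in the question.

```r
# Toy input: 3 transcripts x 2 samples (all values invented for illustration).
tpm <- matrix(c(200000, 300000, 500000,
                100000, 600000, 300000), nrow = 3,
              dimnames = list(c("tx1", "tx2", "tx3"), c("s1", "s2")))
len <- matrix(c(1000, 2000, 500,
                1500, 2000, 500), nrow = 3, dimnames = dimnames(tpm))
counts <- matrix(c(400, 1200, 500,
                   300, 2400, 300), nrow = 3, dimnames = dimnames(tpm))

# Hypothetical helper; not a function exported by tximport.
counts_from_abundance <- function(counts, tpm, len,
                                  method = c("scaledTPM", "lengthScaledTPM")) {
  method <- match.arg(method)
  lib_size <- colSums(counts)            # per-sample library size
  new_counts <- if (method == "lengthScaledTPM") {
    tpm * rowMeans(len)                  # TPM x average length over samples
  } else {
    tpm                                  # TPM only
  }
  # Rescale each column so that it sums to that sample's library size.
  sweep(new_counts, 2, lib_size / colSums(new_counts), `*`)
}

scaled        <- counts_from_abundance(counts, tpm, len, "scaledTPM")
length_scaled <- counts_from_abundance(counts, tpm, len, "lengthScaledTPM")
```

In both cases each column of the result sums to that sample's original library size, which is why the values end up on the same scale as the original counts.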


Thanks Michael. I understand much better now. Correct me if I am wrong:

The tximport function just imports the estimated counts/TPM and summarizes them to the gene level.

tx.salmon <- tximport(salmon.files, type = "salmon", tx2gene = tx2gene, 
                      reader = read_tsv, countsFromAbundance = "no")

tx.salmon$counts will be the count table from the original salmon quantification, but gene-level summarized.

tx.salmon$abundance will be the TPM table from the original salmon quantification, but gene-level summarized.
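
As a rough illustration of what "gene-level summarized" means here, the base-R sketch below sums transcript-level values within each gene. It assumes that isoform counts and TPMs are simply added up per gene; the toy values and the `tx2gene` mapping are invented for the example and do not come from any real quantification.

```r
# Toy transcript-level TPMs and counts (values invented for illustration).
tpm <- matrix(c(10, 30, 60,
                20, 20, 60), nrow = 3,
              dimnames = list(c("tx1", "tx2", "tx3"), c("s1", "s2")))
counts <- matrix(c(100, 250, 400,
                   180, 200, 420), nrow = 3, dimnames = dimnames(tpm))

# Hypothetical transcript-to-gene map, playing the role of tx2gene.
tx2gene <- data.frame(tx   = c("tx1", "tx2", "tx3"),
                      gene = c("geneA", "geneA", "geneB"))

# Look up the gene for each transcript row, then sum rows sharing a gene.
gene_id     <- tx2gene$gene[match(rownames(tpm), tx2gene$tx)]
gene_tpm    <- rowsum(tpm, gene_id)
gene_counts <- rowsum(counts, gene_id)
```

Here `geneA` receives the sum of `tx1` and `tx2` in each sample, while `geneB` keeps the values of its single transcript.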

Alternatively, one can generate the count table from TPM (not from the original estimated counts):

tx.salmon.scale <- tximport(salmon.files, type = "salmon", tx2gene = tx2gene, 
                      reader = read_tsv, countsFromAbundance = "lengthScaledTPM")

tx.salmon.scale$abundance will be the same as tx.salmon$abundance (I checked),

but tx.salmon.scale$counts will be generated from the TPM value * featureLength (averaged over samples), scaled up to the library size.

Values of tx.salmon.scale$counts are very close to tx.salmon$counts, but account for transcript length changes across samples.

 

Reply written 2.3 years ago by tangming2005

Yes correct.

Reply written 2.3 years ago by Michael Love

@Michael Love, thank you. That was succinct; much appreciated. It's too bad the Toil project in Xena (TCGA) does not provide the library size for this conversion, or just the TPM counts.

A

Reply written 14 months ago (modified 14 months ago) by Ahdee
luxeredias (Brazil - Belo Horizonte - UFMG) wrote, 5 months ago:

Dear all,

Following up with the topic, I also struggled a bit to figure out salmon+tximport outputs, but after reading forums+papers and looking into my own data I came up with this scheme of how I think RNAseq normalization of salmon+tximport data works.

https://drive.google.com/file/d/1FQJ6Ao2L9Z2CLVA5clE8DzvwHR0ulBZD/view?usp=sharing

Box patterns (full lines, interrupted lines, yellow highlighting, red line) refer to the output file category (tx-lvl/gene-lvl, lib-size-length-corrected/not corrected, abundance/count).

Best,

Thomaz Luscher Dias

UFMG-Brazil
