Dear Bioconductor users,

I am working with TCGA RNA-seq data. I have downloaded the **rsem.genes.results** file for a specific cancer type. As I understand it, "**raw_count**" is the estimated number of fragments derived from a given gene, and "**scaled_estimate**" is the estimated fraction of transcripts made up by a given gene. The "**scaled_estimate**" could perhaps be used as well, e.g. by multiplying it by 1,000,000 to get "**transcripts per million**" (**TPM**), which Li and Dewey state should be more comparable across samples.

My questions are: How exactly have the "scaled_estimate" values been computed? Have they been scaled for library size only, or for both library size and transcript length? Why does the "scaled_estimate" column never sum to one? Can these values be used for differential expression analysis with DESeq2, limma-voom, or edgeR? Can they be normalized and transformed for further analyses (unsupervised or supervised learning) using limma-voom or VST-transformed counts?
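To make the TPM conversion and the sum-to-one check concrete, here is a minimal Python sketch of what I mean. It assumes a tab-separated file with `gene_id` and `scaled_estimate` columns; the gene names and numbers below are made-up toy values, not real TCGA data, and the function names are only illustrative.

```python
import csv
import io

# Toy stand-in for an rsem.genes.results file (assumed tab-separated layout).
# All gene IDs and values are invented for illustration only.
TOY_FILE = (
    "gene_id\traw_count\tscaled_estimate\ttranscript_id\n"
    "GENE_A|1\t100.00\t0.25\ttx1\n"
    "GENE_B|2\t400.00\t0.5\ttx2\n"
    "GENE_C|3\t50.00\t0.125\ttx3\n"
    "GENE_D|4\t75.00\t0.125\ttx4\n"
)


def read_scaled_estimates(handle):
    """Return {gene_id: scaled_estimate} from a tab-separated file handle."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["gene_id"]: float(row["scaled_estimate"]) for row in reader}


def to_tpm(scaled):
    """Multiply each fraction by one million to express it as TPM."""
    return {gene: value * 1e6 for gene, value in scaled.items()}


scaled = read_scaled_estimates(io.StringIO(TOY_FILE))
tpm = to_tpm(scaled)

# If scaled_estimate really is a fraction of transcripts over all genes,
# this total should be close to 1 (and the TPM total close to 1e6).
total = sum(scaled.values())
print("sum of scaled_estimate:", total)
print("sum of TPM:", sum(tpm.values()))
```

For a real file, the `io.StringIO(TOY_FILE)` argument would be replaced by an opened file handle. In my files, the equivalent of `total` does not appear to equal one, which is what prompted the question above.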

Thank you very much in advance for your time!

Sincerely,

Panagiotis Mokos