Question: DESeq2: Is it possible to convert read counts to expression values via TPM and return these values?
Asked 14 months ago by tombernardryan:


Dear all, I am so confused and would really appreciate some help.

I have a table of read counts from RNASeq data (i.e. just a table, where each column is a sample, and each row is a gene, and the cells are read counts that range from 0 to say 10,000). I want to convert these to TPM values and output a matrix/table of TPM gene expression values (each row still a gene name, each column still a sample name).

I was advised that DESeq2 could do this. I was reading the manual: I thought section 2.4.3 "Starting from count tables" would have all the information I needed. However, the example seems to jump straight from reading in count data, to producing differentially expressed genes. I've found this to be the case with a few places I've read, and I can't seem to find a simple answer anywhere.

Is it possible to input just a table of read counts and output a table of TPM gene expression values?

The code would be like:


count_data <-read.table("MatrixOfReadCount",header=T)

And then I was trying to do something like:

countData <-assay(read_table)

(but I was getting an error: Unable to find an inherited method for function ‘assay’ for signature ‘"data.frame", "missing"’)


ddsFullCountTable <-DESeqDataSetFromMatrix(countData=read_table)

(I also have other examples of what I tried, but they were all pointless.)

I'm just so confused. I've also read other places and forums, and I cannot find how to simply "read in a matrix of read counts (row names = genes, column names = sample), and output a matrix of TPM gene expression values (row names = genes, column names = sample), without doing any differential expression analysis".

If someone could help in any way, I would appreciate it. Can this be done?

Answer by Michael Love (United States), 14 months ago:

If you had access to the FASTQ files, the best way to estimate gene-level TPM in my opinion would be to use fast, lightweight transcript abundance quantifiers like Salmon, Sailfish or kallisto (or RSEM, which is also accurate but not as fast as these other three). These are my preferred way to generate gene-level count matrices, using the Bioconductor tximport package to collate counts and effective lengths into R, and then to load the tximport data into DESeq2 with DESeqDataSetFromTximport().

If you only have counts, you can generate something roughly like TPMs, but you need to know the lengths of the genes. It will only be a rough estimate, because you don't know which isoform of each gene was expressed (or, if it was a combination of isoforms, in what proportions). So for each gene (each row of your count matrix) you'll need to choose a single number to stand in for the length of the gene. We don't have any general utilities in DESeq2 for doing this with an arbitrary count matrix.
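As an illustration of picking "a single number" per gene, one common choice (an assumption here, not something DESeq2 provides) is the total length of the union of a gene's exons, so that overlapping exons from different isoforms are counted once. The sketch below uses base R only; the `exons` table, its column names, and the `union_length()` helper are hypothetical and would be replaced by coordinates taken from your own annotation.

```r
# Hypothetical exon table: one row per exon, 1-based inclusive coordinates.
exons <- data.frame(
  gene  = c("geneA", "geneA", "geneA", "geneB"),
  start = c(100, 150, 400, 10),
  end   = c(200, 250, 499, 1009)
)

# Length of the union of a set of intervals (overlapping bases counted once).
union_length <- function(start, end) {
  o <- order(start)
  start <- start[o]
  end <- end[o]
  total <- 0
  cur_start <- start[1]
  cur_end <- end[1]
  for (i in seq_along(start)[-1]) {
    if (start[i] <= cur_end + 1) {
      # Overlapping or directly adjacent: extend the current merged interval.
      cur_end <- max(cur_end, end[i])
    } else {
      # Gap: close the current interval and start a new one.
      total <- total + (cur_end - cur_start + 1)
      cur_start <- start[i]
      cur_end <- end[i]
    }
  }
  total + (cur_end - cur_start + 1)
}

# Named vector of lengths, one entry per gene.
gene.length <- sapply(split(exons, exons$gene),
                      function(e) union_length(e$start, e$end))
gene.length
```

The result is a named numeric vector, which makes it easy to reorder the lengths to match the row names of the count matrix before dividing.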

You can create a TPM matrix by first dividing each column of the count matrix by an estimate of the gene lengths, one length per row (again, this is not ideal for the reasons stated above).

x <- counts.mat / gene.length

Then with this matrix x, you do the following:

tpm.mat <- t( t(x) * 1e6 / colSums(x) )

This scales each column so that it sums to 1 million.
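Putting the two steps together, here is a minimal end-to-end sketch with made-up numbers. The toy `count_data` stands in for the table read with `read.table()`, and the `gene.length` values are invented for illustration. Note that `as.matrix()` is what converts a data.frame of counts into a matrix; `assay()` is meant for SummarizedExperiment-type objects, which is why it fails on a data.frame.

```r
# Toy count table (assumption); in practice something like:
# count_data <- read.table("MatrixOfReadCount", header = TRUE, row.names = 1)
count_data <- data.frame(
  sample1 = c(10, 200, 0),
  sample2 = c(30, 100, 50),
  row.names = c("geneA", "geneB", "geneC")
)
counts.mat <- as.matrix(count_data)

# Invented gene lengths in bases, named by gene (assumption).
gene.length <- c(geneC = 500, geneA = 1000, geneB = 2000)

# Reorder the lengths to match the rows of the count matrix,
# and stop if any gene has no length.
gene.length <- gene.length[rownames(counts.mat)]
stopifnot(!anyNA(gene.length))

# Step 1: length-normalize. The vector is recycled down each column,
# so row i is divided by gene.length[i].
x <- counts.mat / gene.length

# Step 2: rescale every column (sample) to sum to one million.
tpm.mat <- t(t(x) * 1e6 / colSums(x))

colSums(tpm.mat)  # each sample should sum to 1e6
```

The row and column names carry through, so `tpm.mat` can be written out with `write.table()` as a gene-by-sample table.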
