Dear all, I am so confused, I would really appreciate help.
I have a table of read counts from RNASeq data (i.e. just a table, where each column is a sample, and each row is a gene, and the cells are read counts that range from 0 to say 10,000). I want to convert these to TPM values and output a matrix/table of TPM gene expression values (each row still a gene name, each column still a sample name).
I was advised that DESeq2 could do this. I was reading the manual: https://bioc.ism.ac.jp/packages/2.14/bioc/vignettes/DESeq2/inst/doc/beginner.pdf. I thought section 2.4.3 "Starting from count tables" would have all the information I needed. However, the example seems to jump straight from reading in count data, to producing differentially expressed genes. I've found this to be the case with a few places I've read, and I can't seem to find a simple answer anywhere.
Is it possible to input just a table of read counts and output a table of TPM gene expression values?
The code would be like:
library("DESeq2")
count_data <- read.table("MatrixOfReadCount", header=TRUE)
And then I was trying to do something like:
countData <-assay(read_table) (but I was getting errors: Unable to find an inherited method for function ‘assay’ for signature ‘"data.frame", "missing"’)
or:
ddsFullCountTable <-DESeqDataSetFromMatrix(countData=read_table)
(I also have other examples of what I tried, but they were all pointless).
I'm just so confused. I've also read other places and forums, and I cannot find how to simply "read in a matrix of read counts (row names = genes, column names = sample), and output a matrix of TPM gene expression values (row names = genes, column names = sample), without doing any differential expression analysis".
If someone could help in any way, I would appreciate it. Can this be done?
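For reference, TPM cannot be computed from a count table alone, because it also needs a length for every gene. Below is a minimal base-R sketch under stated assumptions: the toy data frame stands in for the table read with read.table, and gene_length_bp is a hypothetical vector of gene lengths (in base pairs, in the same row order as the counts) that would have to come from an annotation. The helper name counts_to_tpm is made up for illustration and is not a DESeq2 function. Note that as.matrix, not assay, is what turns a data frame into a matrix; the error quoted above says there is no assay method for a data.frame.

```r
# Toy count table standing in for read.table("MatrixOfReadCount", header = TRUE);
# the values and names are invented for illustration.
count_df <- data.frame(
  sampleA = c(100, 200, 0),
  sampleB = c(50, 400, 10),
  row.names = c("gene1", "gene2", "gene3")
)

# Hypothetical gene lengths in base pairs, one per row of the count table.
# A count table does not contain these; they are an assumption of this sketch.
gene_length_bp <- c(1000, 2000, 500)

counts_to_tpm <- function(counts, lengths_bp) {
  counts <- as.matrix(counts)                  # data.frame -> numeric matrix
  stopifnot(length(lengths_bp) == nrow(counts))
  rate <- counts / lengths_bp                  # reads per base, gene by gene
  # Scale every column (sample) so that it sums to one million.
  sweep(rate, 2, colSums(rate), "/") * 1e6
}

tpm <- counts_to_tpm(count_df, gene_length_bp)
```

The result keeps the gene names as row names and the sample names as column names, so it can be written out with write.table if a file is needed.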
Hi Michael,
To get more plausible TPM values from counts, could I use ComBat-seq to remove batch effects from the tximport counts, and then use the batch-corrected counts together with the tximport length (effective gene length) to calculate TPM values? Would those TPM values still be far from the real tximport abundance data?
I know tximport abundance is not calculated from tximport counts. However, ComBat-seq won't take tximport abundance data. I have read your previous comment in "Convert Normalized RNA-Seq counts to TPM after Batch Correction". You said: "if you want batch corrected TPM, I think I would just stay on the abundance scale the whole time (estimating mean shifts and removing them on the log2 scale, and working with log2 TPM for plotting or whatever else you need)." Does that mean working on the tximport abundance data directly and removing its batch effects by estimating and removing mean shifts on the log2 scale? Could you elaborate, since this method seems quite different from ComBat-seq's batch correction method?
Thanks a lot!
Hongwei
This seems pretty reasonable (your first paragraph). I would describe those as batch-effect corrected TPM. If you are using the per-sample effective gene lengths, and then scaling so each column sums to 1e6, that sounds like a good approach.
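The calculation described here (divide by per-sample effective lengths, then scale each column to 1e6) might look roughly like the sketch below. The two toy matrices are made up; in practice counts would be the batch-corrected counts and eff_len the per-sample effective length matrix (for example the length element returned by tximport). The object names are illustrative only.

```r
# Toy inputs with matching dimensions: genes in rows, samples in columns.
counts <- matrix(c(100, 200, 0,
                   50, 400, 10),
                 nrow = 3,
                 dimnames = list(c("gene1", "gene2", "gene3"), c("s1", "s2")))
eff_len <- matrix(c(1000, 2000, 500,
                    800, 2000, 400),
                  nrow = 3, dimnames = dimnames(counts))

stopifnot(identical(dim(counts), dim(eff_len)))

# Element-wise division: each sample uses its own effective lengths.
rate <- counts / eff_len

# Divide each column by its own sum and rescale to one million.
tpm_corrected <- t(t(rate) / colSums(rate)) * 1e6
```

Because each sample has its own column of lengths, the division is element-wise between two matrices rather than a matrix divided by a single length vector.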
Thanks a lot for your explanation and help!
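For the alternative quoted in the question (staying on the abundance scale and removing mean shifts on the log2 scale), one very simple reading is per-gene centering of each batch. The sketch below is only that: the pseudocount of 1, the toy values, and the helper name remove_mean_shift are all assumptions, and it ignores the biological condition, so it is only sensible when conditions are balanced across batches. A model-based version that accounts for a design is available in limma as removeBatchEffect.

```r
# Toy TPM matrix: 2 genes x 4 samples in two batches (values invented).
tpm <- matrix(c(10, 100,
                12, 120,
                20, 200,
                24, 240),
              nrow = 2,
              dimnames = list(c("gene1", "gene2"), paste0("s", 1:4)))
batch <- factor(c("b1", "b1", "b2", "b2"))

log_tpm <- log2(tpm + 1)   # the pseudocount of 1 is an assumption

remove_mean_shift <- function(x, batch) {
  gene_mean <- rowMeans(x)                     # overall per-gene mean
  for (b in levels(batch)) {
    cols <- batch == b
    # Per-gene offset of this batch relative to the overall mean.
    shift <- rowMeans(x[, cols, drop = FALSE]) - gene_mean
    x[, cols] <- x[, cols, drop = FALSE] - shift
  }
  x
}

log_tpm_corrected <- remove_mean_shift(log_tpm, batch)
```

After this step every batch has the same per-gene mean on the log2 scale, and the corrected matrix can be used directly for plotting.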
Dear Michael Love and all, My objective is to compare gene expression levels within the same group (biological condition) of samples and for that I want to use TPM values. I have used tximport with
Will "abundance" be good for this purpose? Many thanks
Yes, this gene-level abundance (TPM) can be compared across genes or samples.