TPM from STAR mapped counts
3
0
Entering edit mode
@gingergeiger22-18200
Last seen 15 months ago
United States

Hi all,

I am trying to find TPM gene expression using RNA-Seq data mapped by STAR with --quantMode GeneCounts to the genome, so I am starting with a counts table with samples in columns and gene counts in rows. I do not want to use this dataset for DEG analysis, rather I want to find expression levels by TPM. My code is actually working and I'm wondering if it is fine to start with reads mapped to genes instead of transcripts? Also, I'm looking for feedback on my data processing (especially exon lengths method) since I am unable to find a straightforward tutorial for calculating gene expression from RNA-Seq data as most go to DEG analysis. Thanks for your input! PS if there is any issue, if you can advise resources to help I would really appreciate it!


#read in the annotation
anno1 <- read.delim("/Users/Documents/Analysis/ensembl-ChlSab1-1.tsv",as.is=T)
#read in the counts dataset from STAR output
Verocounts <- read.delim("~/Documents/Analysis/VeroAnalysis1-AnnoChrOnly/Vero_rnaseqcounts_col4.txt", row.names = 1)
#load the gtf file
gtf <- "Chlorocebus_sabaeus.ChlSab1.1.109.chr.gtf"
txdb.filename <- "Chlorocebus_sabaeus.ChlSab1.1.109.chr.gtf.sqlite"

#Make the dge object
Verocounts <- read.delim("~/Documents/Analysis/Vero_rnaseqcounts_col4.txt", row.names = 1)
head(Verocounts)
d1 <- DGEList(Verocounts)
head(d1)
dim(d1) #27556 x 6

#get the exon lengths from the annotation to input as vector for TPM calculation
txdb <- makeTxDbFromGFF(gtf, format="gtf")
exonic <- exonsBy(txdb, by="gene")
red.exonic <- reduce(exonic)
exon.lengths <- vapply(width(red.exonic), sum, numeric(1))

# TPM
VeroTPM <- convertCounts(d1$counts,
                            unit       = "tpm",
                            geneLength = exon.lengths,
                            log        = FALSE,
                            prior.count=0)

VeroTPM <- data.frame(VeroTPM) 

#Add in the gene names from annotation
# Find matching rows based on row names
matching_rows <- match(rownames(VeroTPM), anno1$Gene.stable.ID)
# Subset the relevant columns from `anno1`
additional_info <- anno1[matching_rows, c("Gene.stable.ID", "Gene.name")]
# Append the additional column to `Log2FPKM`
VeroTPM$Gene.name <- additional_info$Gene.name

#output is TPM counts output with samples in columns and counts in rows

sessionInfo( )
GeneExpression GenomicFeatures RNA-Seq TPM • 3.4k views
ADD COMMENT
1
Entering edit mode
@gordon-smyth
Last seen 43 minutes ago
WEHI, Melbourne, Australia

I notice that you are using convertCounts function which is part of the DGEobj.utils package on CRAN. That function is actually a wrapper for edgeR functions. convertCounts uses edgeR::rpkm to compute RPKM (or FPKM) values and then optionally rescales them to approximate TPM values using the approach suggested by Harold Pimentel (https://haroldpimentel.wordpress.com/2014/05/08/what-the-fpkm-a-review-rna-seq-expression-units/).

Strictly speaking, it is impossible to compute TPMs from read counts. TPM is defined in terms of transcripts so it can only be obtained from tools designed to quantify expression at the transcript/isoform level such as RSEM or Salmon. These tools are not Bioconductor packages so, as James has indicated, your enquiry is best sent to another forum like Biostars.

If you are working with read counts, then the usual surrogate measure for gene expression is RPKM, which will correlate well with TPM when gene expression is dominated by a one main transcript, as is often the case. Harold Pimentel has suggested that RPKM can be converted to approximate TPM by scaling.

ADD COMMENT
1
Entering edit mode
ATpoint ★ 4.6k
@atpoint-13662
Last seen 14 hours ago
Germany

Please google these questions first. It has been asked many times before. If you have a gene length information (different ways to get that, see previous posts) you can get TPM-like metrics as in DESeq2: Is it possible to convert read counts to expression values via TPM and return these values?

ADD COMMENT
0
Entering edit mode
@james-w-macdonald-5106
Last seen 7 hours ago
United States

This question is off-topic for this support site. You could try biostars.org.

ADD COMMENT

Login before adding your answer.

Traffic: 672 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6