Question

TPM from STAR mapped counts

0

Entering edit mode

gingergeiger22 • 0

@gingergeiger22-18200

Last seen 11 weeks ago

United States

Hi all,

I am trying to find TPM gene expression using RNA-Seq data mapped by STAR with --quantMode GeneCounts to the genome, so I am starting with a counts table with samples in columns and gene counts in rows. I do not want to use this dataset for DEG analysis, rather I want to find expression levels by TPM. My code is actually working and I'm wondering if it is fine to start with reads mapped to genes instead of transcripts? Also, I'm looking for feedback on my data processing (especially exon lengths method) since I am unable to find a straightforward tutorial for calculating gene expression from RNA-Seq data as most go to DEG analysis. Thanks for your input! PS if there is any issue, if you can advise resources to help I would really appreciate it!


#read in the annotation
anno1 <- read.delim("/Users/Documents/Analysis/ensembl-ChlSab1-1.tsv",as.is=T)
#read in the counts dataset from STAR output
Verocounts <- read.delim("~/Documents/Analysis/VeroAnalysis1-AnnoChrOnly/Vero_rnaseqcounts_col4.txt", row.names = 1)
#load the gtf file
gtf <- "Chlorocebus_sabaeus.ChlSab1.1.109.chr.gtf"
txdb.filename <- "Chlorocebus_sabaeus.ChlSab1.1.109.chr.gtf.sqlite"

#Make the dge object
Verocounts <- read.delim("~/Documents/Analysis/Vero_rnaseqcounts_col4.txt", row.names = 1)
head(Verocounts)
d1 <- DGEList(Verocounts)
head(d1)
dim(d1) #27556 x 6

#get the exon lengths from the annotation to input as vector for TPM calculation
txdb <- makeTxDbFromGFF(gtf, format="gtf")
exonic <- exonsBy(txdb, by="gene")
red.exonic <- reduce(exonic)
exon.lengths <- vapply(width(red.exonic), sum, numeric(1))

# TPM
VeroTPM <- convertCounts(d1$counts,
                            unit       = "tpm",
                            geneLength = exon.lengths,
                            log        = FALSE,
                            prior.count=0)

VeroTPM <- data.frame(VeroTPM) 

#Add in the gene names from annotation
# Find matching rows based on row names
matching_rows <- match(rownames(VeroTPM), anno1$Gene.stable.ID)
# Subset the relevant columns from `anno1`
additional_info <- anno1[matching_rows, c("Gene.stable.ID", "Gene.name")]
# Append the additional column to `Log2FPKM`
VeroTPM$Gene.name <- additional_info$Gene.name

#output is TPM counts output with samples in columns and counts in rows

sessionInfo( )

GeneExpression GenomicFeatures RNA-Seq TPM • 4.2k views

ADD COMMENT • link updated 19 months ago by Gordon Smyth 52k • written 19 months ago by gingergeiger22 • 0

score 1 · Answer 1 · 2023-09-14

I notice that you are using convertCounts function which is part of the DGEobj.utils package on CRAN. That function is actually a wrapper for edgeR functions. convertCounts uses edgeR::rpkm to compute RPKM (or FPKM) values and then optionally rescales them to approximate TPM values using the approach suggested by Harold Pimentel (https://haroldpimentel.wordpress.com/2014/05/08/what-the-fpkm-a-review-rna-seq-expression-units/).

Strictly speaking, it is impossible to compute TPMs from read counts. TPM is defined in terms of transcripts so it can only be obtained from tools designed to quantify expression at the transcript/isoform level such as RSEM or Salmon. These tools are not Bioconductor packages so, as James has indicated, your enquiry is best sent to another forum like Biostars.

If you are working with read counts, then the usual surrogate measure for gene expression is RPKM, which will correlate well with TPM when gene expression is dominated by a one main transcript, as is often the case. Harold Pimentel has suggested that RPKM can be converted to approximate TPM by scaling.

score 1 · Answer 2 · 2023-09-15

1

Entering edit mode

ATpoint ★ 4.8k

@atpoint-13662

Last seen 1 hour ago

Germany

Please google these questions first. It has been asked many times before. If you have a gene length information (different ways to get that, see previous posts) you can get TPM-like metrics as in DESeq2: Is it possible to convert read counts to expression values via TPM and return these values?

ADD COMMENT • link 19 months ago ATpoint ★ 4.8k

score 0 · Answer 3 · 2023-09-14

0

Entering edit mode

James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 2 days ago

United States

This question is off-topic for this support site. You could try biostars.org.

ADD COMMENT • link 19 months ago James W. MacDonald 68k