What lengths are used to calculate TPM in "scater" package
Matan G. ▴ 40
@matan-g-22483

Hi,

I'm trying to calculate TPM from raw counts for bulk RNA-seq data. Does anyone know how transcript lengths are retrieved when using the function "calculateTPM"? Or, if the lengths have to be provided, how do I calculate them? Ref: https://www.rdocumentation.org/packages/scater/versions/1.0.4/topics/calculateTPM

Best

scater calculateTPM RNAseq

In the help file (?calculateTPM) you'll see the argument:

lengths: Numeric vector providing the effective length for each
feature in ‘x’. Alternatively ‘NULL’, see Details.


As for how to retrieve the lengths: use information from a gene annotation source, e.g. Ensembl, BioMart or a TxDb package.
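
To make the role of the lengths concrete, here is a minimal base-R sketch of the usual TPM calculation: divide counts by gene length, then rescale each sample to sum to one million. The count matrix and the lengths are made-up toy values, `tpm_sketch` is a hypothetical helper, and this is not meant to reproduce scater's internal code.

```r
# Toy counts: 3 genes (rows) x 2 samples (columns); values are made up.
counts <- matrix(c(10, 20, 30,
                    5,  5, 40),
                 nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"),
                                 c("s1", "s2")))

# Hypothetical gene lengths in base pairs, in the same order as the rows.
gene_lengths <- c(geneA = 1000, geneB = 2000, geneC = 500)

tpm_sketch <- function(counts, lengths) {
  stopifnot(nrow(counts) == length(lengths))
  # Reads per kilobase: each row is divided by its gene length in kb.
  rate <- counts / (lengths / 1000)
  # Rescale every column (sample) so that it sums to one million.
  t(t(rate) / colSums(rate)) * 1e6
}

tpm <- tpm_sketch(counts, gene_lengths)
colSums(tpm)  # each sample should sum to 1e6, up to floating-point error
```

With scater itself, the corresponding call should be along the lines of `calculateTPM(counts, lengths = gene_lengths)`, with the lengths in the same order as the rows of the count matrix.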


Ideally, you would have a transcript length per gene, per sample, something like what RSEM outputs. But if it's single-cell data, that won't be possible for most sequencing protocols, namely those that sequence only the 5' end or the 3' end.

Aaron Lun ★ 27k
@alun

What Alan and Dario said. You will have to get your own lengths; there's no way for scater to know what annotation (or what version of it) you're using. There is a worked example here of using AnnotationHub resources to get the exonic lengths via AH73905, which - IIRC - is Ensembl GRCm38 version 97.
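
For illustration, "exonic length" is usually taken to mean the number of bases covered by at least one exon of the gene, so overlapping exons from different transcripts are only counted once. Below is a rough base-R sketch of that idea on a made-up exon table; the coordinates, the column names and the `union_length` helper are all hypothetical. With Bioconductor annotation objects, the same thing is typically done by grouping exons by gene and applying `reduce()` and `width()`.

```r
# Toy exon table; coordinates are made up, 1-based and inclusive.
exons <- data.frame(
  gene  = c("geneA", "geneA", "geneA", "geneB"),
  start = c(100, 150, 400, 10),
  end   = c(200, 250, 450, 59)
)

# Number of bases covered by the union of a set of intervals.
union_length <- function(start, end) {
  o <- order(start)
  start <- start[o]
  end <- end[o]
  total <- 0
  cur_start <- start[1]
  cur_end <- end[1]
  for (i in seq_along(start)[-1]) {
    if (start[i] <= cur_end + 1) {
      # Overlapping or adjacent exon: extend the current merged block.
      cur_end <- max(cur_end, end[i])
    } else {
      # Gap: close the current block and start a new one.
      total <- total + (cur_end - cur_start + 1)
      cur_start <- start[i]
      cur_end <- end[i]
    }
  }
  total + (cur_end - cur_start + 1)
}

# One exonic length per gene, named by gene.
gene_lengths <- sapply(split(exons, exons$gene),
                       function(g) union_length(g$start, g$end))
gene_lengths
```

The resulting named vector can be reordered to match the rows of the count matrix before being passed as the `lengths` argument.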