What lengths are used to calculate TPM in "scater" package
Matan G. ▴ 50
@matan-g-22483
Last seen 2.6 years ago

Hi,

I'm trying to calculate TPM from raw counts for bulk RNA-seq data. Does anyone know how transcript lengths are retrieved when using the function "calculateTPM"? Or, if lengths have to be provided, how do I calculate them? ref: https://www.rdocumentation.org/packages/scater/versions/1.0.4/topics/calculateTPM

Best

scater calculateTPM RNAseq • 2.3k views

In the help file ?calculateTPM you'll see an argument

lengths: Numeric vector providing the effective length for each
          feature in ‘x’. Alternatively ‘NULL’, see Details.

As for how to retrieve lengths: use information from a gene annotation source, e.g. Ensembl, biomaRt or a TxDb package.
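
To make clear why the lengths matter, here is a minimal sketch of the usual TPM arithmetic in plain Python. It is not scater's code; the function name `tpm` and the example numbers are made up for illustration, and it assumes one length per feature, given in the same order as the counts.

```python
def tpm(counts, lengths):
    """Convert raw counts to TPM, given one length (in bases) per feature."""
    if len(counts) != len(lengths):
        raise ValueError("counts and lengths must have the same number of features")
    if any(length <= 0 for length in lengths):
        raise ValueError("feature lengths must be positive")

    # Step 1: length-normalise, giving reads per base for each feature.
    rates = [count / length for count, length in zip(counts, lengths)]

    # Step 2: rescale so that the values of one sample sum to one million.
    total = sum(rates)
    if total == 0:
        return [0.0] * len(rates)
    return [rate / total * 1e6 for rate in rates]


# Hypothetical sample with three genes.
counts = [100, 200, 300]
lengths = [1000, 2000, 1000]
values = tpm(counts, lengths)
```

The first two genes have different counts but the same reads-per-base rate, so they end up with the same TPM. That is why a wrong or mismatched length vector changes the result.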


Ideally, you would have a transcript length per gene, per sample, something like what RSEM outputs. But if it's single-cell data, that won't be possible for most sequencing protocols, which sequence only the 5' end or the 3' end.

Aaron Lun ★ 28k
@alun
Last seen 22 hours ago
The city by the bay

What Alan and Dario said. You will have to get your own lengths; there's no way for scater to know which annotation (or which version of it) you're using. There is a worked example here of using AnnotationHub resources to get the exonic lengths via AH73905, which - IIRC - is Ensembl GRCm38 version 97.
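
For anyone wondering what "exonic length" means in practice: it is usually taken as the number of bases covered by any exon of the gene, with overlapping exons from different transcripts counted once. Below is a rough Python sketch of that calculation. The function name and the coordinates are invented for illustration; in Bioconductor the same thing is commonly done on the annotation object with exonsBy(), reduce() and width().

```python
def exonic_length(exons):
    """Number of bases covered by a gene's exons, counting overlaps once.

    `exons` is a list of (start, end) pairs using 1-based inclusive
    coordinates, all on the same chromosome.
    """
    merged = []
    for start, end in sorted(exons):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous block: extend it instead of double counting.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    # Inclusive coordinates, so each block spans end - start + 1 bases.
    return sum(end - start + 1 for start, end in merged)


# Hypothetical gene: two overlapping exons plus one separate exon.
exons = [(100, 200), (150, 300), (500, 600)]
gene_length = exonic_length(exons)  # 201 + 101 = 302 bases
```

Doing this for every gene gives a length vector that can be passed as the lengths argument, provided it is in the same order as the rows of the count matrix.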
