Question

Using TPM from tximport of Salmon output for singscore


Kent • 0


Hi all,

I think the F1000Research article states that singscore only depends on the ranking of genes within each sample, so TPM/RPKM/FPKM is good enough and TMM normalisation is not essential. Have I interpreted that correctly? If so, can I simply use the TPM values that tximport computes from the Salmon output as input for singscore? I will of course filter out the lowly expressed genes first. So something like:

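library(tximport)
library(singscore)

# Read Salmon quantifications and summarise to gene level
# (files and tx2gene are assumed to be defined already)
txi <- tximport(files, type = "salmon", tx2gene = tx2gene)

# Get gene-level TPM
tpm <- txi$abundance
tpm <- tpm[rowSums(tpm) > 2, ] # Filter low-abundance genes

# Rank genes within each sample
tpm_ranked <- rankGenes(tpm)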

And then continue with the workflow?

Many thanks!

singscore • 375 views

ADD COMMENT • link 2 days ago • updated 1 day ago Kent • 0


As a comment, TMM does not apply here because the authors say in the article that they recommend removing gene length bias, and TMM does not do that. What you could do is take the tximport output, run it through the usual calcNormFactors from edgeR, and then use its rpkm function to get the values you need. Alternatively, DESeq2 has an fpkm function. That way you stay consistent if you use either of these packages for differential analysis downstream.
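
To illustrate why length normalisation matters for within-sample ranks, here is a minimal base R sketch with made-up numbers. The three genes, their counts and their lengths are hypothetical, and the formulas are the textbook RPKM and TPM definitions, not edgeR's or DESeq2's implementation. Raw counts rank the longest gene highest, whereas RPKM and TPM agree with each other on the within-sample ordering:

```r
# Hypothetical single-sample toy data: read counts and gene lengths (bp)
counts  <- c(geneA = 1000, geneB = 500, geneC = 300)
lengths <- c(geneA = 10000, geneB = 1000, geneC = 2000)

# Textbook RPKM: reads per kilobase of gene per million mapped reads
rpkm <- counts / (lengths / 1e3) / (sum(counts) / 1e6)

# Textbook TPM: length-scaled rates rescaled to sum to one million
rate <- counts / lengths
tpm  <- rate / sum(rate) * 1e6

rank(counts)  # geneA ranks highest on raw counts because it is long
rank(rpkm)    # geneB ranks highest once length is accounted for
rank(tpm)     # same ordering as RPKM
```

With real data, the edgeR or DESeq2 functions mentioned above would take the place of these hand-written formulas.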

ADD REPLY • link 1 day ago ATpoint ★ 4.7k

score 1 · Answer 1 · 2025-03-19


Michael Love 43k


TPM is fine. I've also sometimes used TPM with some robust column scaling, e.g.:

sf <- DESeq2::estimateSizeFactorsForMatrix(txi$abundance)
t( t( txi$abundance ) / sf )

This would help in case there are some very highly expressed genes only in a subset of samples, throwing off the column sum.
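
The effect can be seen on a toy matrix. The sketch below is a simplified base R re-implementation of median-of-ratios size factors written for illustration only (the function name `median_ratio_sf` and all numbers are hypothetical, and it is not DESeq2's own code). One gene is made extremely abundant in a single sample, which drags down the TPM of every other gene in that sample; the robust scaling brings the stable genes back in line:

```r
# Simplified median-of-ratios size factors (illustrative re-implementation)
median_ratio_sf <- function(mat) {
  log_geo_means <- rowMeans(log(mat))
  apply(mat, 2, function(col) {
    keep <- is.finite(log_geo_means) & col > 0
    exp(median((log(col) - log_geo_means)[keep]))
  })
}

# Hypothetical abundances: gene5 is extremely high in sample s3 only
raw <- cbind(
  s1 = c(10, 20, 30, 40, 50),
  s2 = c(20, 40, 60, 80, 100),
  s3 = c(10, 20, 30, 40, 5000)
)
rownames(raw) <- paste0("gene", 1:5)

# TPM-like values: every column is forced to sum to one million
tpm <- sweep(raw, 2, colSums(raw), "/") * 1e6

sf     <- median_ratio_sf(tpm)
scaled <- t(t(tpm) / sf)

tpm[1:4, ]     # gene1 to gene4 look much lower in s3 than in s1 and s2
scaled[1:4, ]  # after scaling they agree across samples
```

Because each sample is divided by a single positive constant, the ordering of genes within a sample is unchanged; the scaling matters when values are compared across samples.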

ADD COMMENT • link 1 day ago Michael Love 43k


Thanks Michael. Just out of curiosity, would you also use the same technique for things like clustering analysis, or when looking at abnormal changes in expression in a single sample? I am working on a functional precision medicine project, which means I sometimes have to look at the characteristics of a single sample that responds to a certain drug. I am using singscore for this exact purpose. I know that using z-scores to identify differentially expressed genes in a single sample is a practice that might be frowned upon, but would that be okay?

Mods, if I should open a new question for this, please let me know.

ADD REPLY • link 1 day ago Kent • 0