Using TPM from tximport of Salmon output for singscore
Kent • 0
@5f50f52d
United Kingdom

Hi all,

I think the F1000Research article states that singscore only depends on the ranking of genes within each sample, so TPM/RPKM/FPKM is good enough and TMM normalisation is not essential. Have I interpreted that correctly? If so, can I just use the TPM values that tximport calculates from the Salmon output as input for singscore? Of course, I will need to filter out the low-count genes first. So something like:

library(tximport)
library(singscore)

# Read files
txi <- tximport(files, type = "salmon", tx2gene = tx2gene)

# Get gene-level TPM
tpm <- txi$abundance
tpm <- tpm[rowSums(tpm) > 2, ] # Filter low-abundance genes

# Rank genes within each sample
tpm_ranked <- rankGenes(tpm)

And then continue with the workflow?

Many thanks!
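The rank-based reasoning in the question can be checked on toy data. The sketch below uses base R only, with made-up values and hypothetical scaling factors (none of this comes from the article or from singscore itself). It shows that dividing each sample by its own positive constant, which is what a library-size or TMM-style factor does, leaves the within-sample gene ranks unchanged.

```r
# Toy TPM-like matrix: 5 genes (rows) x 3 samples (columns); values are made up.
tpm <- matrix(
  c(10, 200, 35, 0.5, 80,
    12, 150, 40, 2.0, 60,
     8, 300, 20, 1.0, 90),
  nrow = 5,
  dimnames = list(paste0("gene", 1:5), paste0("sample", 1:3))
)

# Arbitrary positive per-sample scaling factors (hypothetical stand-ins for
# library-size or TMM-style factors).
scale_factors <- c(0.7, 1.0, 1.6)
tpm_scaled <- sweep(tpm, 2, scale_factors, "/")

# Rank genes within each sample (column) before and after scaling.
ranks_raw    <- apply(tpm, 2, rank)
ranks_scaled <- apply(tpm_scaled, 2, rank)

# TRUE if no gene changed rank within any sample.
ranks_unchanged <- identical(ranks_raw, ranks_scaled)
print(ranks_unchanged)
```

This only covers per-sample scaling. A per-gene correction such as dividing by gene length can reorder genes within a sample, so it is a separate consideration.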


As a comment, TMM does not help here, because the authors say in the article that they recommend removing gene length bias, and TMM does not do that. What you could do is take the tximport output, run it through the usual calcNormFactors from edgeR, and then use its rpkm function to get the values you need. Alternatively, DESeq2 has an fpkm function. That way you stay consistent if you use either of these packages for differential analysis downstream.
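To illustrate why gene length matters for a rank-based method, here is a minimal base R sketch that computes RPKM by hand for a single toy sample. The counts and gene lengths are made up, and the formula is written out explicitly rather than calling edgeR, so it is an illustration of the idea and not the edgeR implementation. On real data, edgeR's rpkm function accepts a gene.length argument for the same purpose.

```r
# Toy read counts for 4 genes in one sample, with made-up gene lengths (bp).
counts <- c(geneA = 900, geneB = 500, geneC = 400, geneD = 50)
gene_length_bp <- c(geneA = 9000, geneB = 1000, geneC = 2000, geneD = 1000)

# RPKM by hand: reads per kilobase of gene per million mapped reads.
lib_size <- sum(counts)
rpkm_manual <- counts / (gene_length_bp / 1e3) / (lib_size / 1e6)

# Rank genes within the sample (1 = lowest) on each scale.
rank_counts <- rank(counts)
rank_rpkm   <- rank(rpkm_manual)

# Compare the two rankings side by side.
print(rbind(rank_counts, rank_rpkm))
```

In this toy sample the long gene (geneA) has the most reads but no longer ranks highest once length is accounted for, which is the kind of reordering a per-sample factor such as TMM cannot produce.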

@mikelove
United States

TPM is fine. I've also sometimes used TPM with some robust column scaling, e.g.:

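# Median-of-ratios size factors estimated from the TPM matrix
sf <- DESeq2::estimateSizeFactorsForMatrix(txi$abundance)
# Divide each sample (column) by its size factor
tpm_scaled <- t( t( txi$abundance ) / sf )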

This would help in case there are some very highly expressed genes only in a subset of samples, throwing off the column sum.
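The effect described above can be sketched on toy data. The helper below, median_ratio_sf, is a hypothetical, simplified re-implementation of the median-of-ratios idea in base R, written for illustration only; it is not the DESeq2 code, and on real data you would call DESeq2::estimateSizeFactorsForMatrix as shown in the answer. The abundance values are made up: one gene takes half of the total in sample s3, which halves the apparent abundance of every other gene in that sample.

```r
# Simplified median-of-ratios size factors (illustrative, not the DESeq2 code).
median_ratio_sf <- function(x) {
  # Use only genes that are positive in every sample.
  keep <- rowSums(x > 0) == ncol(x)
  log_x <- log(x[keep, , drop = FALSE])
  # Per-gene log geometric mean across samples.
  log_geo_mean <- rowMeans(log_x)
  # Per-sample median of the log ratios, back on the original scale.
  exp(apply(log_x - log_geo_mean, 2, median))
}

# Toy abundances: genes 1-5 are truly unchanged, but gene6 is expressed only
# in s3 and takes half of that sample's total.
base <- c(100, 200, 300, 150, 250)
abund <- cbind(s1 = c(base, 0), s2 = c(base, 0), s3 = c(base / 2, 500))
rownames(abund) <- paste0("gene", 1:6)

sf <- median_ratio_sf(abund)

# Same column-scaling idiom as in the answer above.
abund_scaled <- t(t(abund) / sf)
print(round(abund_scaled[1:5, ], 2))
```

After scaling, genes 1-5 have the same values in all three samples, even though every column of the toy matrix has the same sum. Since this is still one factor per sample, it changes how samples compare with each other and leaves the ordering of genes within a sample as it was.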


Thanks Michael. Just out of curiosity, would you also use the same technique for things like clustering analysis, or when looking at abnormal changes in expression in a single sample? I am working on a functional precision medicine project, which means I sometimes have to look at the characteristics of a single sample that responds to a certain drug. I am using singscore for this exact purpose. I know that using z-scores to identify differentially expressed genes in a single sample is a practice that might be frowned upon, but would that be okay?

Mods, if I should open a new question for this, please let me know.
