Question: Batch Corrections (MNN) with TPM values
Posted 5 weeks ago by hamza_karakurt:

Hello, I have a simple question. I have two public data sets and I want to use both of them with MNN correction for certain analyses. One data set has raw counts, which are suitable for use with scater/scran, but the other only provides TPM values in its supplementary material. As far as I know, MNN requires normalized counts. If I convert the raw counts of the first data set to TPMs with calculateTPM() and use the TPM values of both data sets for MNN, do you think it will work?

Thank you in advance.
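For reference, here is a minimal sketch of the standard TPM definition in plain Python. It is not the source of scater's calculateTPM(), and the gene counts and lengths are made up. Each gene's count is divided by its length, and the resulting rates are rescaled so that each cell sums to one million:

```python
def counts_to_tpm(counts, lengths_kb):
    """Convert one cell's raw counts to TPM, given gene lengths in kilobases.

    Illustrative sketch of the textbook definition only.
    """
    # Step 1: length-normalize each gene's count (counts per kilobase).
    rates = [c / l for c, l in zip(counts, lengths_kb)]
    total = sum(rates)
    # Step 2: rescale so that the cell's values sum to one million.
    return [r / total * 1e6 for r in rates]


# Hypothetical cell with four genes (made-up counts and lengths).
counts = [10, 0, 30, 60]
lengths_kb = [1.0, 2.0, 1.5, 3.0]
tpm = counts_to_tpm(counts, lengths_kb)
print(tpm)
print(sum(tpm))
```

The point to notice is that every cell ends up summing to one million regardless of how deeply it was sequenced, so the TPMs no longer carry the depth information.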

Answer: Batch Corrections (MNN) with TPM values
Posted 5 weeks ago by Aaron Lun (Cambridge, United Kingdom):

For starters, you'll need to log-transform the values. The problem is that the usual pseudo-count of 1 makes no sense for TPMs. (One could argue that it doesn't make sense in general, but it is especially nonsensical in this case, where you're adding a "count" to a TPM, which is not a count.) I discuss this in more detail here.
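One way to see why: under the simplifying assumption that there is no gene-length correction (so TPM = count × 10^6 / depth), adding 1 to a TPM is the same as adding depth/10^6 to the underlying count, up to a constant shift on the log scale. The sketch below checks this identity for a hypothetical depth of 5000 and a hypothetical count of 3. The function name is illustrative only:

```python
import math


def equivalent_count_pseudocount(depth, tpm_pseudocount=1.0):
    """Pseudo-count on the count scale implied by adding `tpm_pseudocount` to TPMs.

    Assumes no gene-length correction, so that TPM = count * 1e6 / depth.
    """
    return tpm_pseudocount * depth / 1e6


depth = 5000  # hypothetical per-cell sequencing depth
count = 3     # hypothetical count for one gene
tpm = count * 1e6 / depth

# log2(TPM + 1) == log2(1e6 / depth) + log2(count + depth / 1e6)
lhs = math.log2(tpm + 1)
rhs = math.log2(1e6 / depth) + math.log2(count + equivalent_count_pseudocount(depth))
print(equivalent_count_pseudocount(depth))  # 0.005
print(math.isclose(lhs, rhs))               # True
```

At this depth, a TPM pseudo-count of 1 behaves like a pseudo-count of only 0.005 on the count scale, which is far smaller than the offset of 1 that is normally assumed.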

Without the counts, the next-best solution is to guess the average per-cell sequencing depth of the second experiment. For example, if you assume that each cell was sequenced to a depth of 5000 (i.e. 5000 counts per cell), you could recover normalized count-like values by multiplying your TPMs by 5000/1e6. Because each cell's TPMs sum to one million, the rescaled values then sum to the assumed depth of 5000. You can then log-transform them and feed them through the same scater + scran pipeline.
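A minimal sketch of that rescaling for a single cell, in plain Python. The assumed depth of 5000 is the guess from the text, and the TPM values are hypothetical. Any gene-length correction already baked into the TPMs is not undone, so the result is only count-like:

```python
import math


def tpm_to_logcounts(tpm, assumed_depth, pseudocount=1.0):
    """Rescale one cell's TPMs to count-like values, then log2-transform them.

    `assumed_depth` is a guess at the per-cell sequencing depth; it cannot be
    recovered from the TPMs themselves.
    """
    scale = assumed_depth / 1e6
    count_like = [v * scale for v in tpm]
    return count_like, [math.log2(v + pseudocount) for v in count_like]


tpm = [200000.0, 0.0, 400000.0, 400000.0]  # hypothetical cell; sums to 1e6
count_like, logvals = tpm_to_logcounts(tpm, assumed_depth=5000)
print(count_like)       # [1000.0, 0.0, 2000.0, 2000.0]
print(sum(count_like))  # 5000.0
print(logvals)
```

For a whole genes-by-cells matrix in R, the same scaling is a single multiplication (for example `tpm * 5000 / 1e6`) before log-transforming.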

If you do it the other way (computing TPMs from the first dataset), you run into the same problem of choosing an appropriate offset for the log-transformation. This isn't entirely academic. If your values are small compared to the pseudo-count, the log-transformation is essentially a linear transformation. Conversely, in scRNA-seq contexts, TPMs tend to be artificially large compared to the pseudo-count. This gives a lot of weight to the jump from zero to non-zero values (and thus amplifies the noise due to dropouts, etc.).
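A small numeric illustration of both regimes, in plain Python. The depth of 5000 and the specific values are hypothetical. Gene-length correction is ignored, so one count corresponds to 10^6/depth TPM:

```python
import math


def log2_plus1(x):
    """log2(x + 1): the usual log-transformation with a pseudo-count of 1."""
    return math.log2(x + 1)


# Values small relative to the pseudo-count: log2(x + 1) is close to x / ln(2),
# so the transformation is nearly linear.
for x in [0.01, 0.05, 0.1]:
    print(f"x={x}: log2(x+1)={log2_plus1(x):.5f}, x/ln2={x / math.log(2):.5f}")

# Values large relative to the pseudo-count: compare the zero -> one-count jump
# with the next step up (one -> two counts), on the count and TPM scales.
depth = 5000
for label, unit in [("count scale", 1.0), ("TPM scale", 1e6 / depth)]:
    first_jump = log2_plus1(unit) - log2_plus1(0)        # 0 -> 1 count
    next_step = log2_plus1(2 * unit) - log2_plus1(unit)  # 1 -> 2 counts
    print(f"{label}: zero->one jump={first_jump:.2f}, one->two step={next_step:.2f}")
```

On the count scale, the zero-to-one jump is 1 log2 unit, compared with roughly 0.58 for the next step. On the TPM scale the same single count produces a jump of about 7.65, while the next step is just under 1. In other words, whether a gene was detected at all dominates the log-values.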
