5 weeks ago by
Cambridge, United Kingdom
For starters, you'll need to log-transform the values. The problem lies in the fact that the usual pseudo-count of 1 makes no sense for TPMs. (One could argue that it doesn't make sense in general, but it is especially nonsensical in this case, where you're adding a "count" to a non-count TPM.) I discuss this in more detail here.
Without the counts, the next-best solution is to guess the average per-cell sequencing depth for the second experiment. For example, if you assume that each cell was sequenced to a depth of 5000, then you could recover some normalized count-like values by multiplying your TPMs by 5000/1e6. Then you can just log-transform them and feed them through the same scater + scran pipeline.
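To make the rescaling concrete, here is a minimal sketch in plain Python (not the scater/scran code itself). The depth of 5000 is the guess from above, the TPM values are made up for illustration, and the function names are hypothetical.

```python
import math

# Assumption: a guessed average per-cell sequencing depth, as in the text.
ASSUMED_DEPTH = 5000


def tpm_to_pseudo_counts(tpms, depth=ASSUMED_DEPTH):
    """Rescale TPMs (which sum to 1e6 per cell) so they sum to roughly `depth`."""
    return [t * depth / 1e6 for t in tpms]


def log_transform(values, pseudo_count=1.0):
    """log2(x + pseudo_count), the usual offset log-transformation."""
    return [math.log2(v + pseudo_count) for v in values]


# Hypothetical TPMs for a single cell (they sum to 1e6).
cell_tpms = [0.0, 200.0, 1800.0, 998000.0]

pseudo_counts = tpm_to_pseudo_counts(cell_tpms)  # count-like values summing to ~5000
log_values = log_transform(pseudo_counts)        # what you would feed downstream
```

After rescaling, a pseudo-count of 1 is at least on the same scale as a single read, which is the whole point of the exercise. The result is only as good as the guessed depth, though.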
If you do it the other way (where you compute TPMs from the first dataset), you run into the same problem of choosing an appropriate offset for the log-transformation. This isn't entirely academic: if your values are small compared to the pseudo-count, the log-transformation is basically a linear transformation, since log(1 + x) is approximately proportional to x for small x. In scRNA-seq contexts, the opposite tends to happen: the TPMs are artificially large compared to the pseudo-count, which gives a lot of weight to the jump from zero to non-zero values (and thus increases the effect of noise due to dropouts, etc.).
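A quick numerical illustration of both regimes, again as a plain Python sketch with made-up values and a pseudo-count of 1 (the helper name is hypothetical):

```python
import math


def log2p(x, pseudo_count=1.0):
    """log2(x + pseudo_count)."""
    return math.log2(x + pseudo_count)


# Regime 1: values much smaller than the pseudo-count.
# log2(1 + x) / x stays close to the constant 1 / ln(2), i.e. the
# transformation is nearly linear in x.
small_values = [0.01, 0.02, 0.04]
ratios = [log2p(x) / x for x in small_values]

# Regime 2: values much larger than the pseudo-count (TPM-like).
# Going from 0 to the smallest non-zero value is a far bigger step than
# doubling the expression afterwards.
jump_from_zero = log2p(100.0) - log2p(0.0)
doubling_step = log2p(200.0) - log2p(100.0)

print(ratios)
print(jump_from_zero, doubling_step)
```

In the second regime, a gene that drops out in one cell but sits at 100 TPM in another contributes a much larger difference than a genuine two-fold change, which is why dropouts end up dominating the log-values.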