I am trying to process a scRNA-seq data set generated with SmartSeq2 using `scater` v1.6.1. I quantified gene expression using Salmon and then loaded it into scater with `readSalmonResults` as described in the vignette. I then summarised the transcript-level expression into genes using `sce_gene = summariseExprsAcrossFeatures(sce_transcripts, exprs_values="tpm", summarise_by="feature_id")`.
However, I am now very confused about what `counts(sce_gene)` means and how to correctly deal with functions that work with counts by default. I provided `exprs_values="tpm"` to `summariseExprsAcrossFeatures()` because summing up the read counts from individual transcripts does not make sense to me (the transcripts have different lengths), and this seems to be confirmed by the documentation. However, if this is the case, I don't understand why `counts(sce_gene)` is set at all, or what it represents.
For example, it seems like I should now run `normalise(sce_gene)` to calculate logcounts, which are used by functions like `plotExpression()`. However, this seems to operate on the `counts()` slot by default. Isn't this just going to give me nonsensical values? Should I be using `normalise(sce_gene, exprs_values='tpm')` instead? But then I would end up with logtpms rather than logcounts, wouldn't I? Or is logcounts a misnomer, and it really represents something more like logtpms (scaled by gene length and library size)?
Similarly, scran::cyclone() seems to operate on the counts by default. Should I be specifying TPMs instead there? Can I still use the pre-computed pairs from the package?
And finally, scran::trendVar() crashes when I run it on sce_gene because it apparently expects the SCE to have sizeFactors(). However, sizeFactors() would only be set if I run computeSumFactors(), which would only make sense if I had some sort of proper counts matrix. How do I deal with that? Can I just set `sizeFactors(sce_gene) = rep(1, ncol(sce_gene))` to signal that sizeFactors were not calculated by scran?
Edit: After some detective work, it looks like scater might actually be re-calculating the counts from the TPMs inside `summariseExprsAcrossFeatures()`, as `tpm * lib_size * 1e-06`. If I'm reading this right, this should result in some sort of gene-length (but not library-size) adjusted count. Is that correct? Does this mean I should just use the counts after all, or are TPMs still better?
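To make the formula in the edit concrete, here is a small base-R sketch of what I think the back-calculation amounts to. The TPM values, gene and cell names, and `lib_size` vector are all made up for illustration, and I am only assuming that this mirrors what scater does internally; it is not taken from the package source.

```r
# Toy TPM matrix (made-up numbers): 3 genes x 2 cells.
# Each column sums to 1e6, as TPMs do by definition.
tpm <- matrix(c(500000, 300000, 200000,
                100000, 100000, 800000),
              nrow = 3,
              dimnames = list(c("geneA", "geneB", "geneC"),
                              c("cell1", "cell2")))

# Hypothetical per-cell library sizes (total mapped reads).
lib_size <- c(cell1 = 2e6, cell2 = 5e5)

# The formula from the edit: tpm * lib_size * 1e-06,
# applied so that each cell (column) is scaled by its own library size.
pseudo_counts <- sweep(tpm, 2, lib_size, `*`) * 1e-06

pseudo_counts
colSums(pseudo_counts)
```

If this is indeed what happens, each column of `pseudo_counts` sums back to that cell's library size, so the values would still carry library-size differences between cells while the within-cell proportions stay those of the TPMs (i.e. length-adjusted).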