All right, I'll bite.
Does the trajectory predicted by different algorithms offer any directionality between the cells/clusters? For example, which cluster is more immature in terms of differentiation?
In and of itself, no. The trajectory is just a way of stringing together cellular states. You can impose directionality with prior biological knowledge about differentiation markers, the entropy/potency relationship or RNA velocity. I talk about this briefly in the relevant chapter of the book.
I know that RNA velocity may address the directionality question, but if this is the case, what added value does trajectory analysis have over RNA velocity?
To me, the biggest and most practical one is that trajectory analysis doesn't require unspliced counts. It's hard to overstate how relevant this is; of the ~50 public datasets in the scRNAseq package, only one has unspliced counts and that's because we specifically put it in to test RNA velocity methods. If you're dealing with a public scRNA-seq dataset and it doesn't provide unspliced counts, that's just too bad - unless you're masochistic enough to pull down the FASTQs and start from there. Even for datasets where I already have FASTQs, I'm not entirely enthusiastic about regenerating the counts because (i) I already did my analysis with the existing counts and I don't want to repeat it and (ii) the directionality is biologically obvious (or at least clear enough to define the relevant experimental hypothesis that needs to be tested in the lab).
Another reason is that trajectory analysis works for any continuum unrelated to temporal progression. For example, if a cell is moderately active in a particular pathway, that doesn't necessarily mean it's becoming more or less active; it could just be maintaining its current activity. RNA velocity would not be helpful here because the velocity would be zero for all cells, but you can still build a trajectory of pathway activity to describe this phenomenon. Then you can test for DE genes with respect to activity and so on.
Even for temporal processes like differentiation, it's likely that you'll want to build a trajectory (in addition to the velocity vectors) to explicitly nail down the branch events, relationships between clusters, etc. To me, a trajectory is a continuous generalization of clusters, serving the same purpose of summarizing the data in a convenient form. Then you can talk about (and compute on) "the cells on branch 4" or "the cells lying on the path between clusters X and Y"... unless you favor communicating your results by waving your hands at a t-SNE.
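To make "the cells lying on the path between clusters X and Y" concrete, here is a minimal sketch in plain Python. It assumes the simplest kind of trajectory, a minimum spanning tree over cluster centroids in some higher-dimensional space (e.g., the top PCs); that choice, the toy coordinates, the cluster labels and every function name are illustrative assumptions, not any particular package's API.

```python
import math
from collections import defaultdict

def cluster_centroids(points, labels):
    """Mean coordinate of each cluster; rows of `points` are cells (e.g., in PC space)."""
    sums, counts = {}, defaultdict(int)
    for p, lab in zip(points, labels):
        if lab not in sums:
            sums[lab] = [0.0] * len(p)
        for i, v in enumerate(p):
            sums[lab][i] += v
        counts[lab] += 1
    return {lab: [v / counts[lab] for v in s] for lab, s in sums.items()}

def minimum_spanning_tree(cents):
    """Prim's algorithm on Euclidean distances between centroids."""
    names = sorted(cents)
    in_tree = {names[0]}
    edges = defaultdict(set)
    while len(in_tree) < len(names):
        # Cheapest edge joining a cluster in the tree to one outside it.
        _, a, b = min(
            (math.dist(cents[a], cents[b]), a, b)
            for a in in_tree for b in names if b not in in_tree
        )
        edges[a].add(b)
        edges[b].add(a)
        in_tree.add(b)
    return edges

def path_between(edges, start, end):
    """The unique path between two clusters in the tree (depth-first search)."""
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == end:
            return path
        for nxt in edges[node]:
            if nxt not in path:
                stack.append((nxt, path + [nxt]))
    return None

# Toy data: two cells per cluster, two dimensions standing in for PCs.
points = [(0.0, 0.0), (0.2, 0.1), (1.0, 0.0), (1.2, 0.1),
          (2.0, 0.0), (2.2, 0.1), (1.0, 1.5), (1.2, 1.7)]
labels = ["A", "A", "B", "B", "C", "C", "D", "D"]

tree = minimum_spanning_tree(cluster_centroids(points, labels))
path_ac = path_between(tree, "A", "C")

# Cells belonging to any cluster on the path between A and C.
cells_on_path = [i for i, lab in enumerate(labels) if lab in path_ac]
print(path_ac, cells_on_path)
```

Note that the tree has no root and no arrows: the path from A to C is the same set of edges as the path from C to A, which is the point about directionality above.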
And of course, the directionality information has its own cost in that the results are sensitive to the choice of how (un)spliced counts are obtained. Different quantification methods will yield different velocity vectors, possibly pointing in opposing directions (see here) so YMMV. In addition, once you start counting reads in introns, you're exposing yourself to questions like: how should intron retention be handled? Are unspliced counts driven by novel transcripts (e.g., miRNAs inside genes) or unannotated exons? What about repeats inside introns that tend to accumulate "read stacks"? By comparison, trajectory construction cares not about these things.
Thanks, Aaron. That’s helpful. I guess I’m still not sure why I would do trajectory analysis if I can’t get information about directionality. By clustering the cells I already get a sense of how close they are to each other (transcriptionally) and I can look at the UMAP to see how close/connected the clusters are to each other. So the trajectory analysis basically gives you a mathematical and visual way of confirming it?
Thanks again!
You can either join the dots on the UMAP manually or you can ask a computer to do it for you by making a trajectory. That's what it really comes down to. Hell, the same could be said for clustering. Why bother running a clustering algorithm when you can just circle blobs on a t-SNE?
(Of course, I'm being facetious with my suggestion there. I would never use a low-dimensional visualization like t-SNE or UMAP as the basis for any analysis; there's an uncomfortable amount of magic happening under the hood. I would only use it to visualize findings from more quantitative analyses in higher-dimensional spaces. Or more bluntly: with enough tuning of the parameters and suckers for reviewers, you can probably "prove" anything on a t-SNE/UMAP.)
Just think about what you would do if you found an interesting "path" through your dataset. You want to find genes that are DE along this path. How would you do it?
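One rough way to answer that, sketched in plain Python: once the path is an explicit object, each cell can be given a position along it (a pseudotime) and each gene can be scored against that position. The sketch below assumes the path is a single straight segment between two centroids and uses a Spearman correlation as the score; both are simplifying assumptions for illustration (dedicated tools typically fit smooth curves and report proper test statistics), and all names and numbers are made up.

```python
import math

def project_onto_segment(cell, start, end):
    """Fractional position (0 to 1) of a cell's projection onto the segment start -> end."""
    d = [e - s for s, e in zip(start, end)]
    num = sum((c - s) * di for c, s, di in zip(cell, start, d))
    den = sum(di * di for di in d)
    return min(1.0, max(0.0, num / den))

def ranks(values):
    """1-based ranks, with ties given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out

def spearman(x, y):
    """Spearman correlation; returns 0.0 if either variable is constant."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0

# Toy data: five cells scattered around a path from (0, 0) to (4, 0).
cells = [(0.0, 0.1), (1.0, -0.2), (2.0, 0.3), (3.0, 0.0), (4.0, -0.1)]
pseudotime = [project_onto_segment(c, (0.0, 0.0), (4.0, 0.0)) for c in cells]

# Hypothetical expression values, one list per gene, in the same cell order.
expression = {
    "up":   [1, 2, 4, 7, 9],
    "down": [9, 6, 5, 2, 1],
    "flat": [3, 3, 3, 3, 3],
}
scores = {gene: spearman(pseudotime, vals) for gene, vals in expression.items()}

# Genes most strongly associated with position along the path come first.
for gene in sorted(scores, key=lambda g: -abs(scores[g])):
    print(gene, round(scores[gene], 3))
```

The sign of each score depends on which end of the segment is called the start, so it describes association with the path, not a biological direction. Doing the same thing by eye on a UMAP has no equivalent of `pseudotime` to compute on.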