Question

Trajectory analysis for single cell sequencing

lirongrossmann ▴ 50

@lirongrossmann-23954

Hi everyone, I have a fundamental question regarding trajectory analysis in single-cell analysis: does the trajectory predicted by different algorithms offer any directionality between the cells/clusters? For example, which cluster is more immature in terms of differentiation? If so, which package would you recommend to obtain that information?

I know that RNA velocity may address the directionality question, but if this is the case, what added value does trajectory analysis have over RNA velocity?

My understanding of the basic principle of trajectory analysis (with differences between specific algorithms) is that it finds a minimum spanning tree between cells/clusters, which may help in understanding connectivity but does not tell you the direction.

Any clarifications/comments are welcome!

single cell trajectory analysis tscan monocle • 3.9k views

updated 5.4 years ago by Aaron Lun ★ 29k • written 5.4 years ago by lirongrossmann ▴ 50

score 2 · Answer 1 · 2020-08-12

All right, I'll bite.

does the trajectory predicted by different algorithms offer any directionality between the cells/clusters? For example, which cluster is more immature in terms of differentiation?

In and of itself, no. The trajectory is just a way of stringing together cellular states. You can impose directionality with prior biological knowledge about differentiation markers, the entropy/potency relationship or RNA velocity. I talk about this briefly in the relevant chapter of the book.
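To make the "no directionality" point concrete, here is a minimal sketch in plain Python, not the implementation of any particular package. It builds a minimum spanning tree over cluster centroids and then shows that the ordering only appears once a root is chosen. The centroid coordinates, cluster names and the helper functions `mst_edges` and `order_from_root` are all hypothetical and exist only for illustration.

```python
import math

def mst_edges(centroids):
    """Prim's algorithm on Euclidean distances; returns undirected edges."""
    names = list(centroids)
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        # Cheapest edge joining a cluster in the tree to one outside it.
        candidates = (
            (math.dist(centroids[a], centroids[b]), a, b)
            for a in in_tree for b in names if b not in in_tree
        )
        _, a, b = min(candidates)
        edges.append(frozenset((a, b)))  # frozenset: the edge has no direction
        in_tree.add(b)
    return edges

def order_from_root(edges, root):
    """Breadth-first walk from a chosen root; this is where direction enters."""
    adjacency = {}
    for edge in edges:
        a, b = tuple(edge)
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    order, queue, seen = [], [root], {root}
    while queue:
        node = queue.pop(0)
        order.append(node)
        for nxt in sorted(adjacency.get(node, ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order

# Hypothetical cluster centroids in a 2-D reduced space.
centroids = {"A": (0.0, 0.0), "B": (1.0, 0.2), "C": (2.1, 0.1), "D": (1.0, 1.5)}
edges = mst_edges(centroids)

# Same tree, two different "directions", depending only on the root we pick.
print(order_from_root(edges, "A"))
print(order_from_root(edges, "C"))
```

The tree itself is identical in both calls; only the root, which would come from markers, potency estimates or velocity, decides which end counts as "early".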

I know that RNA velocity may address the directionality question, but if this is the case, what added value does trajectory analysis have over RNA velocity?

To me, the biggest and most practical one is that trajectory analysis doesn't require unspliced counts. It's hard to overstate how relevant this is; of the ~50 public datasets in the scRNAseq package, only one has unspliced counts and that's because we specifically put it in to test RNA velocity methods. If you're dealing with a public scRNA-seq dataset and it doesn't provide unspliced counts, that's just too bad - unless you're masochistic enough to pull down the FASTQs and start from there. Even for datasets where I already have FASTQs, I'm not entirely enthusiastic about regenerating the counts because (i) I already did my analysis with the existing counts and I don't want to repeat it and (ii) the directionality is biologically obvious (or at least clear enough to define the relevant experimental hypothesis that needs to be tested in the lab).

Another reason is that trajectory analysis works for any continuum unrelated to temporal progression. For example, if a cell is moderately active in a particular pathway, that doesn't necessarily mean it's becoming more or less active; it could just be maintaining its current activity. RNA velocity would not be helpful here because the velocity would be zero for all cells, but you can still build a trajectory of pathway activity to describe this phenomenon. Then you can test for DE genes with respect to activity and so on.
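As a rough illustration of a non-temporal continuum, the sketch below treats a per-cell pathway activity score as the position along the trajectory and uses a Spearman rank correlation as a simple stand-in for a proper DE test along it. The activity scores, gene counts and function names are made up for the example; a real analysis would fit a model along the trajectory rather than rely on a bare correlation.

```python
def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical per-cell pathway activity scores (no time axis involved).
activity = [0.1, 0.4, 0.5, 0.9, 1.3, 2.0]
# Hypothetical counts for two genes in the same six cells.
gene_up = [0, 1, 1, 3, 5, 8]
gene_down = [9, 7, 6, 4, 2, 1]

print(spearman(activity, gene_up))    # strongly positive
print(spearman(activity, gene_down))  # strongly negative
```

Nothing here says whether a given cell is moving up or down the activity axis, which is exactly the situation where velocity has nothing to add but the continuum is still useful.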

Even for temporal processes like differentiation, it's likely that you'll want to build a trajectory (in addition to the velocity vectors) to explicitly nail down the branch events, relationships between clusters, etc. To me, a trajectory is a continuous generalization of clusters, serving the same purpose of summarizing the data in a convenient form. Then you can talk about (and compute on) "the cells on branch 4" or "the cells lying on the path between clusters X and Y"... unless you favor communicating your results by waving your hands at a t-SNE.
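The "cells lying on the path between clusters X and Y" idea can be sketched as follows, assuming the trajectory is already available as a list of tree edges between clusters and each cell has a cluster label. The edges, cell names and the `tree_path` helper are hypothetical and only meant to show how a trajectory turns into something you can compute on.

```python
def tree_path(edges, start, end):
    """Return the unique path between two nodes of a tree (depth-first search)."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == end:
            return path
        for nxt in adjacency.get(node, []):
            if nxt not in path:
                stack.append((nxt, path + [nxt]))
    return None  # start and end are not connected

# Hypothetical trajectory: cluster B is a branch point leading to C and D.
edges = [("A", "B"), ("B", "C"), ("B", "D")]
# Hypothetical cluster assignment for each cell.
cell_clusters = {"cell1": "A", "cell2": "B", "cell3": "C", "cell4": "D", "cell5": "B"}

path = tree_path(edges, "A", "D")
on_path = sorted(cell for cell, cluster in cell_clusters.items() if cluster in path)
print(path)
print(on_path)
```

The resulting cell subset can then be passed to whatever downstream step is of interest, such as a DE test restricted to that branch.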

And of course, the directionality information has its own cost in that the results are sensitive to the choice of how (un)spliced counts are obtained. Different quantification methods will yield different velocity vectors, possibly pointing in opposing directions (see here) so YMMV. In addition, once you start counting reads in introns, you're exposing yourself to questions like: how should intron retention be handled? Are unspliced counts driven by novel transcripts (e.g., miRNAs inside genes) or unannotated exons? What about repeats inside introns that tend to accumulate "read stacks"? By comparison, trajectory construction cares not about these things.
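The sensitivity to quantification can be shown with a toy calculation. This is a deliberately simplified steady-state model for a single gene, where velocity is taken as `u - gamma * s` and `gamma` is a least-squares slope through the origin; it is an assumption made for illustration and not the exact procedure of any velocity tool. All counts below are invented, with the two unspliced vectors standing in for two hypothetical quantification methods applied to the same cells.

```python
def steady_state_velocity(unspliced, spliced):
    """Toy per-cell velocity for one gene: v = u - gamma * s.

    gamma is the least-squares slope of unspliced on spliced through the
    origin, a simplification of the steady-state assumption.
    """
    gamma = sum(u * s for u, s in zip(unspliced, spliced)) / sum(s * s for s in spliced)
    return [u - gamma * s for u, s in zip(unspliced, spliced)]

# Invented spliced counts for one gene in four cells.
spliced = [10, 20, 30, 40]
# Invented unspliced counts from two hypothetical quantification methods.
unspliced_a = [2, 5, 5, 8]
unspliced_b = [4, 5, 9, 8]

velocity_a = steady_state_velocity(unspliced_a, spliced)
velocity_b = steady_state_velocity(unspliced_b, spliced)

for i, (va, vb) in enumerate(zip(velocity_a, velocity_b), start=1):
    print(f"cell {i}: method A {va:+.2f}, method B {vb:+.2f}")
```

With these numbers the third and fourth cells get velocities of opposite sign under the two sets of unspliced counts, even though the spliced counts are identical.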