Dear community,
I was re-reading the LEMUR manuscript and wondered about modelling gene program dynamics. In figure 5, the authors took a dataset of zebrafish embryo development with multiple samples across timepoints and inferred gene dynamics by fitting natural cubic splines with three degrees of freedom to represent the time dependence of each gene. Could this be done at the level of 'gene programs' (aka gene signatures)?
Moving what was written in this GitHub issue over to this community: Constantin suggested three options:
- The less elegant one: to change the input matrix from genes x cells to programs x cells.
- The "easiest" one: to do a post-hoc analysis after fitting the LEMUR model on the full data, i.e. with the regular model and input. This could be done either by (i) averaging the predicted expression values of multiple genes or by (ii) using a more sophisticated approach, as in TcGSA.
- The more elegant but complicated one: to include the information that a set of genes should move consistently in the LEMUR inference step. In other words, to find a good way to express this information sharing across genes in the Grassmann regression step.
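
To make option (i) of the "easiest" route a bit more concrete, here is a minimal sketch in Python/NumPy of how I picture it. It does not call LEMUR itself: `predicted` is a placeholder for a genes x cells matrix of model-predicted expression, and `programs`, `time`, the toy data, the choice of knots at time quantiles, and both function names are my own assumptions for illustration. The hand-written truncated-power basis is only a stand-in for what a natural cubic spline helper (such as `splines::ns` in R) would provide.

```python
import numpy as np


def natural_cubic_basis(t, knots):
    """Natural cubic spline basis in truncated-power form, without intercept.

    With K knots (boundary knots included) this returns K - 1 columns,
    so 4 knots give 3 degrees of freedom. The basis is linear beyond
    the last knot, as required for a natural spline.
    """
    t = np.asarray(t, dtype=float)
    knots = np.asarray(knots, dtype=float)
    n_knots = len(knots)

    def d(k):
        # Scaled difference of truncated cubics at knot k and the last knot.
        num = np.maximum(t - knots[k], 0.0) ** 3 - np.maximum(t - knots[-1], 0.0) ** 3
        return num / (knots[-1] - knots[k])

    cols = [t] + [d(k) - d(n_knots - 2) for k in range(n_knots - 2)]
    return np.column_stack(cols)


def program_scores(predicted, programs):
    """Average a genes x cells matrix over the gene rows of each program."""
    return np.vstack([predicted[idx].mean(axis=0) for idx in programs.values()])


# Toy data (all hypothetical): 6 genes, 40 cells, one time value per cell.
rng = np.random.default_rng(0)
n_genes, n_cells = 6, 40
time = np.linspace(0.0, 10.0, n_cells)
predicted = rng.normal(size=(n_genes, n_cells)) + 0.3 * time  # placeholder predictions
programs = {"program_A": [0, 1, 2], "program_B": [3, 4, 5]}   # gene row indices

# Step 1: collapse genes x cells into programs x cells by averaging.
scores = program_scores(predicted, programs)

# Step 2: fit one spline curve per program by ordinary least squares.
knots = np.quantile(time, [0.0, 1 / 3, 2 / 3, 1.0])
design = np.column_stack([np.ones(n_cells), natural_cubic_basis(time, knots)])
coef, *_ = np.linalg.lstsq(design, scores.T, rcond=None)
fitted = design @ coef  # cells x programs: smoothed program dynamics over time
```

The same averaging step, applied to the raw counts instead of the predictions, would correspond to the first option (a programs x cells input matrix). A plain mean treats all genes in a program equally, which is probably too naive when genes differ a lot in scale.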
As of today, I don't have a concrete application or dataset that would benefit from this, but I think it is worth discussing given the modelling power of {lemur}.
Best,
Pedro