Urgent advice needed, please!
I'm analyzing pathway activities derived using AUCell from single-cell RNA sequencing data. My specific research question is to identify which pathways vary significantly along disease progression. The disease progression scores are continuous values from 0 to 1, obtained per subject/patient.
I'd like to know how to build an appropriate design matrix that includes covariates such as age and sex, and then how to use limma to test associations between pathway activities and the continuous disease progression metric. There are about 84 patients altogether, but the actual cell x pathway data matrix contains over 100k cells (representing pseudo-replicates at the patient level) for each pathway.
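One thing I've considered is collapsing the cells to one value per patient before fitting, so that the 84 patients rather than the 100k cells are the units of analysis. Below is a minimal sketch of that idea on toy data. It assumes pathways are in rows and cells in columns; the vector `patient_of_cell` and the toy dimensions are made up for illustration and are not my real objects.

```r
# Toy stand-in for the real data: pathways in rows, cells in columns.
set.seed(1)
n_patients <- 6
cells_per_patient <- 50
# Hypothetical vector giving the patient ID of every cell
patient_of_cell <- rep(paste0("P", seq_len(n_patients)), each = cells_per_patient)
activity_scores_matrix <- matrix(
  runif(3 * length(patient_of_cell)),
  nrow = 3,
  dimnames = list(paste0("pathway", 1:3),
                  paste0("cell", seq_along(patient_of_cell)))
)

# Average the AUCell scores over the cells of each patient.
# rowsum() sums rows by group, so transpose to put cells in rows first.
cell_counts <- table(patient_of_cell)
sums <- rowsum(t(activity_scores_matrix), group = patient_of_cell)  # patients x pathways
patient_means <- t(sums / as.vector(cell_counts[rownames(sums)]))   # pathways x patients

dim(patient_means)
```

The result has one column per patient, which is the orientation lmFit() expects (features in rows, samples in columns). The patient-level covariate table would need to be put in the same column order.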
The current model I have looks something like this.
library(limma)
library(dplyr)  # for %>% and arrange()
design <- model.matrix(~ 0 + CPS + age + sex, data = data)
fit <- lmFit(activity_scores_matrix, design)
fit <- eBayes(fit)
result <- topTable(fit, adjust.method = "BH",
                   number = Inf, confint = TRUE) %>%
  arrange(P.Value)
Is this the appropriate way to model continuous relationships in limma? I'm particularly uncertain about using eBayes() with a continuous predictor, since my understanding is that limma was originally designed for categorical comparisons. Should I include patient IDs as random effects using duplicateCorrelation? Would alternative approaches be more suitable for detecting pathway activity changes along a continuous disease progression scale? I remember reading something about splines a while back, but I can't track down where.
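To make the question concrete, here is a rough sketch of the two patient-level alternatives I have in mind: a linear CPS term with an intercept, and a natural spline on CPS. It is only a sketch under assumptions. The data are simulated, `meta` and `patient_means` are hypothetical names for a per-patient covariate table and a pathway x patient matrix of mean scores, and `df = 3` for the spline is an arbitrary choice.

```r
library(limma)
library(splines)

# Simulated stand-in: 20 pathways x 84 patients of patient-level mean scores
set.seed(1)
n <- 84
meta <- data.frame(
  CPS = runif(n),                                   # progression score in [0, 1]
  age = round(runif(n, 40, 80)),
  sex = factor(sample(c("F", "M"), n, replace = TRUE))
)
patient_means <- matrix(
  rnorm(20 * n, mean = 0.2, sd = 0.05), nrow = 20,
  dimnames = list(paste0("pathway", 1:20), paste0("P", seq_len(n)))
)

# Linear trend: keep the intercept and test only the CPS slope
design_lin <- model.matrix(~ CPS + age + sex, data = meta)
fit_lin <- eBayes(lmFit(patient_means, design_lin))
res_lin <- topTable(fit_lin, coef = "CPS", number = Inf,
                    adjust.method = "BH", confint = TRUE)

# Non-linear trend: natural cubic spline basis for CPS (df = 3 is arbitrary)
cps_basis <- ns(meta$CPS, df = 3)
design_ns <- model.matrix(~ cps_basis + age + sex, data = meta)
fit_ns <- eBayes(lmFit(patient_means, design_ns))
# Columns 2:4 are the spline terms; testing them jointly gives a moderated F-test
res_ns <- topTable(fit_ns, coef = 2:4, number = Inf, adjust.method = "BH")
```

In the linear version, logFC would be the change in activity score per unit of CPS, adjusted for age and sex. In the spline version the individual coefficients are not interpretable on their own, so I would only look at the joint F-test. Is either of these closer to best practice than my current model?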
Any guidance on best practices for using limma with continuous variables would be greatly appreciated. I want to ensure I'm using the most appropriate statistical framework for this analysis.
Thank you for your help!
Thank you so much for the clarification!