I'm curious about the best way to work with a "spotty" continuous variable, i.e., a variable that is technically continuous but has gaps in its observed values large enough that I worry a model might not fit well across them.
For example, say I have fifteen samples in three groups, with a continuous variable that has gaps as below.
```r
library(ggplot2)

DF <- data.frame(Group = c(rep("A", 5), rep("B", 5), rep("C", 5)),
                 Variable = c(196, 272, 284, 395, 407, 631, 683, 715, 784, 928,
                              1176, 1177, 1193, 1234, 1240))

ggplot(DF, aes(x = Variable, y = Variable, color = Group)) +
  geom_point(shape = 2, stroke = 1, size = 3)
```
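Plotting the variable against itself shows the gaps by eye. A quick base-R check (a sketch that uses only the values above) makes them explicit by listing the widest spacings between sorted values and comparing them with each group's range. With these numbers the two widest gaps (224 and 248) fall exactly between groups A/B and B/C, consistent with the variable differing by group.

```r
# Data from the example above (Group as a factor so groups sort A, B, C)
DF <- data.frame(
  Group    = factor(rep(c("A", "B", "C"), each = 5)),
  Variable = c(196, 272, 284, 395, 407, 631, 683, 715, 784, 928,
               1176, 1177, 1193, 1234, 1240)
)

# Spacing between consecutive sorted values; the largest ones are the gaps
x    <- sort(DF$Variable)
gaps <- diff(x)
top  <- order(gaps, decreasing = TRUE)[1:3]
print(data.frame(from = x[top], to = x[top + 1], width = gaps[top]))

# Per-group minimum and maximum of the variable
lo <- tapply(DF$Variable, DF$Group, min)
hi <- tapply(DF$Variable, DF$Group, max)
print(cbind(min = lo, max = hi))

# Empty stretch between adjacent groups (B minus A, then C minus B)
between_gaps <- as.numeric(lo[-1] - hi[-length(hi)])
print(between_gaps)
```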
One part of the analysis will be comparing gene expression between the three groups. Since the continuous variable differs by group, a between-group comparison will tell me something about genes whose expression co-varies with the variable. However, I wonder whether I'd get more information by modelling gene expression directly as a function of the continuous variable (especially since my group sizes are small and the variable is biologically very interesting), either by including it as a covariate in the limma design or by fitting splines.
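For concreteness, the options can be written as design matrices. The sketch below builds a group design, a linear-covariate design, and a natural-spline design with `splines::ns()` (the `splines` package ships with R). The choice of `df = 3` is an illustrative assumption, not a recommendation, and the limma calls appear only as comments, with `expr` standing in for a hypothetical genes-by-samples expression matrix.

```r
library(splines)  # bundled with base R; provides ns()

DF <- data.frame(
  Group    = factor(rep(c("A", "B", "C"), each = 5)),
  Variable = c(196, 272, 284, 395, 407, 631, 683, 715, 784, 928,
               1176, 1177, 1193, 1234, 1240)
)

# 1. Group comparison: intercept + two group effects
design_group  <- model.matrix(~ Group, data = DF)
# 2. Continuous covariate, assuming expression changes linearly with it
design_linear <- model.matrix(~ Variable, data = DF)
# 3. Natural cubic spline with 3 df (df chosen for illustration only)
design_spline <- model.matrix(~ ns(Variable, df = 3), data = DF)

# Number of coefficients each design estimates
print(sapply(list(group  = design_group,
                  linear = design_linear,
                  spline = design_spline), ncol))

# Any of these could then go to limma (not run; `expr` is hypothetical):
#   fit <- limma::lmFit(expr, design_spline)
#   fit <- limma::eBayes(fit)
#   limma::topTable(fit, coef = 2:4)  # F-test over all spline terms
```

With 15 samples the spline design leaves 11 residual degrees of freedom, versus 13 for the linear covariate and 12 for the group design; asking `topTable()` for all spline coefficients at once tests for any trend rather than for one particular coefficient.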
My question is: will either of those options (covariate or splines) be unduly affected by the gaps in the continuous variable?
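One concrete place where gaps can matter for splines is knot placement: by default `ns()` puts interior knots at quantiles of the data, and a quantile can fall where there are no observations. The sketch below (base R plus the bundled `splines` package, with `df = 3` again chosen only for illustration) extracts the default interior knots and each knot's distance to the nearest observed value. For these values both knots land inside gaps (between 407 and 631, and between 928 and 1176), so the curve's shape across those stretches is dictated by the basis rather than by any sample.

```r
library(splines)

x <- c(196, 272, 284, 395, 407, 631, 683, 715, 784, 928,
       1176, 1177, 1193, 1234, 1240)

# With df = 3 and intercept = FALSE, ns() uses 2 interior knots,
# placed by default at the 1/3 and 2/3 quantiles of x
basis <- ns(x, df = 3)
interior_knots <- unname(attr(basis, "knots"))
print(interior_knots)

# Distance from each knot to the nearest observed value;
# a large distance means the knot sits inside a gap with no data
print(sapply(interior_knots, function(k) min(abs(x - k))))
```

If this is a concern, supplying `knots =` explicitly or using fewer degrees of freedom are the obvious levers to try.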
Thanks for your help!