I have RNA-seq data from an experiment with the following variables (number of levels of each in parentheses):
- Treatment (3)
- Distance from base (9 per repetition)
- Repetition (4 per treatment)
To be clear, each of the three treatments had 4 repetitions (individual trees), and from each tree, leaf tissue was collected at varying distances from the base, moving toward the top. We are trying to identify gene expression differences that vary with this distance, which is defined as the number of nodes/branches between the base and the point of collection.
Previously, we simply binned the collections into somewhat arbitrary "upper", "middle", and "lower" groups so that we could contrast, for example, TreatmentA_upper with TreatmentB_upper, and did the DE analysis that way. The problem is that these designations are imprecise, and the boundary between "upper" and "middle" is especially arbitrary because these weren't very old or large trees.
We would like to treat the number of nodes from the base as a continuous variable in the DE analysis in edgeR. I have read the documentation and a lot of forum posts about this (splines, time course data examples, etc.), but I haven't encountered a case where the values of the continuous variable differ between repetitions, which is what we have. For example, TreatmentA_rep1 may have had collections made at nodes 9, 15, 17, 22, and 25, while TreatmentA_rep2 may have had them made at nodes 7, 18, 20, 25, and 29.
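To make the layout concrete, here is a minimal sketch of the sample table and the kind of design matrix I have in mind. It is written in plain Python purely for illustration (the real analysis would be in R/edgeR). The TreatmentA node counts are the examples above; the TreatmentB values, the column names, and the `design_matrix` helper are made up. I believe this corresponds roughly to what `model.matrix(~ treatment * nodes)` would produce in R, but that is my assumption, and the sketch ignores any tree-level (repetition) effect.

```python
# Hypothetical sample table: one entry per collected leaf sample.
# TreatmentA node counts come from the example in the text;
# the TreatmentB values are invented for illustration only.
example_nodes = {
    ("A", 1): [9, 15, 17, 22, 25],
    ("A", 2): [7, 18, 20, 25, 29],
    ("B", 1): [8, 14, 19, 23, 27],  # made up
}

samples = [
    {"treatment": trt, "rep": rep, "nodes": n}
    for (trt, rep), node_list in example_nodes.items()
    for n in node_list
]


def design_matrix(samples, treatments):
    """Build a design with a separate intercept and slope per treatment.

    Columns: intercept, one dummy per non-baseline treatment, nodes,
    and one treatment-by-nodes interaction per non-baseline treatment.
    The first entry of `treatments` is used as the baseline.
    """
    others = treatments[1:]
    names = (
        ["Intercept"]
        + [f"trt{t}" for t in others]
        + ["nodes"]
        + [f"trt{t}:nodes" for t in others]
    )
    rows = []
    for s in samples:
        dummies = [1.0 if s["treatment"] == t else 0.0 for t in others]
        interactions = [d * s["nodes"] for d in dummies]
        rows.append([1.0] + dummies + [float(s["nodes"])] + interactions)
    return names, rows


names, design = design_matrix(samples, ["A", "B"])
print(names)
for row in design:
    print(row)
```

The point of the sketch is that each sample simply carries its own node count, so the node values do not need to match across trees for the column to be well defined. Whether a single linear term like this is adequate, or whether a spline basis would be needed instead, is part of what I am asking.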
Can anyone please point me in the right direction here? Or would it be better in this case to avoid treating distance as continuous?
Thanks very much.