Continuous variable that differs between repetitions and treatments: DE analysis in edgeR
@3746f5fd
United States

Hello,

I have RNA-seq experimental data with the following variables (and the number of each in parentheses):

1. Treatment (3)
2. Distance from base (9 per repetition)
3. Repetition (4 per treatment)

To be clear, each of the three treatments had 4 repetitions (individual trees), and from each repetition, leaf tissue was collected at varying distances from the base of the tree, moving toward the top. We are trying to identify gene expression differences that vary with this distance from the base, defined as the number of nodes/branches between the base and the point of collection.

Previously, we simply grouped the collections into somewhat arbitrary "upper", "middle", and "lower" categories, so that we could contrast, for example, TreatmentA_upper with TreatmentB_upper, and did DE analysis that way. The problem is that these designations are imprecise, and the distinction between "upper" and "middle" is largely arbitrary since these weren't very old or large trees.

We would like to be able to use the number of nodes from base and treat it as a continuous variable by which to do DE analysis in edgeR. I have read the documentation and a lot of forum posts about this (splines, time course data examples, etc.), but I haven't encountered any situation where the continuous variable is different between repetitions, which is what we have. TreatmentA_rep1 may have had collections made at nodes 9, 15, 17, 22, and 25, while TreatmentA_rep2 may have had them made at nodes 7, 18, 20, 25, and 29, for example.

Can anyone please point me in the right direction here? Or maybe it is best in this case to avoid treating these as continuous?

Thanks very much.

Erik

edgeR • 85 views
@gordon-smyth
WEHI, Melbourne, Australia

I don't see any special difficulty with your experimental setup. Continuous variables are almost always different between repetitions and that doesn't cause any problem. If you expect Distance to have a linear effect then you could just use ~Treatment + Distance. If Distance has a nonlinear effect, then you could add polynomial or spline functions of Distance.
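As a rough sketch of the two options above, the design matrices could be built as follows. The sample table is hypothetical: the node numbers are simulated, the column names Treatment and Distance are assumed, and 5 collections per tree is chosen only for illustration. The choice of a natural cubic spline with 3 degrees of freedom is likewise an assumption, not a recommendation from the answer.

```r
library(splines)  # ships with base R; provides ns() for natural cubic splines

# Hypothetical sample table: 3 treatments x 4 trees x 5 collections per tree.
# Distance (nodes from base) is simulated and differs from tree to tree.
set.seed(1)
samples <- expand.grid(collection = 1:5,
                       tree = 1:4,
                       Treatment = c("A", "B", "C"))
samples$Distance <- sample(5:30, nrow(samples), replace = TRUE)

# Linear effect of Distance, with one slope shared by all treatments.
design_linear <- model.matrix(~ Treatment + Distance, data = samples)

# Nonlinear effect of Distance: natural cubic spline basis with 3 df (assumed).
design_spline <- model.matrix(~ Treatment + ns(Distance, df = 3), data = samples)

print(colnames(design_linear))
print(dim(design_spline))
```

Either matrix could then be supplied as the design to edgeR's estimateDisp() and glmQLFit(). With the linear design, glmQLFTest(fit, coef = "Distance") would test for a distance trend; with the spline design, passing all three spline columns together (coef = 4:6 in this layout) would test for any smooth dependence on distance.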

We don't have a section on continuous variables in the edgeR User's Guide. If we did, it would just say that continuous variables can be added to the model formula as in multiple regression.