Hello,
I have performed an analysis of my data, the first portion at least, based on input from this forum, but I wanted to make sure I am doing this correctly before moving on. As I've mentioned in another post, I have a dataset set up as follows, it is leaf sample collections from trees subjected to three different treatments, 4 reps per treatment, from various locations on the trees, starting with the top and moving down:
            id node_id treatment node_loc main_lateral
1    15_A_main       1         A     main         main
2  15_A_middle      17         A   middle      lateral
3  15_A_middle      14         A   middle      lateral
4   15_A_upper       9         A    upper      lateral
5   15_A_lower      26         A    lower      lateral
6   15_A_lower      20         A    lower      lateral
7    16_C_main       1         C     main         main
8  16_C_middle      14         C   middle      lateral
9   16_C_upper       7         C    upper      lateral
10  16_C_lower      25         C    lower      lateral
11  16_C_lower      22         C    lower      lateral
12 16_C_middle      17         C   middle      lateral
There are actually 3 different treatments - A,B,C. We expect that there will be a significant amount of difference between the three treatments in gene expression as sampling moves down the tree (the node_id column starts at 1 at the top and increases as sampling moves down the tree). However, we are interested in simply finding genes that vary with node_id regardless of the treatment at first.
So I have done the following for the design:
X <- ns(y$samples$node_id, df = 3)
design <- model.matrix(~X)
For the actual DE testing, I did:
y <- estimateDisp(y, design, robust=TRUE)
fit.spline <- glmQLFit(y, design, robust=TRUE)
fit.spline <- glmQLFTest(fit.spline, coef = 2:4)
Is this all I need to do to answer this particular question ("which genes are differentially expressed with node_id regardless of treatment")?
Thanks very much!
Erik

OK, thank you!!! Just to be clear, when I do that the output of
colnames(design)becomesTherefore, to test for DE, I would simply use
Correct?
Thanks again for the help!!!
Erik
No. As Gordon already explained, testing the node coefficients simply tells you if the expression values change depending on where you collected the leaves. You already assume that to be true! The interesting thing is to know if the treatment affects expression, after controlling for the fact that you have collected the leaves at different places in the different plants. The treatmentB coefficient tests for differences between treatmentB and treatmentA, and treatmentC does the same (e.g., treatmentC vs treatmentA). And if you want to know differences between treatmentB and treatmentC, you can use a contrast comparing the two.
Right, that is what we are interested in for the initial analysis, the genes that change with collection location without regard to which treatment was received. I think the reason I have been asked to do this is so that heatmaps and dendrograms can be generated for this group of genes downstream, comparing their expression in the different treatments and presenting it visually. That's my guess, anyway, since you are definitely correct that it wouldn't be of the most experimental interest.
In that case, one more complication. There is the possibility that the changes across collection locations are dependent on the treatment. The model you are fitting ignores that possibility. If you do
You will add coefficients that estimate the difference in expression at collection site, depending on the treatment. Or put another way, genes that have different curves depending on the treatment (see the example in the limma User's Guide, section 9.6.2). Those genes do change with collection location, but not without regard to the treatment. The genes that change with collection location without regard to the treatment will be those with a significant F-test for coef = 4:6, that are not significant for coef = 7:12.
That should be
for the coefficients to be in the order that using coef = 4:6 and coef = 7:12 will work as I describe.
That makes sense, thanks so much for the clear and detailed explanations!!!! I will look again at the limma guide to try to better my understanding also.