Hello,
My current dataset has four treatments in a typical set-up: a control, two single exposures and one double exposure. However, unlike e.g. tests of potentially toxic substances, I have two different concentrations of each factor rather than presence/absence. My four groups are therefore HighA/HighB, LowA/HighB, HighA/LowB and LowA/LowB. In this context, the high concentrations are considered the baseline/reference/control and the low concentrations the experimental challenge.
My matrix headers are:
HiAHiB-1 LoAHiB-1 HiALoB-1 LoALoB-1 HiAHiB-2 LoAHiB-2 HiALoB-2 LoALoB-2 ... ...
My factors are:
A <- factor(substring(colnames(data.set),1,3))
B <- factor(substring(colnames(data.set),4,6))
batch <- factor(substring(colnames(data.set),8,8))
Looking more closely at e.g. A:
> A
[1] HiA LoA HiA LoA HiA LoA HiA LoA ... ...
Levels: HiA LoA
When I specify my design matrix like this:
design <- model.matrix(~batch+A*B)
the resulting column headers include "ALoA" and "BLoB".
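To make the question concrete, here is a minimal sketch that reproduces the design matrix without my real data. The vector `toy.names` is a hypothetical stand-in for `colnames(data.set)`, assuming just two batches; everything else is the same code as above.

```r
# Hypothetical column names for two batches (a stand-in for colnames(data.set))
toy.names <- c("HiAHiB-1", "LoAHiB-1", "HiALoB-1", "LoALoB-1",
               "HiAHiB-2", "LoAHiB-2", "HiALoB-2", "LoALoB-2")

# Same factor construction as in my real script
A     <- factor(substring(toy.names, 1, 3))
B     <- factor(substring(toy.names, 4, 6))
batch <- factor(substring(toy.names, 8, 8))

# Batch effect plus main effects of A and B and their interaction
design <- model.matrix(~batch + A * B)
colnames(design)
# "(Intercept)" "batch2" "ALoA" "BLoB" "ALoA:BLoB"
```

Each column header is the factor name pasted onto a level name, and the first level of each factor (batch 1, HiA, HiB) gets no column of its own.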
From reading the edgeR User Guide (particularly the Arabidopsis case study, pp. 61-68), I suspect that this indicates that HiA is set as the reference (since it is the leftmost level after "Levels:"), and that a positive log2 fold change in a comparison between HiA and LoA would indicate upregulation in LoA compared with HiA. Is this correct? Or does relevel() always have to be done? In the case study, mock is to the left in the first table in section 4.6.4, but is then also set with relevel().
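A quick way to check this suspicion in base R, as a sketch only: it uses a small hypothetical factor rather than my data and assumes R's default treatment contrasts for unordered factors. The names `A.explicit` and `A.flipped` are made up for illustration.

```r
# Small hypothetical factor with the same two levels as my factor A
A <- factor(c("HiA", "LoA", "HiA", "LoA"))

# factor() sorts levels alphabetically, so HiA comes first; under the
# default treatment contrasts the first level is the reference
levels(A)

# Setting the reference explicitly leaves the level order unchanged here
A.explicit <- relevel(A, ref = "HiA")
identical(levels(A.explicit), levels(A))

# Making LoA the reference changes which level appears in the header
A.flipped <- relevel(A, ref = "LoA")
colnames(model.matrix(~A.flipped))
# "(Intercept)" "A.flippedHiA"
```

If this is right, relevel() would only be needed when the intended reference is not alphabetically first, and calling it anyway just makes the choice explicit.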
Does the design matrix header name the non-reference condition, and if so, what does that mean? In the Arabidopsis case study it was "Treathrcc", with hrcc as the non-reference level; for me it is e.g. "ALoA", with LoA as the non-reference level.
I ask primarily because I have not yet figured out how to generalise from case studies to a more general approach to mathematical modelling.