I have an experiment where each sample is measured multiple times and samples are nested within the experimental group. I have noticed that my choice of sample reference level seems to influence the results. Below is a minimal example to show what I mean:
library(edgeR)
counts <- matrix(c(6,2,6,21,19,15,17,18,20,12), nrow = 1)
counts <- rbind(counts, 25 - counts)
setup <- data.frame(sample = factor(c("1","1","1","2","2","1","1","2","2","2")),
                    group = factor(c(rep("a", 5), rep("b", 5))))
tester <- function(){
  design <- model.matrix(~ group + group:sample, setup)
  y <- DGEList(counts = counts, group = setup$group, genes = c("test", "test2"))
  y <- estimateDisp(y, design=design, tagwise = FALSE, trend.method = "none")
  fit <- glmFit(y, design)
  topTags(glmLRT(fit, coef = 2))
}
tester()
# Coefficient:  groupb 
#   genes     logFC   logCPM       LR       PValue          FDR
# 1  test  1.750700 19.03706 16.39638 5.138319e-05 0.0001027664
# 2 test2 -1.164792 18.81775 10.49939 1.194136e-03 0.0011941364
# Set reference group to "2" instead of "1"
setup$sample <- relevel(setup$sample, "2")
tester()
# Coefficient:  groupb 
#   genes      logFC   logCPM        LR    PValue       FDR
# 2 test2  0.7228214 18.81775 1.9869241 0.1586629 0.3173257
# 1  test -0.2612434 19.03706 0.7314366 0.3924179 0.3924179
I have a couple of questions: Is this intended behaviour, and if so, why?
Is my model specification incorrect or can I change it to ensure consistent results?
sessionInfo() R version 3.4.3 (2017-11-30) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 7 x64 (build 7601) Service Pack 1 Matrix products: default locale: [1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252 LC_MONETARY=English_United Kingdom.1252 [4] LC_NUMERIC=C LC_TIME=English_United Kingdom.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] edgeR_3.20.9 limma_3.34.9 loaded via a namespace (and not attached): [1] compiler_3.4.3 tools_3.4.3 Rcpp_0.12.16 grid_3.4.3 locfit_1.5-9.1 lattice_0.20-35

Thank you for the comprehensive explanation, that cleared things up quite a bit. What I am really interested in would be the log-fold change in the average of B1 and B2 over the average of A1 and A2. Can I design a contrast with my current model specification that would test this? Something like
((Intercept) + groupa:sample2)/2 - (groupb + groupb:sample2)/2. (this formulation seems wrong).Alternatively I could fit a model of the form
~ 0 + group:sample, and then make a contrast of(b1 + b2)/2 - (a1 + a2)/2, that should give me the difference I am after, correct?Looking at the first design matrix:
So to create the null hypothesis equivalent to
(A1 + A2)/2 = (B1 + B2)/2, you do:... which can be simplified to:
... the left-hand-side of which is the expression to use in
makeContrasts().If you think that looks unnecessarily complicated, you'd be right; a far simpler approach would be to combine
groupandsampleinto a single factor, and use a no-intercept model as you've propose in your second option.That is excellent, I find the construction of contrasts in R to be confusing and not all packages has the same flexibility as edgeR does so this will definitely help me in the future.