OK, I found the answer for myself, in Love, Huber, and Anders (2014):
Expanded design matrices
For consistency with our software’s documentation, in the following text we will use the terminology of the R statistical language. In linear modeling, a categorical variable or factor can take on two or more values or levels. In standard design matrices, one of the values is chosen as a reference value or base level and absorbed into the intercept. In standard GLMs, the choice of base level does not influence the values of contrasts (LFCs). This, however, is no longer the case in our approach using ridge-regression-like shrinkage on the coefficients (described below), when factors with more than two levels are present in the design matrix, because the base level will not undergo shrinkage while the other levels do.
To recover the desirable symmetry between all levels, DESeq2 uses expanded design matrices, which include an indicator variable for each level of each factor, in addition to an intercept column (i.e., none of the levels is absorbed into the intercept). While such a design matrix no longer has full rank, a unique solution exists because the zero-centered prior distribution (see below) provides regularization. For dispersion estimation and for estimating the width of the LFC prior, standard design matrices are used.
So, in essence, using modelMatrixType="standard" in this context generates incorrect results by computing a contrast between a non-shrunken term for the first factor level and a shrunken term for the third factor level. Perhaps the current warning generated by the code could be improved to indicate this more clearly.
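To make the quoted passage concrete, here is a minimal base R sketch of the two kinds of design matrix for a single three-level factor. The factor name `condition` and its levels are made up for illustration, and the `cbind` construction only mimics the idea of an expanded matrix; it is not DESeq2's internal code.

```r
# Hypothetical three-level factor with two samples per level
condition <- factor(rep(c("A", "B", "C"), each = 2))

# Standard design matrix: the base level "A" is absorbed into the intercept
standard <- model.matrix(~ condition)

# Expanded-style matrix: an intercept column plus one indicator per level
# (built by hand here purely to illustrate the concept)
indicators <- model.matrix(~ 0 + condition)
expanded <- cbind("(Intercept)" = 1, indicators)

print(colnames(standard))
print(colnames(expanded))

# The expanded matrix has one more column than its rank, so it is not
# full rank; per the quoted text, the zero-centered prior regularizes the fit
print(qr(standard)$rank)
print(qr(expanded)$rank)
```

In the standard matrix, level A has no coefficient of its own, which is why it cannot be shrunk like the other levels.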
Below is example code showing the difference in results between the "expanded" and "standard" model matrices, using dummy data that mirrors the structure of my data. The dummy data show a regression slope of 0.81 and a correlation of 0.98 between the two sets of results, while the actual data I am using have a regression slope of 0.11 and a correlation of 0.56.
# Create dummy data
R version: 3.2.0
DESeq2 version: 1.8.1
Setting alpha=1 does not make sense (I have just added an error for alpha=1 in future versions). Instead, provide the number you want as the target FDR for the filtering. The default, alpha=0.1, corresponds to a target FDR of 10%. If you do not want to perform filtering, set independentFiltering=FALSE.
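As a rough sketch of the two options, assuming DESeq2 is installed: the simulated dataset from `makeExampleDESeqDataSet` stands in for real data, and the object names and the seed are arbitrary choices of mine.

```r
library(DESeq2)

set.seed(1)  # arbitrary seed so the simulated counts are reproducible

# Simulated counts (1000 genes, 6 samples), used only as a stand-in for real data
dds <- makeExampleDESeqDataSet(n = 1000, m = 6)
dds <- DESeq(dds)

# Independent filtering optimized for a target FDR of 10% (the default alpha)
res_filtered <- results(dds, alpha = 0.1)

# No independent filtering at all
res_unfiltered <- results(dds, independentFiltering = FALSE)

# Genes removed by filtering get an NA adjusted p-value
print(sum(is.na(res_filtered$padj)))
print(sum(is.na(res_unfiltered$padj)))
```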
"regression slope of 0.81 and correlation of 0.98"
Right, so while not identical, the two methods are giving similar results. The expanded design is preferable for shrinkage of LFCs, because the results will be the same regardless of the choice of reference level for factors.
If I set alpha=0.1 and build a contingency table, you can see that the two methods largely agree on a 10% FDR set:
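One way to build a table of that kind in base R is sketched here. The helper `compare_calls` is hypothetical, and the two `padj` vectors are placeholder values, not output from either model.

```r
# Hypothetical helper: cross-tabulate which genes pass the padj threshold
# under each method. NA adjusted p-values count as not significant.
compare_calls <- function(padj_expanded, padj_standard, alpha = 0.1) {
  sig_expanded <- !is.na(padj_expanded) & padj_expanded < alpha
  sig_standard <- !is.na(padj_standard) & padj_standard < alpha
  table(
    expanded = factor(sig_expanded, levels = c(FALSE, TRUE)),
    standard = factor(sig_standard, levels = c(FALSE, TRUE))
  )
}

# Placeholder adjusted p-values, not results from any real analysis
padj_a <- c(0.01, 0.20, 0.03, NA, 0.50)
padj_b <- c(0.02, 0.05, 0.04, 0.30, NA)

tab <- compare_calls(padj_a, padj_b)
print(tab)
```

With real data, the two arguments would be the `padj` columns of the two results tables, with genes in the same order.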
Anyway, where you find smaller correlations or regression slopes between these two methods, it could be that the LFCs do not show strong evidence of being larger or smaller than 0, so you are just picking up differences between the methods because there is no real signal there.
Updated code to use independentFiltering=FALSE instead of alpha=1.
NB: The actual code for fitting the data used independentFiltering=FALSE and alpha=1.