Dear Prof. Love:
There have been numerous questions about the “not full rank” error in DESeq2. The vignette addresses the issue, and I understand when the error is generated. What I would like to find out is why you decided to include that check in the first place. Both SAS (PROC GLM, MIXED, and GENMOD with a negative binomial response) and R (lm) are quite comfortable with rank-deficient designs.
When one factor is perfectly confounded with another (called “Linear combinations” in the vignette), I suppose it was a good idea to generate an error. Strictly speaking, though, SAS and R will produce an answer even in that case: R's lm reports NA for the aliased coefficients, and its summary notes that they are “not defined because of singularities”.
Was your intention just to force the user to “consult a statistician”, or were there estimation difficulties when fitting the model with a rank-deficient design matrix?
Regards, Nik Tuzov
I was almost sure that estimation difficulties were the reason, because catching statistical design errors is well outside DESeq2's mandate. It would be better if that check were optional, not hard-coded. However, if the user were allowed to use any design matrix (including those in GLM coding, which you call EMM), it might have side effects in lfcShrink().
In my opinion, catching design errors is as important as estimating the parameters. Adding support for a non-full-rank X would involve additional complexity for little to no gain.
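To make the disagreement concrete, here is a minimal sketch in Python with NumPy (not DESeq2 code, and not how DESeq2 implements its check). It assumes a hypothetical four-sample design in which a batch indicator and a treatment indicator are perfectly confounded; the data values and the function name `check_full_rank` are invented for illustration. It shows that least squares still returns fitted values, that the coefficients are not unique, and what a rank check that refuses to proceed might look like.

```python
import numpy as np

# Hypothetical design: intercept, batch indicator, treatment indicator.
# Batch and treatment are perfectly confounded (the two columns are identical).
X = np.array([
    [1, 0, 0],
    [1, 0, 0],
    [1, 1, 1],
    [1, 1, 1],
], dtype=float)
y = np.array([1.0, 1.2, 3.1, 2.9])  # made-up response values


def check_full_rank(X):
    """Raise an error when the design matrix has linearly dependent columns."""
    rank = np.linalg.matrix_rank(X)
    if rank < X.shape[1]:
        raise ValueError(
            f"design matrix is not full rank: rank {rank} < {X.shape[1]} columns"
        )
    return rank


# Least squares still returns an answer: the minimum-norm solution.
# (R's lm instead reports NA for the aliased coefficient.)
beta_min_norm, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)

# Shifting along the null-space direction (0, 1, -1) leaves the fit unchanged,
# so the batch and treatment effects cannot be separated.
beta_alt = beta_min_norm + np.array([0.0, 1.0, -1.0])
same_fit = np.allclose(X @ beta_min_norm, X @ beta_alt)

# A strict check refuses to continue instead of picking one solution.
try:
    check_full_rank(X)
    rank_error = None
except ValueError as err:
    rank_error = str(err)
```

Here `same_fit` is true even though `beta_min_norm` and `beta_alt` differ, which is the sense in which the individual coefficients carry no information in a confounded design. Whether software should pick one solution silently or stop with an error is the design choice debated above.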