This question concerns an edge case of the linear-combination problem explained in the new DESeq2 tutorial here.
More details: We have 2 groups based on viral infection (infected vs uninfected). The infected samples come from 2 different strains, but the uninfected samples cannot have any strain information. Using a placeholder value for the uninfected samples (as below) causes a “Model matrix not full rank” error, because the placeholder strain level marks exactly the same samples as the uninfected level of infection, so one column is a linear combination of the other.
## DataFrame with 6 rows and 2 columns
##   infection   strain
##    <factor> <factor>
## 1         1        A
## 2         1        A
## 3         1        B
## 4         1        B
## 5         2        C
## 6         2        C
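The rank problem can be reproduced without DESeq2. The following is a minimal base R sketch, assuming the design is `~ infection + strain` (the design formula is not stated in the post, so that is an assumption) and that the table above is rebuilt by hand as `coldata`, a name chosen here for illustration.

```r
# Toy sample table mirroring the one above; "C" is the placeholder strain
# given to the uninfected samples (infection level 2).
coldata <- data.frame(
  infection = factor(c(1, 1, 1, 1, 2, 2)),
  strain    = factor(c("A", "A", "B", "B", "C", "C"))
)

# Assumed design: ~ infection + strain
mm <- model.matrix(~ infection + strain, data = coldata)

# Columns are (Intercept), infection2, strainB, strainC
n_cols  <- ncol(mm)
mm_rank <- qr(mm)$rank

# infection2 and strainC flag exactly the same samples,
# so the matrix has fewer independent columns than coefficients.
same_column <- all(mm[, "infection2"] == mm[, "strainC"])

print(colnames(mm))
print(c(columns = n_cols, rank = mm_rank))
print(same_column)
```

Because the rank is lower than the number of columns, the coefficients for `infection2` and `strainC` cannot be estimated separately, which is what the error message reports.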
Is there a solution for this situation? Can I use 2 fake values (A and B) for the control samples, as in nested models, so that the comparison of infected vs uninfected groups is corrected for the strain effect? Or, if we use a model with a single variable (strain A, strain B, and Control), how can I specify the contrast that combines the 2 strains versus control?
Thank you
But this contrast is averaging across strains A and B, not correcting for the effect of the strain, right?
I don't know exactly what you mean by "correcting for", but we can do the average fixed effect. DESeq2 cannot fit mixed effects models; see other support site posts on this topic.
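As a rough illustration of what an average fixed effect means here, the base R sketch below assumes a single three-level factor (called `group` for illustration, with `Control` as the reference level) and uses made-up response values that exist only to show the arithmetic. It is not DESeq2 output.

```r
# Single-factor encoding: strain A, strain B, and Control (reference level).
coldata <- data.frame(
  group = factor(c("A", "A", "B", "B", "Control", "Control"),
                 levels = c("Control", "A", "B"))
)

mm <- model.matrix(~ group, data = coldata)
design_rank <- qr(mm)$rank   # equals ncol(mm): this design is full rank

# Numeric contrast over (Intercept), groupA, groupB:
# the average of "A vs Control" and "B vs Control".
avg_vs_control <- c(0, 0.5, 0.5)
names(avg_vs_control) <- colnames(mm)

# Made-up toy responses, purely illustrative.
y   <- c(2, 4, 6, 8, 1, 3)
fit <- lm(y ~ group, data = coldata)

# Contrast estimate = (mean of A + mean of B) / 2 - mean of Control
estimate <- sum(avg_vs_control * coef(fit))
by_hand  <- (mean(y[1:2]) + mean(y[3:4])) / 2 - mean(y[5:6])

print(avg_vs_control)
print(c(estimate = estimate, by_hand = by_hand))
```

With a DESeq2 object fitted with the design `~ group` (here assumed to be named `dds`, with coefficients in the same order), the same numeric vector could be passed as `results(dds, contrast = c(0, 0.5, 0.5))`. The result is the fold change of the two strains averaged, relative to control; it does not model strain as a random effect.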
Thank you for your reply. By "correcting for the effect of the strain", I meant measuring the effect of the viral infection above and beyond the effect of the different strains, similar to what we do when we use a GLM to adjust for a batch effect or any other confounding variable. I assume that averaging across the 2 strain groups will increase the variance and decrease the power to detect DEGs. I am not very familiar with fixed effect modeling, but I think a meta-analysis that tests each strain group against the control would be more suitable. Anyhow, thanks for the suggestion.