Hi,
I have a experimental setup with 2 different strains, 3 different time points, activation/non-activation and 2 replicates. I've been trying to design the model matrix and write the design formula for some time now, but I still get the "error: inv(): matrix appears to be singular" error.
Here is my experimental design matrix:
strain time replicates activation AS 0 1 minus AS 2.5 1 minus AS 2.5 1 plus AS 4.5 1 minus AS 4.5 1 plus AS 0 2 minus AS 2.5 2 minus AS 2.5 2 plus AS 4.5 2 minus AS 4.5 2 plus WT 0 1 minus WT 2.5 1 minus WT 2.5 1 plus WT 4.5 1 minus WT 4.5 1 plus WT 0 2 minus WT 2.5 2 minus WT 2.5 2 plus WT 4.5 2 minus WT 4.5 2 plus
my original idea for the model matrix was
mm <- model.matrix(~ strain + activation + replicates + time:activation + strain:activation + strain:time:activation, expDesign)
because I believe the strain, activation, and replicates could have effects themselves, but there are also interaction effects for activation with time and strain.
I know, the matrix is not full rank, because there are no samples for timepoint 0 with activation. So after reading the vignette section "Levels without samples" I thought I could drop the corresponding column in the model matrix and run DESeq2:
> w <- which(colSums(mm)==0) > mm <- mm[,-w] > dds = DESeq(deseq.matrix, full=mm, betaPrior=FALSE) estimating size factors estimating dispersions gene-wise dispersion estimates using supplied model matrix error: inv(): matrix seems singular Error: inv(): matrix seems singular In addition: Warning message: In fitNbinomGLMs(objectNZ[fitidx, , drop = FALSE], alpha_hat = alpha_hat[fitidx], : 114rows had non-positive estimates of variance for coefficients
But I still get the "matrix seems singular" error. Can someone tell me what I am doing wrong or how my design should look like?
Thanks in advance,
Carina
Hi,
this im my sessionInfo:
I am interested if there is a significant difference in WT and AS strains (without activation, so at 0h time point), but also in the differentially expressed genes upon activation over time in each of the strains.
This is a complex experimental design because it involves second-order interaction terms and a number of implicit assumptions (no difference between plus and minus samples at time=0, such that the plus samples at time=0 were not generated). If it seems confusing and you aren't sure how to interpret things, you will need to find a local statistician who can help explain things. It really deserves a face-to-face meeting to go over the interpretations of terms here. Another complication is that you can't use fixed effects to account for the correlation within replicate within strain, and then simultaneously compare across strain, because these are collinear terms. You can account for replicate correlation with other software packages, such as the duplicateCorrelation function in limma, but DESeq2 does not have this kind of functionality. So my solution below doesn't account for correlation within replicate within strain.
You'll want to make sure "WT" is the reference level of strain (see vignette), "plus" is the reference level for activation, and that time is a factor --- otherwise you are assuming constant changes across time which for many genes could not be the case (both saturation or up-down/down-up expression can occur).
I would use a design of ~ time + strain + activation:time + activation:strain:time, but then you need to remove a number of terms, because of the peculiarities of your experimental design. I would keep the terms: (Intercept), time2.5, time4.5, strainAS, time2.5:activationplus, time4.5:activationplus, time2.5:strainAS:activationplus, time4.5:strainAS:activationplus. You can then provide the model matrix to the 'full' argument of DESeq() as you were before. This will be a full rank matrix.
The strainAS term is the difference AS vs WT without activation. You can pull this out with results and name="strainAS".
The DE genes upon activation for WT are time2.5:activationplus and time4.5:activationplus and again you would pull these out, e.g. for time=2.5, with name="time2.5:activationplus".
The DE genes upon activation for AS are the above terms plus the interaction of strain, time and activation. You would pull this out, e.g. for time=2.5, using results with contrast=list(c("time2.5:activationplus", "time2.5:strainAS:activationplus")).
Again, I'd recommend speaking to someone local who can help with further questions and writing up the interpretation if things are not clear.
Hi Michael,
thank you very much for you informative answer.
By now, I correlated replicate counts as they are very similar, there is very like no replicate effect, so I decided to leave out the replicate term. This also fits with your answer. I will try to make my design as you suggested.
Thanks again,
Best, Carina