I am struggling with creating the appropriate model matrix for the type of comparisons I want to make. The data I have are from four different yeast strains, two conditions, and three replicates (total of 24 samples) (ex: Y184-YPD-rep1, Y184-NaCl-rep1, etc.). I want to make comparisons between the two conditions within a strain and make comparisons between the strains in each condition. I need to account for a strain-condition interaction as well as their additive effect in my model, so my first design was:
design.1<-model.matrix(~0 + strain*condition + rep)
but then the column names for this design are:
colnames(design.1) [1] "strainbcy1" "strainira2" "strainira2bcy1" "strainY184"
[5] "conditionYPD" "rep2" "rep3" "strainira2:conditionYPD"
[9] "strainira2bcy1:conditionYPD" "strainY184:conditionYPD"
which to me makes it seem like the NaCl condition isn't there, and I can't make a comparison with it?
So I've also tried this model, which accounts for only the interaction between strain and condition:
design.2<-model.matrix(~0 + strain:condition + rep)
and the column names makes it seem like everything is being accounted for
colnames(design.2) [1] "rep1" "rep2" "rep3"
[4] "strainbcy1:conditionNaCl" "strainira2:conditionNaCl" "strainira2bcy1:conditionNaCl" [7] "strainY184:conditionNaCl" "strainbcy1:conditionYPD" "strainira2:conditionYPD"
[10] "strainira2bcy1:conditionYPD" "strainY184:conditionYPD"
But then moving on to modeling the dispersion with design.2, I get an error that I can't really figure out how to fix:
> my.yglm<-estimateDisp(my.dge, design.2)
Error in glmFit.default(sely, design, offset = seloffset, dispersion = 0.05, :
Design matrix not of full rank. The following coefficients not estimable:
strainY184:conditionYPD
I've been messing around with a bunch other models, but nothing gives me the coefficient names I want in order to set the contrasts I want.
In another design, I use a grouping factor that combines the strain and condition into one (strain_condition -> listed like Y184_YPD
) so I'm not considering the strain and condition separately, but I'm worried that it's not really accounting for the interaction I expect:
design.3<-model.matrix(~0 + strain_condition+rep) colnames(design.3) [1] "strain_conditionbcy1_NaCl" "strain_conditionbcy1_YPD" "strain_conditionira2_NaCl" "strain_conditionira2_YPD" "strain_conditionira2bcy1_NaCl" "strain_conditionira2bcy1_YPD" [7] "strain_conditionY184_NaCl" "strain_conditionY184_YPD" "rep2" "rep3"
Any advice on where to go from here would be great!
You cross-posted this on Biostars: https://www.biostars.org/p/428073/
In future, when you do this, could you please mention this clearly in your question so that neither community duplicates efforts. We are volunteers here, after all. Thank you!