RNAseq analysis with edgeR-interaction terms and comparison within and between groups
Entering edit mode
erwagner2 • 0
Last seen 3.5 years ago

I am struggling with creating the appropriate model matrix for the type of comparisons I want to make. The data I have are from four different yeast strains, two conditions, and three replicates (total of 24 samples) (ex: Y184-YPD-rep1, Y184-NaCl-rep1, etc.). I want to make comparisons between the two conditions within a strain and make comparisons between the strains in each condition. I need to account for a strain-condition interaction as well as their additive effect in my model, so my first design was:

design.1<-model.matrix(~0 + strain*condition + rep)

but then the column names for this design are:

colnames(design.1) [1] "strainbcy1" "strainira2" "strainira2bcy1" "strainY184"
[5] "conditionYPD" "rep2" "rep3" "strainira2:conditionYPD"
[9] "strainira2bcy1:conditionYPD" "strainY184:conditionYPD"

which to me makes it seem like the NaCl condition isn't there, and I can't make a comparison with it?

So I've also tried this model, which accounts for only the interaction between strain and condition:

design.2<-model.matrix(~0 + strain:condition + rep)

and the column names makes it seem like everything is being accounted for

colnames(design.2) [1] "rep1" "rep2" "rep3"
[4] "strainbcy1:conditionNaCl" "strainira2:conditionNaCl" "strainira2bcy1:conditionNaCl" [7] "strainY184:conditionNaCl" "strainbcy1:conditionYPD" "strainira2:conditionYPD"
[10] "strainira2bcy1:conditionYPD" "strainY184:conditionYPD"

But then moving on to modeling the dispersion with design.2, I get an error that I can't really figure out how to fix:

> my.yglm<-estimateDisp(my.dge, design.2)
Error in glmFit.default(sely, design, offset = seloffset, dispersion = 0.05,  : 
  Design matrix not of full rank.  The following coefficients not estimable:

I've been messing around with a bunch other models, but nothing gives me the coefficient names I want in order to set the contrasts I want.

In another design, I use a grouping factor that combines the strain and condition into one (strain_condition -> listed like Y184_YPD) so I'm not considering the strain and condition separately, but I'm worried that it's not really accounting for the interaction I expect:

design.3<-model.matrix(~0 + strain_condition+rep) colnames(design.3) [1] "strain_conditionbcy1_NaCl" "strain_conditionbcy1_YPD" "strain_conditionira2_NaCl" "strain_conditionira2_YPD" "strain_conditionira2bcy1_NaCl" "strain_conditionira2bcy1_YPD" [7] "strain_conditionY184_NaCl" "strain_conditionY184_YPD" "rep2" "rep3"

Any advice on where to go from here would be great!

edger • 1.0k views
Entering edit mode

You cross-posted this on Biostars: https://www.biostars.org/p/428073/

In future, when you do this, could you please mention this clearly in your question so that neither community duplicates efforts. We are volunteers here, after all. Thank you!

Entering edit mode
Yunshun Chen ▴ 810
Last seen 9 days ago

It makes no sense to include rep in your design. In your case, the easiest and the most straightforward way to construct the design matrix is to use a one-way layout. A good example is given in the following F1000Research article: https://f1000research.com/articles/5-1438


Login before adding your answer.

Traffic: 589 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6