Question

RNAseq analysis with edgeR-interaction terms and comparison within and between groups

0

Entering edit mode

erwagner2 • 0

@erwagner2-23144

Last seen 4.1 years ago

I am struggling with creating the appropriate model matrix for the type of comparisons I want to make. The data I have are from four different yeast strains, two conditions, and three replicates (total of 24 samples) (ex: Y184-YPD-rep1, Y184-NaCl-rep1, etc.). I want to make comparisons between the two conditions within a strain and make comparisons between the strains in each condition. I need to account for a strain-condition interaction as well as their additive effect in my model, so my first design was:

design.1<-model.matrix(~0 + strain*condition + rep)

but then the column names for this design are:

colnames(design.1) [1] "strainbcy1" "strainira2" "strainira2bcy1" "strainY184"
[5] "conditionYPD" "rep2" "rep3" "strainira2:conditionYPD"
[9] "strainira2bcy1:conditionYPD" "strainY184:conditionYPD"

which to me makes it seem like the NaCl condition isn't there, and I can't make a comparison with it?

So I've also tried this model, which accounts for only the interaction between strain and condition:

design.2<-model.matrix(~0 + strain:condition + rep)

and the column names makes it seem like everything is being accounted for

colnames(design.2) [1] "rep1" "rep2" "rep3"
[4] "strainbcy1:conditionNaCl" "strainira2:conditionNaCl" "strainira2bcy1:conditionNaCl" [7] "strainY184:conditionNaCl" "strainbcy1:conditionYPD" "strainira2:conditionYPD"
[10] "strainira2bcy1:conditionYPD" "strainY184:conditionYPD"

But then moving on to modeling the dispersion with design.2, I get an error that I can't really figure out how to fix:

> my.yglm<-estimateDisp(my.dge, design.2)
Error in glmFit.default(sely, design, offset = seloffset, dispersion = 0.05,  : 
  Design matrix not of full rank.  The following coefficients not estimable:
 strainY184:conditionYPD

I've been messing around with a bunch other models, but nothing gives me the coefficient names I want in order to set the contrasts I want.

In another design, I use a grouping factor that combines the strain and condition into one (strain_condition -> listed like Y184_YPD) so I'm not considering the strain and condition separately, but I'm worried that it's not really accounting for the interaction I expect:

design.3<-model.matrix(~0 + strain_condition+rep) colnames(design.3) [1] "strain_conditionbcy1_NaCl" "strain_conditionbcy1_YPD" "strain_conditionira2_NaCl" "strain_conditionira2_YPD" "strain_conditionira2bcy1_NaCl" "strain_conditionira2bcy1_YPD" [7] "strain_conditionY184_NaCl" "strain_conditionY184_YPD" "rep2" "rep3"

Any advice on where to go from here would be great!

edger • 1.2k views

ADD COMMENT • link updated 4.1 years ago by Gordon Smyth 50k • written 4.1 years ago by erwagner2 • 0

0

Entering edit mode

You cross-posted this on Biostars: https://www.biostars.org/p/428073/

In future, when you do this, could you please mention this clearly in your question so that neither community duplicates efforts. We are volunteers here, after all. Thank you!

ADD REPLY • link 4.1 years ago Kevin Blighe ★ 3.9k

score 1 · Answer 1 · 2020-03-21

1

Entering edit mode

Yunshun Chen ▴ 840

@yunshun-chen-5451

Last seen 6 weeks ago

Australia

It makes no sense to include rep in your design. In your case, the easiest and the most straightforward way to construct the design matrix is to use a one-way layout. A good example is given in the following F1000Research article: https://f1000research.com/articles/5-1438

ADD COMMENT • link 4.1 years ago Yunshun Chen ▴ 840