**0**wrote:

Hi,

I have an RNA-seq experiment that is similar to section 3.5 in the edgeR user’s guide, i.e. a nested design, and I have used this approach to analyze my own data. Briefly, the experiment involves 60 RNA samples, corresponding to two groups of bacterial strains (commensal (C) and disease-causing (D)); each group consists of 15 different strains; either strain induced (IND) and in the control (CTR) condition. I am now a bit uncertain about how to interpret the design matrix I am using for this experiment when calculating differential expression and would very much appreciate some help. The data frame I am using is as follows:

**> data.frame(group, strain, condition)**

group strain condition

1 C 1 CTR

2 C 1 IND

3 C 2 CTR

4 C 2 IND

5 C 3 CTR

6 C 3 IND

7 C 4 CTR

8 C 4 IND

9 C 5 CTR

10 C 5 IND

11 C 6 CTR

12 C 6 IND

13 C 7 CTR

14 C 7 IND

15 C 8 CTR

16 C 8 IND

17 C 9 CTR

18 C 9 IND

19 C 10 CTR

20 C 10 IND

21 C 11 CTR

22 C 11 IND

23 C 12 CTR

24 C 12 IND

25 C 13 CTR

26 C 13 IND

27 C 14 CTR

28 C 14 IND

29 C 15 CTR

30 C 15 IND

31 D 1 CTR

32 D 1 IND

33 D 2 CTR

34 D 2 IND

35 D 3 CTR

36 D 3 IND

37 D 4 CTR

38 D 4 IND

39 D 5 CTR

40 D 5 IND

41 D 6 CTR

42 D 6 IND

43 D 7 CTR

44 D 7 IND

45 D 8 CTR

46 D 8 IND

47 D 9 CTR

48 D 9 IND

49 D 10 CTR

50 D 10 IND

51 D 11 CTR

52 D 11 IND

53 D 12 CTR

54 D 12 IND

55 D 13 CTR

56 D 13 IND

57 D 14 CTR

58 D 14 IND

59 D 15 CTR

60 D 15 IND

The design matrix (modeled after the example in 3.5 of the manual) has group as the main effect, with strain and condition nested within the groups:

**design <- model.matrix(~group+group:strain+group:condition)
colnames(design)**

[1] "(Intercept)" "groupD" "groupC:strain2"

[4] "groupD:strain2" "groupC:strain3" "groupD:strain3"

[7] "groupC:strain4" "groupD:strain4" "groupC:strain5"

[10] "groupD:strain5" "groupC:strain6" "groupD:strain6"

[13] "groupC:strain7" "groupD:strain7" "groupC:strain8"

[16] "groupD:strain8" "groupC:strain9" "groupD:strain9"

[19] "groupC:strain10" "groupD:strain10" "groupC:strain11"

[22] "groupD:strain11" "groupC:strain12" "groupD:strain12"

[25] "groupC:strain13" "groupD:strain13" "groupC:strain14"

[28] "groupD:strain14" "groupC:strain15" "groupD:strain15"

[31] "groupC:conditionIND" "groupD:conditionIND"

Fit GLM to each feature given calculated dispersions and design matrix:

**fit <- glmFit(d, design)**

Then calculate differential expression within groups using the following coefficients (please correct me if this is wrong):

1. Genes responding to induction in any group:

** DE1 <- glmLRT(fit, coef=31:32)**

2. Genes responding to induction in group C:

** DE2 <- glmLRT(fit, coef=31)**

3. Genes responding to induction in group D:

** DE3 <- glmLRT(fit, coef=32)**

I am uncertain about comparing between groups using this design matrix. If I understand correctly, according to the example in 3.5 I can compare between groups to find:.

4. Genes responding differently to induction in group D vs group C using this contrast:

** DE4 <- glmLRT(fit, contrast=c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,-1,1))**

Can I also use this design matrix to compare between groups to find (and if so, using which coefficients/contrasts):

5. Genes responding differently to the control condition in group D vs group C?

6. Genes responding differently to any condition in group D vs group C?

Or would I need to modify the design matrix/do a separate analysis to compare between groups in this case?

Any help would be much appreciated.