Hello everyone,
I have experimental data from different cell lines treated with either DMSO (vehicle control), one of 2 individual drugs, or their combination, with 3 replicates per condition. I'm interested in comparing differentially expressed genes (DEGs) after treatment with the individual drugs and their combination, both within and across cell lines. I have been going through various guides and vignettes, but I still find the design matrix very complex. I have come up with the following design:
Table: sampleInfo
cell_line | cell_type | cell_group | treatment | treat1 | treat2 | replicate |
---|---|---|---|---|---|---|
A | cellLine | A_cellLine | DMSO | NO | NO | 1 |
A | cellLine | A_cellLine | DRUG1 | YES | NO | 1 |
A | cellLine | A_cellLine | DRUG2 | NO | YES | 1 |
A | cellLine | A_cellLine | COMBI | YES | YES | 1 |
A | cellLine | A_cellLine | DMSO | NO | NO | 2 |
A | cellLine | A_cellLine | DRUG1 | YES | NO | 2 |
A | cellLine | A_cellLine | DRUG2 | NO | YES | 2 |
A | cellLine | A_cellLine | COMBI | YES | YES | 2 |
A | cellLine | A_cellLine | DMSO | NO | NO | 3 |
A | cellLine | A_cellLine | DRUG1 | YES | NO | 3 |
A | cellLine | A_cellLine | DRUG2 | NO | YES | 3 |
A | cellLine | A_cellLine | COMBI | YES | YES | 3 |
B | cellLine | B_cellLine | DMSO | NO | NO | 1 |
B | cellLine | B_cellLine | DRUG1 | YES | NO | 1 |
B | cellLine | B_cellLine | DRUG2 | NO | YES | 1 |
B | cellLine | B_cellLine | COMBI | YES | YES | 1 |
B | cellLine | B_cellLine | DMSO | NO | NO | 2 |
B | cellLine | B_cellLine | DRUG1 | YES | NO | 2 |
B | cellLine | B_cellLine | DRUG2 | NO | YES | 2 |
B | cellLine | B_cellLine | COMBI | YES | YES | 2 |
B | cellLine | B_cellLine | DMSO | NO | NO | 3 |
B | cellLine | B_cellLine | DRUG1 | YES | NO | 3 |
B | cellLine | B_cellLine | DRUG2 | NO | YES | 3 |
B | cellLine | B_cellLine | COMBI | YES | YES | 3 |
C | primaryCells | C_primaryCells | DMSO | NO | NO | 1 |
C | primaryCells | C_primaryCells | DRUG1 | YES | NO | 1 |
C | primaryCells | C_primaryCells | DRUG2 | NO | YES | 1 |
C | primaryCells | C_primaryCells | COMBI | YES | YES | 1 |
C | primaryCells | C_primaryCells | DMSO | NO | NO | 2 |
C | primaryCells | C_primaryCells | DRUG1 | YES | NO | 2 |
C | primaryCells | C_primaryCells | DRUG2 | NO | YES | 2 |
C | primaryCells | C_primaryCells | COMBI | YES | YES | 2 |
C | primaryCells | C_primaryCells | DMSO | NO | NO | 3 |
C | primaryCells | C_primaryCells | DRUG1 | YES | NO | 3 |
C | primaryCells | C_primaryCells | DRUG2 | NO | YES | 3 |
C | primaryCells | C_primaryCells | COMBI | YES | YES | 3 |
D | primaryCells | D_primaryCells | DMSO | NO | NO | 1 |
D | primaryCells | D_primaryCells | DRUG1 | YES | NO | 1 |
D | primaryCells | D_primaryCells | DRUG2 | NO | YES | 1 |
D | primaryCells | D_primaryCells | COMBI | YES | YES | 1 |
D | primaryCells | D_primaryCells | DMSO | NO | NO | 2 |
D | primaryCells | D_primaryCells | DRUG1 | YES | NO | 2 |
D | primaryCells | D_primaryCells | DRUG2 | NO | YES | 2 |
D | primaryCells | D_primaryCells | COMBI | YES | YES | 2 |
D | primaryCells | D_primaryCells | DMSO | NO | NO | 3 |
D | primaryCells | D_primaryCells | DRUG1 | YES | NO | 3 |
D | primaryCells | D_primaryCells | DRUG2 | NO | YES | 3 |
D | primaryCells | D_primaryCells | COMBI | YES | YES | 3 |
Design matrix:
design <- model.matrix(~0+cell_group+(treat1*treat2), data = sampleInfo)
design
cell_groupA_cellLine cell_groupB_cellLine cell_groupC_primaryCells cell_groupD_primaryCells treat1YES treat2YES
1 1 0 0 0 0 0
2 1 0 0 0 1 0
3 1 0 0 0 0 1
4 1 0 0 0 1 1
5 1 0 0 0 0 0
6 1 0 0 0 1 0
7 1 0 0 0 0 1
8 1 0 0 0 1 1
9 1 0 0 0 0 0
10 1 0 0 0 1 0
11 1 0 0 0 0 1
12 1 0 0 0 1 1
13 0 1 0 0 0 0
14 0 1 0 0 1 0
15 0 1 0 0 0 1
16 0 1 0 0 1 1
17 0 1 0 0 0 0
18 0 1 0 0 1 0
19 0 1 0 0 0 1
20 0 1 0 0 1 1
21 0 1 0 0 0 0
22 0 1 0 0 1 0
23 0 1 0 0 0 1
24 0 1 0 0 1 1
25 0 0 1 0 0 0
26 0 0 1 0 1 0
27 0 0 1 0 0 1
28 0 0 1 0 1 1
29 0 0 1 0 0 0
30 0 0 1 0 1 0
31 0 0 1 0 0 1
32 0 0 1 0 1 1
33 0 0 1 0 0 0
34 0 0 1 0 1 0
35 0 0 1 0 0 1
36 0 0 1 0 1 1
37 0 0 0 1 0 0
38 0 0 0 1 1 0
39 0 0 0 1 0 1
40 0 0 0 1 1 1
41 0 0 0 1 0 0
42 0 0 0 1 1 0
43 0 0 0 1 0 1
44 0 0 0 1 1 1
45 0 0 0 1 0 0
46 0 0 0 1 1 0
47 0 0 0 1 0 1
48 0 0 0 1 1 1
treat1YES:treat2YES
1 0
2 0
3 0
4 1
5 0
6 0
7 0
8 1
9 0
10 0
11 0
12 1
13 0
14 0
15 0
16 1
17 0
18 0
19 0
20 1
21 0
22 0
23 0
24 1
25 0
26 0
27 0
28 1
29 0
30 0
31 0
32 1
33 0
34 0
35 0
36 1
37 0
38 0
39 0
40 1
41 0
42 0
43 0
44 1
45 0
46 0
47 0
48 1
attr(,"assign")
[1] 1 1 1 1 2 3 4
attr(,"contrasts")
attr(,"contrasts")$cell_group
[1] "contr.treatment"
attr(,"contrasts")$treat1
[1] "contr.treatment"
attr(,"contrasts")$treat2
[1] "contr.treatment"
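# Data fitting (y is assumed to be an edgeR DGEList holding the raw counts)
library(edgeR)
y <- calcNormFactors(y)        # TMM normalization factors
y <- estimateDisp(y, design)   # common, trended and tagwise dispersions
fit <- glmFit(y, design)       # gene-wise negative binomial GLMs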
However, I am struggling to set up contrasts that incorporate the drug combination information. Any help would be greatly appreciated.
Thank you!
Do you want drug results separately for each cell line? Or do you want drug effects averaged over all the cell lines? I'd have expected you would want the former but your design matrix is doing the latter.
What is different about the different replicates? Are samples in the same replicate more similar than samples in different replicates?
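To make the first question concrete, here is a minimal sketch of a design that gives each cell line its own drug effects, in contrast to the posted design, which has a single treat1, treat2 and interaction coefficient shared by all cell lines. It uses base R only and rebuilds the sample sheet from the table above. The `group` factor, the `make_contrast` helper and the contrast names are illustrative assumptions, not something from the original post.

```r
# Rebuild the 48-sample sheet from the question (4 cell lines x 4 treatments x 3 replicates).
cell_lines <- c("A", "B", "C", "D")
sampleInfo <- expand.grid(
  treatment = c("DMSO", "DRUG1", "DRUG2", "COMBI"),
  replicate = 1:3,
  cell_line = cell_lines,
  stringsAsFactors = FALSE
)

# One group per cell line x treatment combination, e.g. "A.DMSO" (assumed naming).
sampleInfo$group <- factor(paste(sampleInfo$cell_line, sampleInfo$treatment, sep = "."))

# Cell-means design: one coefficient per group, no intercept.
design <- model.matrix(~ 0 + group, data = sampleInfo)
colnames(design) <- levels(sampleInfo$group)

# Hypothetical helper: turn named weights into a full-length contrast vector.
make_contrast <- function(weights) {
  v <- setNames(numeric(ncol(design)), colnames(design))
  v[names(weights)] <- weights
  v
}

# DRUG1 vs DMSO within cell line A.
drug1_in_A <- make_contrast(c(A.DRUG1 = 1, A.DMSO = -1))

# Combination effect beyond the sum of the single-drug effects in cell line A:
# (COMBI - DMSO) - (DRUG1 - DMSO) - (DRUG2 - DMSO)
interaction_in_A <- make_contrast(
  c(A.COMBI = 1, A.DRUG1 = -1, A.DRUG2 = -1, A.DMSO = 1)
)

# DRUG1 vs DMSO averaged over the four cell lines.
drug1_average <- make_contrast(c(
  setNames(rep( 1/4, 4), paste0(cell_lines, ".DRUG1")),
  setNames(rep(-1/4, 4), paste0(cell_lines, ".DMSO"))
))
```

After refitting with this design, each vector could be passed to edgeR through the `contrast` argument, for example `glmLRT(fit, contrast = drug1_in_A)`. The same vectors can also be written with `makeContrasts()` from limma.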
My primary goal is to identify drug effects separately for each cell line, but I'm also interested in the overall effects of the drugs across all cell lines. The replicates are biological replicates: 3 different cell passages that received the same treatments. Hence I do not expect samples from the same replicate to be more similar to one another than to samples from other replicates. Rather, samples with a particular treatment in a specific cell line are (as expected) similar across replicates.
Then that inter-replicate correlation should be included in the analysis.
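One possible way to account for the replicate in the model is to treat each passage as a blocking factor. The sketch below assumes that replicate numbers are only meaningful within a cell line (replicate 1 of line A is unrelated to replicate 1 of line B), so the block is the cell line and replicate pair. The `passage` factor and the column filtering are assumptions made for illustration. If replicates were instead processed as batches shared across cell lines, a different blocking term would be needed.

```r
# Rebuild the sample sheet; expand.grid keeps the factor levels in the order given,
# so DMSO is the first treatment level.
sampleInfo <- expand.grid(
  treatment = c("DMSO", "DRUG1", "DRUG2", "COMBI"),
  replicate = 1:3,
  cell_line = c("A", "B", "C", "D")
)

# One block per cell line x replicate (assumed to be one passage each): 12 levels.
sampleInfo$passage <- interaction(sampleInfo$cell_line, sampleInfo$replicate, drop = TRUE)

# Passage baselines plus a separate set of treatment columns for every cell line.
design <- model.matrix(~ 0 + passage + cell_line:treatment, data = sampleInfo)

# The DMSO columns are redundant with the passage baselines; dropping them makes the
# matrix full rank and turns the remaining columns into effects relative to DMSO.
design <- design[, !grepl("treatmentDMSO", colnames(design))]

# Check that every coefficient is estimable.
is_full_rank <- qr(design)$rank == ncol(design)

# Combination effect beyond the sum of the single-drug effects in cell line A.
synergy_in_A <- setNames(numeric(ncol(design)), colnames(design))
synergy_in_A["cell_lineA:treatmentCOMBI"] <- 1
synergy_in_A["cell_lineA:treatmentDRUG1"] <- -1
synergy_in_A["cell_lineA:treatmentDRUG2"] <- -1
```

In this parameterization the coefficient `cell_lineA:treatmentDRUG1` is the DRUG1 vs DMSO log fold change in cell line A, adjusted for passage, so it can be tested directly by name. The 24 columns leave 24 residual degrees of freedom from the 48 samples.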
Thank you for your very important advice. I do indeed check the MDS plot for inter-replicate correlation.