Question

EdgeR: With two treatments, is it equivalent to split into two different models vs. making the relevant pairwise comparisons?

0

Entering edit mode

sdalin • 0

@sdalin-15843

Last seen 5.5 years ago

I'm very new to edgeR, so this may be a silly question... I have a CRISPR screen with the following design:

	+ conditioned media	- conditioned media
+ drug	d.cm	d.ncm
- drug	nd.cm	nd.ncm

There are three biological replicates of each of the four conditions. Two of those replicates were sequenced at a different time than the other replicate, and there is a significant batch effect from the different sequencing runs.

I'm mainly interested in how the conditioned media affects which guides are enriched/depleted after treatment with drug. I've thought of two ways to do this analysis:

1) Split my data into two sets, one for +cm and one for -cm. So my samples would be:

cm$samples$group
[1] Conditioned.Media                  Conditioned.Media                  Conditioned.Media                 
[4] Drug.10nM.Conditioned.Media Drug.10nM.Conditioned.Media Drug.10nM.Conditioned.Media


ncm$samples$group
[1] Untreated        Untreated        Untreated        Drug.10nM Drug.10nM Drug.10nM

Then my design matrices are as follows (setting seqDate as intercept to account for batch effect):

desCM <- model.matrix(~seqDate + group, data=cm)
desNCM <- model.matrix(~seqDate + group, data=ncm)

desCM

   (Intercept) seqDatet2 groupDrug.10nM.Conditioned.Media
7            1         1                                       0
1            1         1                                       0
11           1         0                                       0
9            1         1                                       1
3            1         1                                       1
13           1         0                                       1

desNCM

   (Intercept) seqDatet2 groupDrug.10nM
6            1         1                     0
17           1         0                     0
10           1         0                     0
8            1         1                     1
2            1         1                     1
12           1         0                     1

and I make the following contrasts (which, if I understand correctly, should compare d.cm - nd.cm for the first one and d.ncm - nd.ncm for the second one):

lrtCM = glmLRT(fitCM, coef=3)
lrtNCM = glmLRT(fitNCM, coef=3)

2) Keep everything together and only make the +/- drug contrasts in the context of + or - cm. Here my samples are:

y$samples$group

 [1] Untreated                          Untreated                          Untreated                         
 [4] Conditioned.Media                  Conditioned.Media                  Conditioned.Media                 
 [7] Drug.10nM.Conditioned.Media Drug.10nM.Conditioned.Media Drug.10nM.Conditioned.Media
[10] Drug.10nM                   Drug.10nM                   Drug.10nM

my design matrix is:

design <- model.matrix(~seqDate + group, data = y$samples)

   (Intercept) seqDatet2 groupDrug.10nM groupDrug.10nM.Conditioned.Media groupConditioned.Media
6            1         1                     0                                       0                      0
17           1         0                     0                                       0                      0
10           1         0                     0                                       0                      0
7            1         1                     0                                       0                      1
1            1         1                     0                                       0                      1
11           1         0                     0                                       0                      1
9            1         1                     0                                       1                      0
3            1         1                     0                                       1                      0
13           1         0                     0                                       1                      0
8            1         1                     1                                       0                      0
2            1         1                     1                                       0                      0
12           1         0                     1                                       0                      0

and I make the following contrasts (the first should compare d.ncm - nd.ncm, and the second should compare d.cm - nd.cm):

lrtNCM <- glmLRT(fit, coef=3)
lrtCM <- glmLRT(fit, contrast=c(0,0,0,1,-1))

I feel like these should be equivalent, but based on the topTags, they are not and I don't understand why.

Why are they different? Which is more correct to understand how the conditioned media changes the response to the drug?

edgeR design matrix contrast • 905 views

ADD COMMENT • link updated 5.4 years ago by Gordon Smyth 50k • written 5.4 years ago by sdalin • 0

score 2 · Answer 1 · 2018-11-21

The two approaches are not the same because in (1) you estimate two different dispersion parameters for each gene, one for CM and one for NCM, where in (2) you estimate only one dispersion parameter for each gene.

Unless there is a massive difference in variability between the CM and NCM sets, we recommend the second approach because the pooled dispersion parameter is more reliable, being based on more samples.