Question: EdgeR: With two treatments, is it equivalent to split into two different models vs. making the relevant pairwise comparisons?
0
gravatar for sdalin
8 weeks ago by
sdalin0
sdalin0 wrote:

I'm very new to edgeR, so this may be a silly question... I have a CRISPR screen with the following design:

  + conditioned media - conditioned media
+ drug d.cm d.ncm
- drug nd.cm nd.ncm

There are three biological replicates of each of the four conditions.  Two of those replicates were sequenced at a different time than the other replicate, and there is a significant batch effect from the different sequencing runs.

I'm mainly interested in how the conditioned media affects which guides are enriched/depleted after treatment with drug.  I've thought of two ways to do this analysis:

1) Split my data into two sets, one for +cm and one for -cm.  So my samples would be:

cm$samples$group
[1] Conditioned.Media                  Conditioned.Media                  Conditioned.Media                 
[4] Drug.10nM.Conditioned.Media Drug.10nM.Conditioned.Media Drug.10nM.Conditioned.Media


ncm$samples$group
[1] Untreated        Untreated        Untreated        Drug.10nM Drug.10nM Drug.10nM

 

Then my design matrices are as follows (setting seqDate as intercept to account for batch effect):

desCM <- model.matrix(~seqDate + group, data=cm)
desNCM <- model.matrix(~seqDate + group, data=ncm)

 

desCM

   (Intercept) seqDatet2 groupDrug.10nM.Conditioned.Media
7            1         1                                       0
1            1         1                                       0
11           1         0                                       0
9            1         1                                       1
3            1         1                                       1
13           1         0                                       1

 

desNCM

   (Intercept) seqDatet2 groupDrug.10nM
6            1         1                     0
17           1         0                     0
10           1         0                     0
8            1         1                     1
2            1         1                     1
12           1         0                     1

and I make the following contrasts (which, if I understand correctly, should compare d.cm - nd.cm for the first one and d.ncm - nd.ncm for the second one):

lrtCM = glmLRT(fitCM, coef=3)
lrtNCM = glmLRT(fitNCM, coef=3)

 

2) Keep everything together and only make the +/- drug contrasts in the context of + or - cm.  Here my samples are:

y$samples$group

 [1] Untreated                          Untreated                          Untreated                         
 [4] Conditioned.Media                  Conditioned.Media                  Conditioned.Media                 
 [7] Drug.10nM.Conditioned.Media Drug.10nM.Conditioned.Media Drug.10nM.Conditioned.Media
[10] Drug.10nM                   Drug.10nM                   Drug.10nM   

my design matrix is:

design <- model.matrix(~seqDate + group, data = y$samples)

   (Intercept) seqDatet2 groupDrug.10nM groupDrug.10nM.Conditioned.Media groupConditioned.Media
6            1         1                     0                                       0                      0
17           1         0                     0                                       0                      0
10           1         0                     0                                       0                      0
7            1         1                     0                                       0                      1
1            1         1                     0                                       0                      1
11           1         0                     0                                       0                      1
9            1         1                     0                                       1                      0
3            1         1                     0                                       1                      0
13           1         0                     0                                       1                      0
8            1         1                     1                                       0                      0
2            1         1                     1                                       0                      0
12           1         0                     1                                       0                      0

and I make the following contrasts (the first should compare d.ncm - nd.ncm, and the second should compare d.cm - nd.cm):

lrtNCM <- glmLRT(fit, coef=3)
lrtCM <- glmLRT(fit, contrast=c(0,0,0,1,-1))

 

I feel like these should be equivalent, but based on the topTags, they are not and I don't understand why. 

Why are they different?  Which is more correct to understand how the conditioned media changes the response to the drug?

ADD COMMENTlink modified 8 weeks ago by Gordon Smyth36k • written 8 weeks ago by sdalin0
Answer: EdgeR: With two treatments, is it equivalent to split into two different models
2
gravatar for Gordon Smyth
8 weeks ago by
Gordon Smyth36k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth36k wrote:

The two approaches are not the same because in (1) you estimate two different dispersion parameters for each gene, one for CM and one for NCM, where in (2) you estimate only one dispersion parameter for each gene.

Unless there is a massive difference in variability between the CM and NCM sets, we recommend the second approach because the pooled dispersion parameter is more reliable, being based on more samples.

ADD COMMENTlink modified 8 weeks ago • written 8 weeks ago by Gordon Smyth36k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 321 users visited in the last hour