Question: edgeR - comparing treatment groups across multiple populations
0
22 months ago by
mart11390
Purdue University
mart11390 wrote:

Hello, my name is Alex Martinez and I am currently a graduate student at Purdue University. I am using edgeR to conduct differential gene expression for a project wherein I'd like to compare 2 treatment groups (control versus treatment) across 3 population groups (lmlc, and ct). After reading through the edgeR user guide, I decided to opt for the Blocking method (section 3.4.2) and initially tested for DGE with all three populations combined in the model using the design matrix:

design <- model.matrix(~population+treatment)

(Intercept) lc lm trtmntTrt
ctc_1_RSEM           1  0  0         0
ctc_2_RSEM           1  0  0         0
ctc_3_RSEM           1  0  0         0
ctt_1_RSEM           1  0  0         1
ctt_2_RSEM           1  0  0         1
ctt_3_RSEM           1  0  0         1

and then performing the likelihood ratio test coefficients associated with the treatment variable.

lrt <- glmLRT(fit, coef=4)

The list of DEGs was ~50 genes in length. Next, I decided to test for DEGs between control and treatment individuals for each population separately to see if the individual lists roughly corroborated the list of DEGs in the Blocking model. The results were surprising: one population had over 300 DEGs while the other two had fewer than 40 each. Additionally, there were only a small number of genes across the individual population lists (~10) that were present in the Blocking model DEG list.

My questions are:

(a) Is there a good statistical reason to opt for the Blocking method over comparing the three populations separately?

(b) Why might the overlap between individual population DEG lists and the combined Blocking DEG list be so small?

Any thoughts or insight regarding these questions would be greatly appreciated! Thanks in advance.

Alex

rnaseq edger • 361 views
modified 22 months ago by Aaron Lun25k • written 22 months ago by mart11390
Answer: edgeR - comparing treatment groups across multiple populations
2
22 months ago by
Aaron Lun25k
Cambridge, United Kingdom
Aaron Lun25k wrote:

Your two approaches are making different assumptions. In particular, your blocking model assumes that the effect of treatment is the same in each population; this is not the case when you perform separate analyses of each population. It sounds as if the treatment effect does differ between populations, in which case blocking would yield a poorly-fitting model. This would manifest as inflated variances and reduced power to detect DEGs.

A more valid comparison would be to set up a one-way layout, for example:

group <- paste0(population, treatment)
design <- model.matrix(~0 + group)


... and then compare treated/untreated within each population using makeContrasts. This is closer to what you are doing with the separate analyses. The added benefit of using all samples together is that you get more information to estimate the dispersion, which improves power for hypothesis testing; even if the extra samples aren't involved in your contrasts of interest.

Thanks for the quick response Aaron. I was definitely concerned that I might be missing out on DEGs by comparing each population in a separate model and this solution puts those worries at ease. When making the contrast to control all populations, I was planning on doing something like the following ("_c" refers to control individuals and "_t" refers to treatment individuals)​:

design
ct_c ct_t lc_c lc_t lm_c lm_t
ctc_1    1    0    0    0    0    0
ctc_2    1    0    0    0    0    0
ctc_3    1    0    0    0    0    0
ctt_1    0    1    0    0    0    0
ctt_2    0    1    0    0    0    0
ctt_3    0    1    0    0    0    0
lcc_1    0    0    1    0    0    0
lcc_2    0    0    1    0    0    0
lcc_3    0    0    1    0    0    0
lct_1    0    0    0    1    0    0
lct_2    0    0    0    1    0    0
lct_3    0    0    0    1    0    0
lmc_1    0    0    0    0    1    0
lmc_2    0    0    0    0    1    0
lmc_3    0    0    0    0    1    0
lmt_1    0    0    0    0    0    1
lmt_2    0    0    0    0    0    1
lmt_3    0    0    0    0    0    1

allpops_contrast <- c(-1,1,-1,1,-1,1)

lrt <- glmLRT(fit, contrast = allpops_contrast)

I was just hoping to clarify that my understanding of setting up the contrasts to test differences between treatment groups across populations is correct. Thank you again for the insight.

Alex

1

You need to divide allpops_contrast by 3 to get the average treatment log-fold change across populations.

Great, that makes sense. I appreciate your help in clarifying this issue!

Best,

Alex.