Question

edgeR - comparing treatment groups across multiple populations

0

Entering edit mode

mart1139 • 0

@mart1139-14954

Last seen 4.6 years ago

Purdue University

Hello, my name is Alex Martinez and I am currently a graduate student at Purdue University. I am using edgeR to conduct differential gene expression for a project wherein I'd like to compare 2 treatment groups (control versus treatment) across 3 population groups (lm, lc, and ct). After reading through the edgeR user guide, I decided to opt for the Blocking method (section 3.4.2) and initially tested for DGE with all three populations combined in the model using the design matrix:

design <- model.matrix(~population+treatment)

head(design)
           (Intercept) lc lm trtmntTrt
ctc_1_RSEM           1  0  0         0
ctc_2_RSEM           1  0  0         0
ctc_3_RSEM           1  0  0         0
ctt_1_RSEM           1  0  0         1
ctt_2_RSEM           1  0  0         1
ctt_3_RSEM           1  0  0         1

and then performing the likelihood ratio test coefficients associated with the treatment variable.

lrt <- glmLRT(fit, coef=4)

The list of DEGs was ~50 genes in length. Next, I decided to test for DEGs between control and treatment individuals for each population separately to see if the individual lists roughly corroborated the list of DEGs in the Blocking model. The results were surprising: one population had over 300 DEGs while the other two had fewer than 40 each. Additionally, there were only a small number of genes across the individual population lists (~10) that were present in the Blocking model DEG list.

My questions are:

(a) Is there a good statistical reason to opt for the Blocking method over comparing the three populations separately?

(b) Why might the overlap between individual population DEG lists and the combined Blocking DEG list be so small?

Any thoughts or insight regarding these questions would be greatly appreciated! Thanks in advance.

Alex

rnaseq edger • 1.7k views

ADD COMMENT • link updated 6.2 years ago by Aaron Lun ★ 28k • written 6.2 years ago by mart1139 • 0

score 2 · Accepted Answer · 2018-02-05

2

Entering edit mode

Aaron Lun ★ 28k

@alun

Last seen 14 hours ago

The city by the bay

Your two approaches are making different assumptions. In particular, your blocking model assumes that the effect of treatment is the same in each population; this is not the case when you perform separate analyses of each population. It sounds as if the treatment effect does differ between populations, in which case blocking would yield a poorly-fitting model. This would manifest as inflated variances and reduced power to detect DEGs.

A more valid comparison would be to set up a one-way layout, for example:

group <- paste0(population, treatment)
design <- model.matrix(~0 + group)

... and then compare treated/untreated within each population using makeContrasts. This is closer to what you are doing with the separate analyses. The added benefit of using all samples together is that you get more information to estimate the dispersion, which improves power for hypothesis testing; even if the extra samples aren't involved in your contrasts of interest.

ADD COMMENT • link 6.2 years ago Aaron Lun ★ 28k

0

Entering edit mode

Thanks for the quick response Aaron. I was definitely concerned that I might be missing out on DEGs by comparing each population in a separate model and this solution puts those worries at ease. When making the contrast to control all populations, I was planning on doing something like the following ("_c" refers to control individuals and "_t" refers to treatment individuals):

design
      ct_c ct_t lc_c lc_t lm_c lm_t
ctc_1    1    0    0    0    0    0
ctc_2    1    0    0    0    0    0
ctc_3    1    0    0    0    0    0
ctt_1    0    1    0    0    0    0
ctt_2    0    1    0    0    0    0
ctt_3    0    1    0    0    0    0
lcc_1    0    0    1    0    0    0
lcc_2    0    0    1    0    0    0
lcc_3    0    0    1    0    0    0
lct_1    0    0    0    1    0    0
lct_2    0    0    0    1    0    0
lct_3    0    0    0    1    0    0
lmc_1    0    0    0    0    1    0
lmc_2    0    0    0    0    1    0
lmc_3    0    0    0    0    1    0
lmt_1    0    0    0    0    0    1
lmt_2    0    0    0    0    0    1
lmt_3    0    0    0    0    0    1

allpops_contrast <- c(-1,1,-1,1,-1,1)

lrt <- glmLRT(fit, contrast = allpops_contrast)

I was just hoping to clarify that my understanding of setting up the contrasts to test differences between treatment groups across populations is correct. Thank you again for the insight.

Alex

ADD REPLY • link 6.2 years ago mart1139 • 0

1

Entering edit mode

You need to divide allpops_contrast by 3 to get the average treatment log-fold change across populations.

ADD REPLY • link 6.2 years ago Aaron Lun ★ 28k

0

Entering edit mode

Great, that makes sense. I appreciate your help in clarifying this issue!

Best,

Alex.

ADD REPLY • link 6.2 years ago mart1139 • 0