Hello, my name is Alex Martinez and I am currently a graduate student at Purdue University. I am using edgeR to conduct differential gene expression for a project wherein I'd like to compare 2 treatment groups (control versus treatment) across 3 population groups (lm, lc, and ct). After reading through the edgeR user guide, I decided to opt for the Blocking method (section 3.4.2) and initially tested for DGE with all three populations combined in the model using the design matrix:
design <- model.matrix(~population+treatment) head(design) (Intercept) lc lm trtmntTrt ctc_1_RSEM 1 0 0 0 ctc_2_RSEM 1 0 0 0 ctc_3_RSEM 1 0 0 0 ctt_1_RSEM 1 0 0 1 ctt_2_RSEM 1 0 0 1 ctt_3_RSEM 1 0 0 1
and then performing the likelihood ratio test coefficients associated with the treatment variable.
lrt <- glmLRT(fit, coef=4)
The list of DEGs was ~50 genes in length. Next, I decided to test for DEGs between control and treatment individuals for each population separately to see if the individual lists roughly corroborated the list of DEGs in the Blocking model. The results were surprising: one population had over 300 DEGs while the other two had fewer than 40 each. Additionally, there were only a small number of genes across the individual population lists (~10) that were present in the Blocking model DEG list.
My questions are:
(a) Is there a good statistical reason to opt for the Blocking method over comparing the three populations separately?
(b) Why might the overlap between individual population DEG lists and the combined Blocking DEG list be so small?
Any thoughts or insight regarding these questions would be greatly appreciated! Thanks in advance.