Question

edgeR, 3 x 2 design with a batch effect and contrasts


JKim


United States

Hello,

I apologize that similar questions have been asked many times before, but I'm asking again. My design is a 3 x 2 design (3 genotypes x 2 treatments) with a batch effect (samples were not processed on the same day). I came across this post and am going to follow Dr. Smyth's suggestion (~0 + batch + group). I just want to check whether my approach to making contrasts is correct.

My meta_data (I combined the two factors, genotype and treat, into a single factor, group):

         sample_name batch treat genotype    group
sample1      sample1    B1 untrt       WT WT_untrt
sample2      sample2    B1   trt       WT   WT_trt
sample3      sample3    B1 untrt       MR MR_untrt
sample4      sample4    B1   trt       MR   MR_trt
sample5      sample5    B1 untrt       GD GD_untrt
sample6      sample6    B1   trt       GD   GD_trt
sample7      sample7    B2 untrt       WT WT_untrt
sample8      sample8    B2   trt       WT   WT_trt
sample9      sample9    B2 untrt       MR MR_untrt
sample10    sample10    B2   trt       MR   MR_trt
sample11    sample11    B2 untrt       GD GD_untrt
sample12    sample12    B2   trt       GD   GD_trt
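A minimal sketch of how the combined `group` factor could be built in R, assuming a data frame called `meta_data` with the columns shown above (the construction itself is illustrative; the post does not show it):

```r
# Illustrative reconstruction of the metadata table above
meta_data <- data.frame(
  sample_name = paste0("sample", 1:12),
  batch       = factor(rep(c("B1", "B2"), each = 6)),
  treat       = factor(rep(c("untrt", "trt"), times = 6), levels = c("untrt", "trt")),
  genotype    = factor(rep(rep(c("WT", "MR", "GD"), each = 2), times = 2))
)
rownames(meta_data) <- meta_data$sample_name

# Paste genotype and treatment together into one six-level factor
meta_data$group <- factor(paste(meta_data$genotype, meta_data$treat, sep = "_"))
levels(meta_data$group)
# "GD_trt" "GD_untrt" "MR_trt" "MR_untrt" "WT_trt" "WT_untrt"
```

Because `factor()` sorts levels alphabetically by default, `GD_trt` becomes the first level, which is the level that has no column of its own in the design matrix below.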

My design matrix

         batchB1 batchB2 groupGD_untrt groupMR_trt groupMR_untrt groupWT_trt groupWT_untrt
sample1        1       0             0           0             0           0             1
sample2        1       0             0           0             0           1             0
sample3        1       0             0           0             1           0             0
sample4        1       0             0           1             0           0             0
sample5        1       0             1           0             0           0             0
sample6        1       0             0           0             0           0             0
sample7        0       1             0           0             0           0             1
sample8        0       1             0           0             0           1             0
sample9        0       1             0           0             1           0             0
sample10       0       1             0           1             0           0             0
sample11       0       1             1           0             0           0             0
sample12       0       1             0           0             0           0             0
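The design matrix above can be reproduced with `model.matrix()`. This sketch assumes the factor coding from the metadata table (the variable names are illustrative):

```r
# Factor coding from the metadata table (12 samples, 2 batches, 6 groups)
genotype <- rep(rep(c("WT", "MR", "GD"), each = 2), times = 2)
treat    <- rep(c("untrt", "trt"), times = 6)
batch    <- factor(rep(c("B1", "B2"), each = 6))
group    <- factor(paste(genotype, treat, sep = "_"))

# The formula from the question: batch listed first after 0+
design <- model.matrix(~ 0 + batch + group)
rownames(design) <- paste0("sample", 1:12)
colnames(design)
# "batchB1" "batchB2" "groupGD_untrt" "groupMR_trt" "groupMR_untrt" "groupWT_trt" "groupWT_untrt"
```

Samples 6 and 12 (GD_trt) have zeros in every group column, so with this ordering each `group` coefficient is that group's difference from GD_trt (after adjusting for batch) rather than a group mean.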

My contrasts

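library(edgeR)  # also attaches limma, which provides makeContrasts()

# 'design' is the matrix above; assumes fit <- glmQLFit(y, design) was run earlier
contrs <- makeContrasts(
  trt_vs_untrt_within_WT = groupWT_trt - groupWT_untrt,
  GD_vs_WT_within_untrt  = groupGD_untrt - groupWT_untrt,
  levels = colnames(design)
)

# Quasi-likelihood F-tests, one per contrast
res_trt_within_WT <- glmQLFTest(fit, contrast = contrs[, "trt_vs_untrt_within_WT"])
res_GD_vs_WT_within_untrt <- glmQLFTest(fit, contrast = contrs[, "GD_vs_WT_within_untrt"])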
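`makeContrasts()` returns a plain numeric matrix with one row per design column and one column per contrast. A base-R sketch of the same two contrast vectors, using a hypothetical helper `make_contrast()` and the seven column names printed above:

```r
# Column names of the design matrix from the question
coef_names <- c("batchB1", "batchB2", "groupGD_untrt", "groupMR_trt",
                "groupMR_untrt", "groupWT_trt", "groupWT_untrt")

# Hypothetical helper: +1 on one coefficient, -1 on another, 0 elsewhere
make_contrast <- function(plus, minus, names) {
  v <- setNames(numeric(length(names)), names)
  v[plus]  <- 1
  v[minus] <- -1
  v
}

contrs_manual <- cbind(
  trt_vs_untrt_within_WT = make_contrast("groupWT_trt", "groupWT_untrt", coef_names),
  GD_vs_WT_within_untrt  = make_contrast("groupGD_untrt", "groupWT_untrt", coef_names)
)
contrs_manual
```

The batch columns get zero weight in both contrasts, so batch is adjusted for in the fit but does not enter either comparison.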

Q1. I'd like to test the treatment effect within WT while accounting for the batch effect.

Is my res_trt_within_WT correct?

Q2. I'd like to test the genotype effect (GD vs WT) within untreated samples.

Is my res_GD_vs_WT_within_untrt correct?
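One way to sanity-check Q1 and Q2 is to fit the same design to noise-free, made-up data where the true group differences are known. The sketch below uses base R's `lm.fit()` (an ordinary least-squares analogue, not edgeR's quasi-likelihood GLM, although the design algebra is the same) with hypothetical group means and a batch shift of 1.5:

```r
# Same design as in the question
genotype <- rep(rep(c("WT", "MR", "GD"), each = 2), times = 2)
treat    <- rep(c("untrt", "trt"), times = 6)
batch    <- factor(rep(c("B1", "B2"), each = 6))
group    <- factor(paste(genotype, treat, sep = "_"))
design   <- model.matrix(~ 0 + batch + group)

# Hypothetical noise-free log-expression: a true mean per group plus a batch shift
true_means <- c(GD_trt = 5, GD_untrt = 3, MR_trt = 6, MR_untrt = 4,
                WT_trt = 8, WT_untrt = 2)
y <- unname(true_means[as.character(group)]) + ifelse(batch == "B2", 1.5, 0)

beta <- lm.fit(design, y)$coefficients

# Contrast 1: WT treated vs WT untreated
trt_within_WT <- unname(beta["groupWT_trt"] - beta["groupWT_untrt"])
# Contrast 2: GD vs WT, both untreated
GD_vs_WT_untrt <- unname(beta["groupGD_untrt"] - beta["groupWT_untrt"])

c(trt_within_WT, GD_vs_WT_untrt)  # approximately 6 and 1
```

Both contrasts recover the true differences (8 - 2 = 6 and 3 - 2 = 1) despite the batch shift, because each is a difference of two coefficients that share the same GD_trt baseline.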

I have gone through the edgeR User's Guide as well as "A guide to creating design matrices for gene expression experiments", but I still wanted to ask these questions. I'm sorry...

Unsolicited info:

I did a PCA and observed the batch effect: PC1, which explains 70% of the variance, separated the batch 1 samples from the batch 2 samples.


Accepted Answer · 2024-03-18

Gordon Smyth

WEHI, Melbourne, Australia

If you include 0+ in the model formula, then the order of the batch and treatment factors in the model becomes important. You need 0+group+batch rather than 0+batch+group.

See also https://www.biostars.org/p/9588610
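A sketch of the ordering recommended above, reusing the illustrative factor coding from the metadata table:

```r
genotype <- rep(rep(c("WT", "MR", "GD"), each = 2), times = 2)
treat    <- rep(c("untrt", "trt"), times = 6)
batch    <- factor(rep(c("B1", "B2"), each = 6))
group    <- factor(paste(genotype, treat, sep = "_"))

# group listed first after 0+: one column per group; batch coded relative to B1
design2 <- model.matrix(~ 0 + group + batch)
colnames(design2)
# "groupGD_trt" "groupGD_untrt" "groupMR_trt" "groupMR_untrt" "groupWT_trt" "groupWT_untrt" "batchB2"
```

With this parameterisation every group has its own coefficient, so a comparison such as GD_trt vs WT_trt can be written directly as `groupGD_trt - groupWT_trt`. Under the `~0 + batch + group` design the same comparison would have to be written as `-groupWT_trt`, because GD_trt has no column there.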


Thank you. I think I found why the order matters.

Although an intercept-free design matrix has been coded using the 0+ notation, the intercept is only excluded from the first factor that is listed within the model.matrix function. In other words, the second and third factors added to the model.matrix function are parameterised as though there is an intercept term. This is why we place the factor of interest first as it simplifies the subsequent code for the comparisons of interest, even though a different order of factors added give equivalent models with variations in parameterisation.

Reply by JKim
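The point that both orders give equivalent models can be checked numerically: the two design matrices span the same column space, so fitted values and any estimable contrast agree. A sketch with one simulated gene (random values, and base R `lm.fit()` as a stand-in for the edgeR fit):

```r
genotype <- rep(rep(c("WT", "MR", "GD"), each = 2), times = 2)
treat    <- rep(c("untrt", "trt"), times = 6)
batch    <- factor(rep(c("B1", "B2"), each = 6))
group    <- factor(paste(genotype, treat, sep = "_"))

design1 <- model.matrix(~ 0 + batch + group)  # order used in the question
design2 <- model.matrix(~ 0 + group + batch)  # order recommended in the answer

set.seed(1)
y <- rnorm(12, mean = 5)  # arbitrary simulated response for one gene

fit1 <- lm.fit(design1, y)
fit2 <- lm.fit(design2, y)
b1 <- fit1$coefficients
b2 <- fit2$coefficients

# Same column space, hence identical fitted values
all.equal(fit1$fitted.values, fit2$fitted.values)

# WT treatment effect: the same expression in both parameterisations
wt1 <- unname(b1["groupWT_trt"] - b1["groupWT_untrt"])
wt2 <- unname(b2["groupWT_trt"] - b2["groupWT_untrt"])

# GD_trt vs WT_trt: needs -groupWT_trt in design1, a plain difference in design2
gd1 <- unname(-b1["groupWT_trt"])
gd2 <- unname(b2["groupGD_trt"] - b2["groupWT_trt"])
```

The estimates are identical; only the contrast code needed to obtain them differs, which is why putting the factor of interest first is the more convenient choice.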


Yes, that is a quote from A guide to creating design matrices for gene expression experiments.

Reply by Gordon Smyth