Dear all,
I'm performing a differential gene expression analysis using limma.
My study design is simple in that I only want to compare control and disease within three different groups, however, I want to control for 3 categorical covariates - site, sampling location and gender, and one continuous covariate - age. The contrasts of interest are:
CP_1 to CP_2: DEGs between control CP (CP_1) and diseased CP (CP_2)
C_1 to C_2: DEGs between control C (C_1) and diseased C (C_2)
N_1 to N_2: DEGs between control N (N_1) and diseased N (N_2)
Thus, '_1' represents a control, and '_2' represents a diseased case. I am not interested in genes associated with gender/age or any of the covariates, I just want to identify the disease effect whilst controlling for the covariates.
My targets file is follows:
> targets[1:10,]
   Sample_ID Group     Site Sampling_location Gender Age
1    emiasb1  CP_1      leg               S_2 Female  54
2    emiasb2  CP_1      leg               S_1 Female  74
3    emiasb3  CP_1      leg               S_1 Female  39
4    emiasb4  CP_1      leg               S_1 Female  57
5    emiasb5  CP_1      leg               S_1   Male  32
6    emiasb6  CP_1      leg               S_1   Male  32
7    emiasb7  CP_1 buttocks               S_3   Male  63
8    emiasb8  CP_1      leg               S_2   Male  46
9    emiasb9  CP_1      leg               S_3   Male  72
10  emiasb10  CP_1      leg               S_2   Male  58
# sample numbers
> table(targets$Group)
 C_1  C_2 CP_1 CP_2  N_1  N_2
136   57   87   27   87   30 
f <- factor(targets$Group)
  
# cell means model with sampling location, age, site, and gender as blocking factors
design <- model.matrix(~0 + f + Sampling_location + as.numeric(Age) + Site + Gender, targets)
head(design)
> head(design)
  fC_1 fC_2 fCP_1 fCP_2 fN_1 fN_2 Sampling_locationS_2 Sampling_locationS_3 Age Sitebuttocks Siteleg Sitethigh GenderMale
1    0    0     1     0    0    0                    1                    0  54            0       1         0          0
2    0    0     1     0    0    0                    0                    0  74            0       1         0          0
3    0    0     1     0    0    0                    0                    0  39            0       1         0          0
4    0    0     1     0    0    0                    0                    0  57            0       1         0          0
5    0    0     1     0    0    0                    0                    0  32            0       1         0          1
6    0    0     1     0    0    0                    0                    0  32            0       1         0          1
#fit model
fit <- lmFit(affydat, design)
  
# define contrast matrix
cont.matrix <- makeContrasts(P1="fCP_1-fCP_2",P2 = "fN_1-fN_2", P3 = "fC_1-fC_2",  levels=design)
> cont.matrix
                      Contrasts
Levels                 P1 P2 P3
  fC_1                  0  0  1
  fC_2                  0  0 -1
  fCP_1                 1  0  0
  fCP_2                -1  0  0
  fN_1                  0  1  0
  fN_2                  0 -1  0
  Sampling_locationS_2  0  0  0
  Sampling_locationS_3  0  0  0
  Age                   0  0  0
  Sitebuttocks          0  0  0
  Siteleg               0  0  0
  Sitethigh             0  0  0
  GenderMale            0  0  0
  fit2  <- contrasts.fit(fit, cont.matrix)
  fit3  <- eBayes(fit2, trend = TRUE)
  
  # the results for CP_1-CP_2 is in
  res <- topTable(fit3,coef=1,n = Inf, adjust="fdr", p = 0.05)
My results appear to be consistent with published studies however, I would like to confirm that my design is correct, and that I am appropriately controlling for the confounding variables before taking this any further.
Cheers,
Dave

The design and correction looks fine to me except I don't think eBayes(fit2, trend = TRUE ) is necessary? Why did you add trend=TRUE? Perhaps Gordon or some one similar can advise here also.
The
trend=TRUEargument will model any mean-variance trend in the data. This is required in cases where you have a mean-variance trend (obviously), otherwise your estimates of the prior d.f. and subsequent EB shrinkage would be confounded by systematic mean-dependent differences from a common prior variance. You can check whether you have a mean-dependent trend withplotSA- read the limma user's guide for more details.Hello,
I'm trying to do a differential methylation analysis with the Limma package and was wondering if you had any suggestions on how to handle this trend argument with this type of data? By applying plotSA empirically on my data, I don't have any particular trend on my variance but I don't know if this type of analysis is adapted to methylation data? In particular, when I try trend = TRUE or not in my eBayes function I get quite a different number of differentially methylated CpGs without really understanding the subtlety of what this represents from a biological point of view?
Thanks in advance for any suggestion,
Hortense