Dear all,
I'm performing a differential gene expression analysis using limma.
My study design is simple in that I only want to compare control and disease within three different groups, however, I want to control for 3 categorical covariates - site, sampling location and gender, and one continuous covariate - age. The contrasts of interest are:
CP_1 to CP_2: DEGs between control CP (CP_1) and diseased CP (CP_2)
C_1 to C_2: DEGs between control C (C_1) and diseased C (C_2)
N_1 to N_2: DEGs between control N (N_1) and diseased N (N_2)
Thus, '_1' represents a control, and '_2' represents a diseased case. I am not interested in genes associated with gender/age or any of the covariates, I just want to identify the disease effect whilst controlling for the covariates.
My targets file is follows:
> targets[1:10,] Sample_ID Group Site Sampling_location Gender Age 1 emiasb1 CP_1 leg S_2 Female 54 2 emiasb2 CP_1 leg S_1 Female 74 3 emiasb3 CP_1 leg S_1 Female 39 4 emiasb4 CP_1 leg S_1 Female 57 5 emiasb5 CP_1 leg S_1 Male 32 6 emiasb6 CP_1 leg S_1 Male 32 7 emiasb7 CP_1 buttocks S_3 Male 63 8 emiasb8 CP_1 leg S_2 Male 46 9 emiasb9 CP_1 leg S_3 Male 72 10 emiasb10 CP_1 leg S_2 Male 58 # sample numbers > table(targets$Group) C_1 C_2 CP_1 CP_2 N_1 N_2 136 57 87 27 87 30 f <- factor(targets$Group) # cell means model with sampling location, age, site, and gender as blocking factors design <- model.matrix(~0 + f + Sampling_location + as.numeric(Age) + Site + Gender, targets) head(design) > head(design) fC_1 fC_2 fCP_1 fCP_2 fN_1 fN_2 Sampling_locationS_2 Sampling_locationS_3 Age Sitebuttocks Siteleg Sitethigh GenderMale 1 0 0 1 0 0 0 1 0 54 0 1 0 0 2 0 0 1 0 0 0 0 0 74 0 1 0 0 3 0 0 1 0 0 0 0 0 39 0 1 0 0 4 0 0 1 0 0 0 0 0 57 0 1 0 0 5 0 0 1 0 0 0 0 0 32 0 1 0 1 6 0 0 1 0 0 0 0 0 32 0 1 0 1 #fit model fit <- lmFit(affydat, design) # define contrast matrix cont.matrix <- makeContrasts(P1="fCP_1-fCP_2",P2 = "fN_1-fN_2", P3 = "fC_1-fC_2", levels=design) > cont.matrix Contrasts Levels P1 P2 P3 fC_1 0 0 1 fC_2 0 0 -1 fCP_1 1 0 0 fCP_2 -1 0 0 fN_1 0 1 0 fN_2 0 -1 0 Sampling_locationS_2 0 0 0 Sampling_locationS_3 0 0 0 Age 0 0 0 Sitebuttocks 0 0 0 Siteleg 0 0 0 Sitethigh 0 0 0 GenderMale 0 0 0 fit2 <- contrasts.fit(fit, cont.matrix) fit3 <- eBayes(fit2, trend = TRUE) # the results for CP_1-CP_2 is in res <- topTable(fit3,coef=1,n = Inf, adjust="fdr", p = 0.05)
My results appear to be consistent with published studies however, I would like to confirm that my design is correct, and that I am appropriately controlling for the confounding variables before taking this any further.
Cheers,
Dave
The design and correction looks fine to me except I don't think eBayes(fit2, trend = TRUE ) is necessary? Why did you add trend=TRUE? Perhaps Gordon or some one similar can advise here also.
The
trend=TRUE
argument will model any mean-variance trend in the data. This is required in cases where you have a mean-variance trend (obviously), otherwise your estimates of the prior d.f. and subsequent EB shrinkage would be confounded by systematic mean-dependent differences from a common prior variance. You can check whether you have a mean-dependent trend withplotSA
- read the limma user's guide for more details.