Hello,
I am struggling with the analysis of an unbalanced intervention study, and would appreciate any advice on this.
I got a dataset from a coworker in which he analyzed gene expression in tissue biopsies from healthy (N=14) and obese (N=29) subjects at baseline (B), and after 2 interventions that were performed in the obese subjects only [Treat1 and Treat2]. In total my dataset consists of 69 arrays, comprised of 14 healthy baseline arrays, and for the obese subjects 13 complete array pairs (baseline vs after) for Treat1, 13 complete pairs for Treat2, and 3 incomplete pairs (2 arrays for obese subjects baseline only, and 1 array for Treat2 only).
Of interest are:
-
genes differentially expressed at baseline between all 14 healthy samples vs all 28 obese baseline samples.
-
genes differentially expressed in obese subjects after Treatment1 (after vs before in subset of 13 ppl (=26 arrays) + 2 unpaired baseline arrays.
-
genes differentially expressed in obese subjects after Treatment2 (after vs before in subset of 13 ppl +26 arrays) + 1 unpaired Treat2 array + 2 unpaired baseline arrays)
-
genes that respond different between the 2 treatments (“delta” of the 2 “delta’s”, i.e. comparison 2 vs 3).
In this experiment I thus would like to make comparisons within and between subjects.
AFAIK i do have two options:
-
analyze using a paired design, i.e. block on SubjectID
-
analyze according the multilevel example in the limma user guide (thus using the function duplicateCorrelation).
However, what would be the best approach in mycase?
Advantage of A is that paired (sub-)structure of the design is optimally used, disadvantage is that design is not of full rank and that some coefficients are thus not estimable. The latter is not really an issue because it are two coefs for two subjects?
Advantage of B is that design is of full rank, but the paired substructure is not optimally used, and hence is less powerful?
Any insights are appreciated!
Thanks,
Guido
> targets
Filename Status Time SubjectID
1 DELTA_G131_A01_1_PP002_T0.CEL lean base s2
2 DELTA_G131_A03_3_PP003_T0.CEL lean base s3
3 DELTA_G131_A05_5_PP004_T0.CEL lean base s4
4 DELTA_G131_A07_7_PP005_T0_2.CEL lean base s5
5 DELTA_G131_A09_9_PP006_T0.CEL lean base s6
6 DELTA_G131_A11_11_PP007_T0.CEL lean base s7
7 DELTA_G131_B01_15_PP009_T0.CEL lean base s9
8 DELTA_G131_B03_17_PP010_T0.CEL lean base s10
9 DELTA_G131_B05_23_PP013_T0.CEL lean base s13
10 DELTA_G131_B07_27_PP015_T0.CEL lean base s15
11 DELTA_G131_B09_29_PP101_BL_T0.CEL obese baseT2 s101
12 DELTA_G131_B11_31_PP101_FU_T0.CEL obese T2 s101
13 DELTA_G131_C01_33_PP102_BL_T0.CEL obese baseT1 s102
14 DELTA_G131_C03_35_PP102_FU_T0.CEL obese T1 s102
15 DELTA_G131_C05_41_PP104_BL_T0.CEL obese baseT1 s104
16 DELTA_G131_C07_43_PP104_FU_T0.CEL obese T1 s104
17 DELTA_G131_C09_45_PP105_BL_T0.CEL obese baseT2 s105
18 DELTA_G131_C11_47_PP105_FU_T0.CEL obese T2 s105
19 DELTA_G131_D01_49_PP106_BL_T0.CEL obese baseT1 s106
20 DELTA_G131_D03_51_PP106_FU_T0.CEL obese T1 s106
21 DELTA_G131_D05_53_PP107_BL_T0.CEL obese baseT2 s107
22 DELTA_G131_D07_55_PP107_FU_T0.CEL obese T2 s107
23 DELTA_G131_D09_63_PP110_BL_T0.CEL obese baseT2 s110
24 DELTA_G131_D11_65_PP110_FU_T0.CEL obese T2 s110
25 DELTA_G131_E01_67_PP111_BL_T0.CEL obese baseT1 s111
26 DELTA_G131_E03_69_PP111_FU_T0.CEL obese T1 s111
27 DELTA_G131_E05_71_PP112_BL_T0.CEL obese baseT2 s112
28 DELTA_G131_E07_73_PP112_FU_T0.CEL obese T2 s112
29 DELTA_G131_E09_75_PP113_BL_T0.CEL obese baseT1 s113
30 DELTA_G131_E11_77_PP113_FU_T0.CEL obese T1 s113
31 DELTA_G131_F01_82_PP115_BL_T0.CEL obese baseT1 s115
32 DELTA_G131_F03_84_PP115_FU_T0.CEL obese T1 s115
33 DELTA_G131_F09_90_PP117_BL_T0.CEL obese baseT2 s117
34 DELTA_G131_F11_92_PP117_FU_T0.CEL obese T2 s117
35 DELTA_G131_G01_94_PP118_BL_T0.CEL obese baseT1 s118
36 DELTA_G131_G03_96_PP118_FU_T0.CEL obese T1 s118
37 DELTA_G131_G05_104_PP121_BL_T0.CEL obese baseT1 s121
38 DELTA_G131_G07_106_PP121_FU_T0.CEL obese T1 s121
39 DELTA_G131_G09_108_PP122_BL_T0.CEL obese baseT2 s122
40 DELTA_G131_G11_110_PP122_FU_T0.CEL obese T2 s122
41 DELTA_G131_H01_112_PP123_BL_T0.CEL obese baseT2 s123
42 DELTA_G131_H03_114_PP123_FU_T0.CEL obese T2 s123
43 DELTA_G131_H05_116_PP124_BL_T0.CEL obese baseT1 s124
44 DELTA_G131_H07_118_PP124_FU_T0.CEL obese T1 s124
45 DELTA_G131_H09_128_PP127_BL_T0.CEL obese baseT2 s127
46 DELTA_G131_H11_130_PP127_FU_T0.CEL obese T2 s127
47 DELTA_G132_A05_13_PP008.T0.CEL lean base s8
48 DELTA_G132_A09_19_PP011.T0.CEL lean base s11
49 DELTA_G132_B07_21_PP012.T0.CEL lean base s12
50 DELTA_G132_C05_25_PP014.T0.CEL lean base s14
51 DELTA_G132_C09_37_PP103.BL.T0.CEL obese baseT2 s103
52 DELTA_G132_D07_39_PP103.FU.T0_2.CEL obese T2 s103
53 DELTA_G132_E05_57_PP108.BL.T0.CEL obese baseT1 s108
54 DELTA_G132_E09_59_PP108.FU.T0.CEL obese T1 s108
55 DELTA_G132_F07_120_PP125.BL.T0.CEL obese baseT1 s125
56 DELTA_G132_G05_122_PP125.FU.T0.CEL obese T1 s125
57 DELTA_G132_G09_124_PP126.BL.T0.CEL obese baseT1 s126
58 DELTA_G132_H07_126_PP126.FU.T0.CEL obese T1 s126
59 DELTA_G133_A05_61_PP109.BL.T0.CEL obese baseT1 s109
60 DELTA_G133_A09_79_PP114.BL.T0.CEL obese T2 s114
61 DELTA_G133_B09_98_PP119.BL.T0.CEL obese baseT2 s119
62 DELTA_G133_C07_100_PP119.FU.T0.CEL obese T2 s119
63 DELTA_G133_D05_102_PP120.BL.T0.CEL obese baseT2 s120
64 DELTA_G133_D09_132_PP128.BL.T0.CEL obese baseT2 s128
65 DELTA_G133_E07_134_PP128.FU.T0.CEL obese T2 s128
66 DELTA_G133_F05_136_PP129.BL.T0.CEL obese baseT1 s129
67 DELTA_G133_F09_138_PP129.FU.T0.CEL obese T1 s129
68 DELTA_G133_G07_86_PP116.BL.T0.CEL obese baseT2 s116
69 DELTA_G133_H05_88_PP116.FU.T0.CEL obese T2 s116
> # in yellow the 3 incomplete pairs
> Status <- factor(targets$Status) > Time <- factor(targets$Time) > Subject <- factor(targets$SubjectID) > TS <- as.factor(paste(Status, Time, sep=".")) > TS [1] lean.base lean.base lean.base lean.base lean.base lean.base [7] lean.base lean.base lean.base lean.base obese.baseT2 obese.T2 [13] obese.baseT1 obese.T1 obese.baseT1 obese.T1 obese.baseT2 obese.T2 [19] obese.baseT1 obese.T1 obese.baseT2 obese.T2 obese.baseT2 obese.T2 [25] obese.baseT1 obese.T1 obese.baseT2 obese.T2 obese.baseT1 obese.T1 [31] obese.baseT1 obese.T1 obese.baseT2 obese.T2 obese.baseT1 obese.T1 [37] obese.baseT1 obese.T1 obese.baseT2 obese.T2 obese.baseT2 obese.T2 [43] obese.baseT1 obese.T1 obese.baseT2 obese.T2 lean.base lean.base [49] lean.base lean.base obese.baseT2 obese.T2 obese.baseT1 obese.T1 [55] obese.baseT1 obese.T1 obese.baseT1 obese.T1 obese.baseT1 obese.T2 [61] obese.baseT2 obese.T2 obese.baseT2 obese.baseT2 obese.T2 obese.baseT1 [67] obese.T1 obese.baseT2 obese.T2 Levels: lean.base obese.baseT1 obese.baseT2 obese.T1 obese.T2 >
Option A; include Subject in model, but design is not full rank...
> design <- model.matrix((~0+TS+Subject)) > is.fullrank(design) [1] FALSE > nonEstimable(design) [1] "Subjects128" "Subjects129" >
Option B; use duplicateCorrelation()
> design <- model.matrix(~0+TS) #thus not including the subjects > is.fullrank(design) [1] TRUE > nonEstimable(design) NULL > corfit <- duplicateCorrelation(x.norm,design,block=Subject)
Define contrasts
> cont.matrix <- makeContrasts(
+ ObeseBLvsLeanBL = (TSobese.baseT1 + TSobese.baseT2 )/2 - TSlean.base, #combine all obese baseline samples
+ EffectT1 = (TSobese.T1 - TSobese.baseT1),
+ EffectT2 = (TSobese.T2 - TSobese.baseT2),
+ deltaDelta = ( (TSobese.T2 - TSobese.baseT2) - (TSobese.T1 - TSobese.baseT1)),
+ levels=design)
>
> cont.matrix
Contrasts
Levels ObeseBLvsLeanBL EffectT1 EffectT2 deltaDelta
TSlean.base -1.0 0 0 0
TSobese.baseT1 0.5 -1 0 1
TSobese.baseT2 0.5 0 -1 -1
TSobese.T1 0.0 1 0 -1
TSobese.T2 0.0 0 1 1
>
Option A:
> fit <- lmFit(x.norm,design)
Coefficients not estimable: Subjects128 Subjects129
Warning message:
Partial NA coefficients for 19654 probe(s)
> fit2 <- contrasts.fit(fit, cont.matrix)
> fit2 <- eBayes(fit2, trend=TRUE)
Loading required package: splines
> topTable(fit2)
Error in topTableF(fit, number = number, genelist = genelist, adjust.method = adjust.method, :
F-statistics not found in fit
> topTable(fit2,coef=1)
logFC AveExpr t P.Value adj.P.Val B
388595_at -1.2992596 -0.045251715 -5.402161 9.033393e-06 0.1775423 2.24502824
93463_at 1.0838054 -0.033891183 4.762181 5.223624e-05 0.3696189 1.05278388
10589_at -0.7425038 0.002192174 -4.711510 5.950177e-05 0.3696189 0.96356396
5075_at -0.7834098 0.017699312 -4.424018 1.317521e-04 0.3696189 0.41041130
8394_at -0.5483462 -0.007113591 -4.396152 1.398632e-04 0.3696189 0.36951056
5111_at -0.6544413 0.044793358 -4.389222 1.425396e-04 0.3696189 0.35623703
150280_at -0.8668590 0.016500401 -4.368414 1.533131e-04 0.3696189 0.30437320
3850_at -0.7982704 0.000636201 -4.313728 1.779197e-04 0.3696189 0.20002697
29100_at 0.5633564 -0.028783088 4.260558 2.025360e-04 0.3696189 0.10966118
4286_at -0.7714238 0.035521729 -4.250192 2.114479e-04 0.3696189 0.07875228
>
Option B (note that the statistical signifcance for coef=1 is much 'better' than above):
> design <- model.matrix(~0+TS)
> fit <- lmFit(x.norm,design,block=Subject,correlation=corfit$consensus)
> fit2 <- contrasts.fit(fit, cont.matrix)
> fit2 <- eBayes(fit2, trend=TRUE)
> topTable(fit2)
ObeseBLvsLeanBL EffectT1 EffectT2 deltaDelta AveExpr
5768_at 0.27094472 0.102441091 -0.03123944 -0.1336805291 0.0131168620
100129455_at -0.26193499 0.368394971 0.10007335 -0.2683216244 -0.0002613564
11316_at 0.16576403 0.131316402 0.13082179 -0.0004946125 0.0003524714
2357_at 0.35247065 -0.003587640 -0.03853153 -0.0349438859 -0.0179288959
57533_at 0.12907861 0.041385510 -0.17499073 -0.2163762354 -0.0194017923
9645_at 0.28792718 0.004117408 -0.10544675 -0.1095641611 0.0226279000
55022_at 0.40774391 0.046476910 -0.09294465 -0.1394215643 0.0688917460
81553_at 0.25856635 0.010798014 -0.07423535 -0.0850333635 -0.0243488464
91544_at 0.17854136 0.078838131 0.01881281 -0.0600253259 0.0441462794
22891_at -0.06227555 -0.029604714 -0.20926173 -0.1796570190 -0.0234380020
F P.Value adj.P.Val
5768_at 12.948352 8.272455e-07 0.01625868
100129455_at 9.932042 1.574822e-05 0.08406391
11316_at 9.915626 1.601338e-05 0.08406391
2357_at 9.850647 1.710876e-05 0.08406391
57533_at 9.177903 3.417815e-05 0.10274447
9645_at 9.125528 3.608937e-05 0.10274447
55022_at 8.835055 4.887136e-05 0.10274447
81553_at 8.640098 5.998237e-05 0.10274447
91544_at 8.619696 6.128604e-05 0.10274447
22891_at 8.541327 6.657034e-05 0.10274447
> topTable(fit2,coef=1)
logFC AveExpr t P.Value adj.P.Val B
5768_at 0.2709447 0.01311686 5.176671 2.102617e-06 0.02330378 4.581468
2357_at 0.3524707 -0.01792890 5.144169 2.383376e-06 0.02330378 4.471316
9645_at 0.2879272 0.02262790 5.017521 3.872730e-06 0.02330378 4.044732
81553_at 0.2585663 -0.02434885 4.845884 7.418486e-06 0.02330378 3.473728
1848_at 0.4078608 0.03655012 4.831124 7.841526e-06 0.02330378 3.425026
54472_at 0.1947691 0.06115485 4.796373 8.932633e-06 0.02330378 3.310631
1462_at 0.2264794 -0.01457601 4.785774 9.293975e-06 0.02330378 3.275814
55022_at 0.4077439 0.06889175 4.780315 9.485613e-06 0.02330378 3.257895
4688_at 0.2567551 0.01126808 4.714461 1.212391e-05 0.02647592 3.042476
51099_at 0.3188907 -0.05336680 4.553293 2.196215e-05 0.04159022 2.521259
>

Thanks Aaron for your insights, much appreciated.
To be clear, do you suggest to split the analysis in 2 parts; i.e. one analysis for aim 1 using duplicateCorrelation, and a second analysis blocking by SubjectID for aim 2-4? And in both cases use the complete data set (all arrays)?
That's right. Of course, even if you do use the complete data set, the unpaired obese samples won't be useful for the
SubjectID-blocked analysis and will be automatically ignored.