Dear Efstathios and dear Peter,
thank you very much for your answers. Please have a look below and perhaps answer again my new (beginner) questions:
AEset =ReadAffy()
rAEset=rma(AEset)
targets <- readTargets(targets.txt)
ArrayName SampleNumber Batch Treatment
Hu4-1_(HG-U133_Plus_2)_3 1 1 A
Hu4-2_(HG-U133_Plus_2)_2 2 1 B
Hu4-3_(HG-U133_Plus_2)_2 3 1 C
Hu4-4_(HG-U133_Plus_2)_2 4 1 D
Hu4-5_(HG-U133_Plus_2)_2 5 1 E
Hu4-6_(HG-U133_Plus_2)_2 6 1 F
Hu4-7_(HG-U133_Plus_2)_2 7 1 G
Hu4-8_(HG-U133_Plus_2)_2 8 1 H
Hu4-9_(HG-U133_Plus_2)_2 9 1 I
Hu4-10_(HG-U133_Plus_2)_2 10 1 J
Hu4-11_(HG-U133_Plus_2)_2 11 1 K
Hu4-12_(HG-U133_Plus_2)_3 12 1 L
Hu5-1_(HG-U133_Plus_2) 13 2 A
Hu5-2_(HG-U133_Plus_2) 14 2 B
Hu5-3_(HG-U133_Plus_2) 15 2 C
Hu5-4_(HG-U133_Plus_2) 16 2 D
Hu5-5_(HG-U133_Plus_2) 17 2 E
Hu5-6_(HG-U133_Plus_2) 18 2 F
Hu5-7_(HG-U133_Plus_2) 19 2 G
Hu5-8_(HG-U133_Plus_2) 20 2 H
Hu5-9_(HG-U133_Plus_2) 21 2 I
Hu5-10_(HG-U133_Plus_2) 22 2 J
Hu5-11_(HG-U133_Plus_2) 23 2 K
Hu5-12_(HG-U133_Plus_2) 24 2 L
Hu6-1_(HG-U133_Plus_2) 25 3 A
Hu6-2_(HG-U133_Plus_2) 26 3 B
Hu6-3_(HG-U133_Plus_2) 27 3 C
Hu6-4_(HG-U133_Plus_2) 28 3 D
Hu6-5_(HG-U133_Plus_2) 29 3 E
Hu6-6_(HG-U133_Plus_2) 30 3 F
Hu6-7_(HG-U133_Plus_2) 31 3 G
Hu6-8_(HG-U133_Plus_2) 32 3 H
Hu6-9_(HG-U133_Plus_2) 33 3 I
Hu6-10_(HG-U133_Plus_2) 34 3 J
Hu6-11_(HG-U133_Plus_2) 35 3 K
Hu6-12_(HG-U133_Plus_2) 36 3 L
Treat<-factor(targets$Treatment)
Batch<-factor(targets$Batch)
design <- model.matrix(~factor(Treat) + factor(Batch))
The design matrix doesn't look like expected (for me): I miss Treat A and Batch 1? Could you tell me why they are missing?
(Intercept) factor(Treat)B factor(Treat)C factor(Treat)D factor(Treat)E factor(Treat)F factor(Treat)G factor(Treat)H factor(Treat)I factor(Treat)J factor(Treat)K factor(Treat)L factor(Batch)2 factor(Batch)3
1 0 0 0 0 0 0 0 0 0 0 0 0 0
1 1 0 0 0 0 0 0 0 0 0 0 0 0
1 0 1 0 0 0 0 0 0 0 0 0 0 0
1 0 0 1 0 0 0 0 0 0 0 0 0 0
1 0 0 0 1 0 0 0 0 0 0 0 0 0
1 0 0 0 0 1 0 0 0 0 0 0 0 0
1 0 0 0 0 0 1 0 0 0 0 0 0 0
1 0 0 0 0 0 0 1 0 0 0 0 0 0
1 0 0 0 0 0 0 0 1 0 0 0 0 0
1 0 0 0 0 0 0 0 0 1 0 0 0 0
1 0 0 0 0 0 0 0 0 0 1 0 0 0
1 0 0 0 0 0 0 0 0 0 0 1 0 0
1 0 0 0 0 0 0 0 0 0 0 0 1 0
1 1 0 0 0 0 0 0 0 0 0 0 1 0
1 0 1 0 0 0 0 0 0 0 0 0 1 0
1 0 0 1 0 0 0 0 0 0 0 0 1 0
1 0 0 0 1 0 0 0 0 0 0 0 1 0
1 0 0 0 0 1 0 0 0 0 0 0 1 0
1 0 0 0 0 0 1 0 0 0 0 0 1 0
1 0 0 0 0 0 0 1 0 0 0 0 1 0
1 0 0 0 0 0 0 0 1 0 0 0 1 0
1 0 0 0 0 0 0 0 0 1 0 0 1 0
1 0 0 0 0 0 0 0 0 0 1 0 1 0
1 0 0 0 0 0 0 0 0 0 0 1 1 0
1 0 0 0 0 0 0 0 0 0 0 0 0 1
1 1 0 0 0 0 0 0 0 0 0 0 0 1
1 0 1 0 0 0 0 0 0 0 0 0 0 1
1 0 0 1 0 0 0 0 0 0 0 0 0 1
1 0 0 0 1 0 0 0 0 0 0 0 0 1
1 0 0 0 0 1 0 0 0 0 0 0 0 1
1 0 0 0 0 0 1 0 0 0 0 0 0 1
1 0 0 0 0 0 0 1 0 0 0 0 0 1
1 0 0 0 0 0 0 0 1 0 0 0 0 1
1 0 0 0 0 0 0 0 0 1 0 0 0 1
1 0 0 0 0 0 0 0 0 0 1 0 0 1
1 0 0 0 0 0 0 0 0 0 0 1 0 1
fit<- lmFit(rAEset, design)
fit<-eBayes(fit)
topTable(fit)
X.Intercept. factor.Treat.B factor.Treat.C
213477_x_at 14.32670 0.0066422277 -0.007600883
201429_s_at 14.20578 0.0349340611 -0.002385242
208834_x_at 14.10887 -0.0001973202 -0.041636218
213583_x_at 14.36973 0.0514584539 0.013826467
216231_s_at 14.24558 0.0390321807 -0.044076872
226131_s_at 14.11218 0.0561841384 -0.020003040
213084_x_at 14.10507 0.0041735892 -0.048600165
213614_x_at 14.14842 -0.0153255164 -0.024215554
AFFX-hum_alu_at 14.66200 -0.0074594425 0.077741362
212869_x_at 14.42581 -0.0160917656 -0.064885389
factor.Treat.D factor.Treat.E factor.Treat.F
213477_x_at -0.001377460 -0.050702074 0.051865528
201429_s_at 0.025709220 -0.010092836 -0.013081372
208834_x_at 0.007341653 0.045773605 0.013431660
213583_x_at 0.016923252 -0.031997855 0.057666384
216231_s_at -0.050963361 -0.011837809 0.035742225
226131_s_at 0.029673356 -0.042260419 -0.001585295
213084_x_at 0.013819580 0.009045218 -0.002321539
213614_x_at -0.015048376 -0.039968926 0.019965938
AFFX-hum_alu_at 0.053618376 0.002911006 0.077713513
212869_x_at -0.033341353 -0.023474926 -0.002161103
factor.Treat.G factor.Treat.H factor.Treat.I
213477_x_at 0.0257321453 0.0768203223 0.03021889
201429_s_at -0.0006291994 0.0491792622 -0.04604729
208834_x_at 0.0351613242 -0.0176276116 -0.02937423
213583_x_at 0.0680759699 0.0962858847 0.08730183
216231_s_at 0.0058945653 -0.0597448531 0.01669841
226131_s_at 0.0481740774 0.0466630885 -0.01247180
213084_x_at 0.0311798064 -0.0006383324 -0.05125879
213614_x_at 0.0371051870 0.0621918189 0.02366662
AFFX-hum_alu_at 0.0011396547 0.0871858929 0.09836153
212869_x_at 0.0091505640 0.0125854227 -0.01723962
factor.Treat.J factor.Treat.K factor.Treat.L
213477_x_at 0.051355404 -0.028656052 0.018755885
201429_s_at -0.029621140 -0.019563313 -0.005783606
208834_x_at -0.059944204 0.003526064 0.023858054
213583_x_at 0.082476431 -0.009219475 0.053840761
216231_s_at 0.042499819 -0.003135958 0.020467677
226131_s_at -0.019817628 -0.031505195 0.037207605
213084_x_at -0.087220127 -0.014959050 0.010348036
213614_x_at 0.032127800 -0.032768684 0.026709697
AFFX-hum_alu_at 0.094535560 0.106192684 0.088915943
212869_x_at -0.009439653 -0.047039333 0.011175808
factor.Batch.2 factor.Batch.3 AveExpr F
213477_x_at -0.018194830 0.03891480 14.34803 99140.40
201429_s_at 0.046022000 0.04233994 14.23379 91804.48
208834_x_at 0.050637472 0.06257520 14.14497 91262.65
213583_x_at -0.006076321 0.04993156 14.42491 89788.40
216231_s_at 0.027905592 0.19533235 14.31921 86928.12
226131_s_at 0.049201400 0.07235871 14.16023 86768.34
213084_x_at 0.057847385 0.08806016 14.14233 84925.36
213614_x_at -0.005764947 0.09213428 14.18341 83335.92
AFFX-hum_alu_at -0.058736541 -0.13878791 14.65290 83109.05
212869_x_at 0.022519146 0.02076862 14.42518 82826.61
P.Value adj.P.Val
213477_x_at 3.959310e-57 1.933176e-52
201429_s_at 1.063226e-56 1.933176e-52
208834_x_at 1.147253e-56 1.933176e-52
213583_x_at 1.414304e-56 1.933176e-52
216231_s_at 2.143908e-56 1.976043e-52
226131_s_at 2.195194e-56 1.976043e-52
213084_x_at 2.892552e-56 1.976043e-52
213614_x_at 3.687342e-56 1.976043e-52
AFFX-hum_alu_at 3.818794e-56 1.976043e-52
212869_x_at 3.989544e-56 1.976043e-52
As I understood from the limma user guide, the treatments are now adjusted for the differences between the batches?
So further on I can make comparisons e.g. D vs. C?
Thanks in advance!
Jeannette
Just a quick comment - ComBat should NOT be used before running association testing (lmFit); association testing should be run with batch as a covariate on original data. The design Efstathios wrote is almost correct, except that you should add factor():
design <- model.matrix(~factor(Treatment) + factor(Batch))
ComBat is useful if the data is to be analyzed using methods that cannot deal with covariates but if covariates can be included such as in linear models, they should be used.
Dear Peter,
thank you for your evaluation-regarding the factor indication, i mention it with the # symbol next to the model.matrix-i didnt include it in case any of the variables needs releveling for any specific purpose