Question

Limma, blocking batch effect

2

Entering edit mode

Jeannette ▴ 20

@jeannette-8299

Last seen 10.6 years ago

I want to remove a known batch effect with limma using blocking.

My experiment has 36 samples from 3 donors with 12 different treatments. The batch effect is based on the different donors. I would like to remove batch effect and later compare different treatments (eg. treatment D vs. C).

This can be illustrated as follows:

Samples   Batch   Treatment
1   1   A
2   1   B
3   1   C
4   1   D
5   1   E
6   1   F
7   1   G
8   1   H
9   1   I
10   1   J
11   1   K
12   1   L
13   2   A
14   2   B
15   2   C
16   2   D
17   2   E
18   2   F
19   2   G
20   2   H
21   2   I
22   2   J
23   2   K
24   2   L
25   3   A
26   3   B
27   3   C
28   3   D
29   3   E
30   3   F
31   3   G
32   3   H
33   3   I
34   3   J
35   3   K
36   3   L

How to make a design matrix including the blocking factor?

limma batch effect • 6.9k views

ADD COMMENT • link updated 10.6 years ago by svlachavas ▴ 840 • written 10.6 years ago by Jeannette ▴ 20

score 0 · Answer 1 · 2015-06-30

Dear Jeannette,

you can first check another similar thread i have posted (C: Questions in limma about multifactorial design and possible batch effect correct) regarding a similar problem that i have experienced. However, i would like to ask you if you have performed first any PCA plots or use the function plotMDS() to see how big is the "known batch effect you mention". In other words, because if your samples dont separate well and gather together-that is your patterns in your principal components directly correlate with the batch variable, you should consider performing some batch effect correction using the package sva and combat to "correct" your eset, prior analysis . Nevertheless, as im not an expert on this methodology and other experienced people will answer and provide valuable information(in my case i didnt have so obvious effects based on my plots), so i followed the advice of Mr MacDonald and added the batch in my design matrix. So, you could use:

design <- model.matrix(~Treatment + Batch) # considering that both Treatment and Batch should be factor variables

and proceed with lmFit and eBayes. Also, you might want to give attention to your first level of the Treatment, in case you have a control group without any treatment

Best,

Efstathios

score 0 · Answer 2 · 2015-07-01

Dear Efstathios and dear Peter,

thank you very much for your answers. Please have a look below and perhaps answer again my new (beginner) questions:

AEset =ReadAffy()
rAEset=rma(AEset)
targets <- readTargets(targets.txt)

ArrayName   SampleNumber   Batch   Treatment
Hu4-1_(HG-U133_Plus_2)_3   1   1   A
Hu4-2_(HG-U133_Plus_2)_2   2   1   B
Hu4-3_(HG-U133_Plus_2)_2   3   1   C
Hu4-4_(HG-U133_Plus_2)_2   4   1   D
Hu4-5_(HG-U133_Plus_2)_2   5   1   E
Hu4-6_(HG-U133_Plus_2)_2   6   1   F
Hu4-7_(HG-U133_Plus_2)_2   7   1   G
Hu4-8_(HG-U133_Plus_2)_2   8   1   H
Hu4-9_(HG-U133_Plus_2)_2   9   1   I
Hu4-10_(HG-U133_Plus_2)_2   10   1   J
Hu4-11_(HG-U133_Plus_2)_2   11   1   K
Hu4-12_(HG-U133_Plus_2)_3   12   1   L
Hu5-1_(HG-U133_Plus_2)   13   2   A
Hu5-2_(HG-U133_Plus_2)   14   2   B
Hu5-3_(HG-U133_Plus_2)   15   2   C
Hu5-4_(HG-U133_Plus_2)   16   2   D
Hu5-5_(HG-U133_Plus_2)   17   2   E
Hu5-6_(HG-U133_Plus_2)   18   2   F
Hu5-7_(HG-U133_Plus_2)   19   2   G
Hu5-8_(HG-U133_Plus_2)   20   2   H
Hu5-9_(HG-U133_Plus_2)   21   2   I
Hu5-10_(HG-U133_Plus_2)   22   2   J
Hu5-11_(HG-U133_Plus_2)   23   2   K
Hu5-12_(HG-U133_Plus_2)   24   2   L
Hu6-1_(HG-U133_Plus_2)   25   3   A
Hu6-2_(HG-U133_Plus_2)   26   3   B
Hu6-3_(HG-U133_Plus_2)   27   3   C
Hu6-4_(HG-U133_Plus_2)   28   3   D
Hu6-5_(HG-U133_Plus_2)   29   3   E
Hu6-6_(HG-U133_Plus_2)   30   3   F
Hu6-7_(HG-U133_Plus_2)   31   3   G
Hu6-8_(HG-U133_Plus_2)   32   3   H
Hu6-9_(HG-U133_Plus_2)   33   3   I
Hu6-10_(HG-U133_Plus_2)   34   3   J
Hu6-11_(HG-U133_Plus_2)   35   3   K
Hu6-12_(HG-U133_Plus_2)   36   3   L

Treat<-factor(targets$Treatment)
Batch<-factor(targets$Batch)

design <- model.matrix(~factor(Treat) + factor(Batch))

The design matrix doesn't look like expected (for me): I miss Treat A and Batch 1? Could you tell me why they are missing?

(Intercept)   factor(Treat)B   factor(Treat)C   factor(Treat)D   factor(Treat)E   factor(Treat)F   factor(Treat)G   factor(Treat)H   factor(Treat)I   factor(Treat)J   factor(Treat)K   factor(Treat)L   factor(Batch)2   factor(Batch)3
1   0   0   0   0   0   0   0   0   0   0   0   0   0
1   1   0   0   0   0   0   0   0   0   0   0   0   0
1   0   1   0   0   0   0   0   0   0   0   0   0   0
1   0   0   1   0   0   0   0   0   0   0   0   0   0
1   0   0   0   1   0   0   0   0   0   0   0   0   0
1   0   0   0   0   1   0   0   0   0   0   0   0   0
1   0   0   0   0   0   1   0   0   0   0   0   0   0
1   0   0   0   0   0   0   1   0   0   0   0   0   0
1   0   0   0   0   0   0   0   1   0   0   0   0   0
1   0   0   0   0   0   0   0   0   1   0   0   0   0
1   0   0   0   0   0   0   0   0   0   1   0   0   0
1   0   0   0   0   0   0   0   0   0   0   1   0   0
1   0   0   0   0   0   0   0   0   0   0   0   1   0
1   1   0   0   0   0   0   0   0   0   0   0   1   0
1   0   1   0   0   0   0   0   0   0   0   0   1   0
1   0   0   1   0   0   0   0   0   0   0   0   1   0
1   0   0   0   1   0   0   0   0   0   0   0   1   0
1   0   0   0   0   1   0   0   0   0   0   0   1   0
1   0   0   0   0   0   1   0   0   0   0   0   1   0
1   0   0   0   0   0   0   1   0   0   0   0   1   0
1   0   0   0   0   0   0   0   1   0   0   0   1   0
1   0   0   0   0   0   0   0   0   1   0   0   1   0
1   0   0   0   0   0   0   0   0   0   1   0   1   0
1   0   0   0   0   0   0   0   0   0   0   1   1   0
1   0   0   0   0   0   0   0   0   0   0   0   0   1
1   1   0   0   0   0   0   0   0   0   0   0   0   1
1   0   1   0   0   0   0   0   0   0   0   0   0   1
1   0   0   1   0   0   0   0   0   0   0   0   0   1
1   0   0   0   1   0   0   0   0   0   0   0   0   1
1   0   0   0   0   1   0   0   0   0   0   0   0   1
1   0   0   0   0   0   1   0   0   0   0   0   0   1
1   0   0   0   0   0   0   1   0   0   0   0   0   1
1   0   0   0   0   0   0   0   1   0   0   0   0   1
1   0   0   0   0   0   0   0   0   1   0   0   0   1
1   0   0   0   0   0   0   0   0   0   1   0   0   1
1   0   0   0   0   0   0   0   0   0   0   1   0   1

fit<- lmFit(rAEset, design)
fit<-eBayes(fit)
topTable(fit)

X.Intercept. factor.Treat.B factor.Treat.C
213477_x_at         14.32670   0.0066422277   -0.007600883
201429_s_at         14.20578   0.0349340611   -0.002385242
208834_x_at         14.10887 -0.0001973202   -0.041636218
213583_x_at         14.36973   0.0514584539    0.013826467
216231_s_at         14.24558   0.0390321807   -0.044076872
226131_s_at         14.11218   0.0561841384   -0.020003040
213084_x_at         14.10507   0.0041735892   -0.048600165
213614_x_at         14.14842 -0.0153255164   -0.024215554
AFFX-hum_alu_at     14.66200 -0.0074594425    0.077741362
212869_x_at         14.42581 -0.0160917656   -0.064885389
                factor.Treat.D factor.Treat.E factor.Treat.F
213477_x_at       -0.001377460   -0.050702074    0.051865528
201429_s_at        0.025709220   -0.010092836   -0.013081372
208834_x_at        0.007341653    0.045773605    0.013431660
213583_x_at        0.016923252   -0.031997855    0.057666384
216231_s_at       -0.050963361   -0.011837809    0.035742225
226131_s_at        0.029673356   -0.042260419   -0.001585295
213084_x_at        0.013819580    0.009045218   -0.002321539
213614_x_at       -0.015048376   -0.039968926    0.019965938
AFFX-hum_alu_at    0.053618376    0.002911006    0.077713513
212869_x_at       -0.033341353   -0.023474926   -0.002161103
                factor.Treat.G factor.Treat.H factor.Treat.I
213477_x_at       0.0257321453   0.0768203223     0.03021889
201429_s_at      -0.0006291994   0.0491792622    -0.04604729
208834_x_at       0.0351613242 -0.0176276116    -0.02937423
213583_x_at       0.0680759699   0.0962858847     0.08730183
216231_s_at       0.0058945653 -0.0597448531     0.01669841
226131_s_at       0.0481740774   0.0466630885    -0.01247180
213084_x_at       0.0311798064 -0.0006383324    -0.05125879
213614_x_at       0.0371051870   0.0621918189     0.02366662
AFFX-hum_alu_at   0.0011396547   0.0871858929     0.09836153
212869_x_at       0.0091505640   0.0125854227    -0.01723962
                factor.Treat.J factor.Treat.K factor.Treat.L
213477_x_at        0.051355404   -0.028656052    0.018755885
201429_s_at       -0.029621140   -0.019563313   -0.005783606
208834_x_at       -0.059944204    0.003526064    0.023858054
213583_x_at        0.082476431   -0.009219475    0.053840761
216231_s_at        0.042499819   -0.003135958    0.020467677
226131_s_at       -0.019817628   -0.031505195    0.037207605
213084_x_at       -0.087220127   -0.014959050    0.010348036
213614_x_at        0.032127800   -0.032768684    0.026709697
AFFX-hum_alu_at    0.094535560    0.106192684    0.088915943
212869_x_at       -0.009439653   -0.047039333    0.011175808
                factor.Batch.2 factor.Batch.3 AveExpr        F
213477_x_at       -0.018194830     0.03891480 14.34803 99140.40
201429_s_at        0.046022000     0.04233994 14.23379 91804.48
208834_x_at        0.050637472     0.06257520 14.14497 91262.65
213583_x_at       -0.006076321     0.04993156 14.42491 89788.40
216231_s_at        0.027905592     0.19533235 14.31921 86928.12
226131_s_at        0.049201400     0.07235871 14.16023 86768.34
213084_x_at        0.057847385     0.08806016 14.14233 84925.36
213614_x_at       -0.005764947     0.09213428 14.18341 83335.92
AFFX-hum_alu_at   -0.058736541    -0.13878791 14.65290 83109.05
212869_x_at        0.022519146     0.02076862 14.42518 82826.61
                     P.Value    adj.P.Val
213477_x_at     3.959310e-57 1.933176e-52
201429_s_at     1.063226e-56 1.933176e-52
208834_x_at     1.147253e-56 1.933176e-52
213583_x_at     1.414304e-56 1.933176e-52
216231_s_at     2.143908e-56 1.976043e-52
226131_s_at     2.195194e-56 1.976043e-52
213084_x_at     2.892552e-56 1.976043e-52
213614_x_at     3.687342e-56 1.976043e-52
AFFX-hum_alu_at 3.818794e-56 1.976043e-52
212869_x_at     3.989544e-56 1.976043e-52

As I understood from the limma user guide, the treatments are now adjusted for the differences between the batches?

So further on I can make comparisons e.g. D vs. C?

Thanks in advance!

Jeannette

score 0 · Answer 3 · 2015-07-01

Dear Jeannette,

as i have mentioned above you maybe want to have a look or change the first level of your factor Treatment as your baseline level-if for instance you have a control group. If you dont have a control group in your design, then when you use your above code, the first level of your factor as far as i can see from the output of the model.matrix is the first level of the factor Treatment, that is Treatment A. So, the Intercept is this case i think represents the average expression of the Treatment A samples-while the other coefficients represent the log-fold change of the expression of the other Treatments vs the Treatment A, after taking into account(controlling) for the possible batch effect which is represented by the batch variable. Im not sure about the batch 1 you are mentioning, but i also believe that has been absorved in the Intercept as well. Maybe other experienced people who will see the post have a better explanation of the batch variable.Anyway, in my opinion you should not worry for the batch coefficient as have been included in your design, but more about how you will construct your differences

(* By the way, i also modified my code for a current study i been using 3 levels for my batch variable and i got the same for the batch:)

targets$replicate <- factor(rep(c(1:3), each=1, times=4))
> batch <- targets$replicate
> design <- model.matrix(~samples + batch)
> design
(Intercept) samplesT47D_OHT samplesT47D_GDC samplesT47D_C10 samplesT47D_GDC_OHT
1 1 0 0 0 0
2 1 0 0 0 0
3 1 1 0 0 0
4 1 1 0 0 0
5 1 0 1 0 0
6 1 0 1 0 0
7 1 0 0 1 0
8 1 0 0 1 0
9 1 0 0 0 1
10 1 0 0 0 1
11 1 0 0 0 0
12 1 0 0 0 0
samplesT47D_C10_OHT batch2 batch3
1 0 0 0
2 0 1 0
3 0 0 1
4 0 0 0
5 0 1 0
6 0 0 1
7 0 0 0
8 0 1 0
9 0 0 1
10 0 0 0
11 1 1 0
12 1 0 1