Question: Questions in limma about multifactorial design and possible batch effect correction
2.7 years ago, svlachavas (Greece/Athens/National Hellenic Research Foundation) wrote:

Dear Bioconductor Community,

I have tried to fit a linear model to the expression set below for the variable "condition", which has 5 levels: one control and 4 biological substances. The goal is to test for changes in gene expression in a specific cell line between the control and each of the 4 substances. The phenotype data and my code are below:


                  condition     replicate
dataset_603.dat   Control1      1
dataset_604.dat   Control1      2
dataset_605.dat   Biological1   1
dataset_606.dat   Biological1   2
dataset_607.dat   Biological2   1
dataset_608.dat   Biological2   2
dataset_609.dat   Biological3   1
dataset_610.dat   Biological3   2
dataset_611.dat   Biological4   1
dataset_612.dat   Biological4   2

f <- factor(normalized2$condition, levels=c("Control1","Biological1","Biological2", "Biological3", "Biological4"))

design <- model.matrix(~0+f)

fit <- lmFit(normalized2, design)

contrast.matrix <- makeContrasts(fBiological1-fControl1, fBiological2-fControl1, fBiological3-fControl1, fBiological4-fControl1, levels=design)

fit2 <- contrasts.fit(fit, contrast.matrix)

fit2 <- eBayes(fit2)

Unfortunately, as the pData object shows, my dataset consists of two different batches: the control and each substance have two biological replicates, and each batch comprises 5 CEL files (marked by replicate 1 and replicate 2 respectively; i.e. the samples with replicate=2 were also preprocessed together, at a different time). My main concern is whether and how I can use the batch information for each CEL file (normalized2$replicate) in limma, for example by including it as a random effect. Or is my approach completely wrong, and should I perform some batch effect correction instead? If so, in which direction, or which package should I use? I have never performed batch effect correction before.

Thank you in advance




2.7 years ago, James W. MacDonald (United States) wrote:

You just add the batch to your design as a nuisance variable.

n2 <- pData(normalized2)
n2$Condition <- relevel(factor(n2$Condition), "Control1")
design <- model.matrix(~condition + replicate, n2)
fit <- lmFit(normalized2, design)
fit2 <- eBayes(fit)

Now the coefficients 2-5 compare each Biological vs Control after controlling for the batch effect.
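
As a rough sketch of what that design looks like, the snippet below rebuilds the phenotype table from the question in base R and prints the column names of the design matrix. The data frame here is a stand-in for pData(normalized2), and converting replicate to a factor is an assumption of this sketch (with only two batches it gives the same fit as the numeric coding). Note the lowercase column name condition, matching the table in the question.

```r
# Stand-in for pData(normalized2), copied from the table in the question.
n2 <- data.frame(
  condition = rep(c("Control1", "Biological1", "Biological2",
                    "Biological3", "Biological4"), each = 2),
  replicate = rep(1:2, times = 5),
  row.names = paste0("dataset_", 603:612, ".dat")
)

# Make Control1 the reference level; treat the batch as a factor
# (an assumption of this sketch, not part of the original answer).
n2$condition <- relevel(factor(n2$condition), ref = "Control1")
n2$replicate <- factor(n2$replicate)

# Condition effects plus an additive batch (nuisance) term.
design <- model.matrix(~ condition + replicate, data = n2)

# Column 1 is the intercept, columns 2-5 are the four
# Biological-vs-Control1 comparisons, column 6 is the batch term.
print(colnames(design))
```

With the real data, this design would be passed to lmFit() and eBayes() as in the code above, and something like topTable(fit2, coef = 2) would then list the genes for Biological1 vs Control1.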



Dear Mr MacDonald,

In the above code, do you mean

n2$condition <- relevel(factor(n2$condition), "Control1") ? With the code as posted, I get an error:

Error in relevel.factor(factor(n2$Condition), "Control1") : 
  'ref' must be an existing level

written 2.7 years ago by svlachavas

Dear Mr MacDonald, one more naive but important question for me:

Since I applied a non-specific intensity filter before testing for statistical significance, should I also include the argument trend=TRUE in eBayes? Or is that unrelated to the possible batch effect that is present? [I tried it, and the t-statistics and p-values change, but not to a great degree.]





modified 2.7 years ago • written 2.7 years ago by svlachavas