Question

Questions in limma about multifactorial design and possible batch effect correction


svlachavas ▴ 830

@svlachavas-7225


Germany/Heidelberg/German Cancer Resear…

Dear Bioconductor Community,

I have fitted a linear model to the expression set below using the variable "condition", which has 5 levels: one control and 4 biological substances. The aim is to test for changes in gene expression in a specific cell line between the control and each of the 4 substances. My code is below:

pData(normalized2)

                           condition        replicate
dataset_603.dat      Control1             1
dataset_604.dat      Control1             2
dataset_605.dat      Biological1         1
dataset_606.dat      Biological1         2
dataset_607.dat      Biological2         1
dataset_608.dat      Biological2         2
dataset_609.dat      Biological3         1
dataset_610.dat      Biological3         2
dataset_611.dat      Biological4         1
dataset_612.dat      Biological4         2

f <- factor(normalized2$condition, levels=c("Control1","Biological1","Biological2", "Biological3", "Biological4"))

design <- model.matrix(~0+f)

fit <- lmFit(normalized2, design)

contrast.matrix <- makeContrasts(fBiological1-fControl1, fBiological2-fControl1, fBiological3-fControl1, fBiological4-fControl1, levels=design)

fit2 <- contrasts.fit(fit, contrast.matrix)

fit2 <- eBayes(fit2)

Unfortunately, as the pData object shows, my dataset consists of two batches: the control and each substance have two biological replicates, and each batch comprises 5 CEL files, marked by replicate = 1 and replicate = 2 respectively (i.e. the samples with replicate = 2 were also preprocessed together, at a different time). My main concern is whether, and how, I can use the batch that each CEL file belongs to (normalized2$replicate) to include it as a random effect in limma. Or is my approach completely wrong, and should I perform some batch effect correction instead? If so, in which direction should I go, or which package should I use? I have never performed batch effect correction before.
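The batch layout described above can be checked with a quick cross-tabulation. This is a minimal sketch in base R, assuming the sample table is exactly the one printed by pData(normalized2); the data frame n2 and the names layout and balanced are made up for illustration. When every condition appears once in each batch, batch is not confounded with condition, so a batch term can be estimated alongside it.

```r
# Rebuild the sample table from the post (values copied from pData(normalized2)).
n2 <- data.frame(
  condition = rep(c("Control1", "Biological1", "Biological2",
                    "Biological3", "Biological4"), each = 2),
  replicate = rep(1:2, times = 5),
  row.names = paste0("dataset_", 603:612, ".dat")
)

# Cross-tabulate condition (rows) against batch (columns).
layout <- table(n2$condition, n2$replicate)
print(layout)

# TRUE when every condition occurs exactly once in every batch.
balanced <- all(layout == 1)
print(balanced)
```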

Thank you in advance

limma batcheffect batcheffectcorrection multiple factor design • 2.8k views

Updated 9.0 years ago by James W. MacDonald • written 9.0 years ago by svlachavas

score 2 · Answer 1 · 2015-05-05


James W. MacDonald 65k

@james-w-macdonald-5106


United States

You just add the batch to your design as a nuisance variable.

n2 <- pData(normalized2)
n2$Condition <- relevel(factor(n2$Condition), "Control1")
design <- model.matrix(~condition + replicate, n2)
fit <- lmFit(normalized2, design)
fit2 <- eBayes(fit)

Now the coefficients 2-5 compare each Biological vs Control after controlling for the batch effect.
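To illustrate how the coefficients of such a design can be pulled out, here is a rough sketch on simulated data. The expression matrix expr, its gene and sample names, and the random values are all stand-ins for normalized2, and treating replicate as a factor is an assumption rather than part of the answer above. Only lmFit, eBayes and topTable from limma are used.

```r
library(limma)

set.seed(1)

# Hypothetical sample table mirroring the one in the question.
n2 <- data.frame(
  condition = factor(rep(c("Control1", paste0("Biological", 1:4)), each = 2),
                     levels = c("Control1", paste0("Biological", 1:4))),
  replicate = factor(rep(1:2, times = 5))
)

# Simulated log-expression values: 100 made-up genes x 10 samples.
expr <- matrix(rnorm(100 * 10, mean = 8), nrow = 100,
               dimnames = list(paste0("gene", 1:100), paste0("s", 1:10)))

# Condition plus batch as an additive nuisance term.
design <- model.matrix(~condition + replicate, n2)
print(colnames(design))

fit <- lmFit(expr, design)
fit2 <- eBayes(fit)

# Coefficient 2 is Biological1 vs Control1, adjusted for batch.
tt <- topTable(fit2, coef = "conditionBiological1", number = 5)
print(tt)
```

Coefficients 3 to 5 can be requested the same way by name, and the last column (replicate2) is the batch term itself, which is usually not of interest.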

Written 9.0 years ago by James W. MacDonald


Dear Mr MacDonald,

In the above code, did you mean

n2$condition <- relevel(factor(n2$condition), "Control1")

with a lowercase "condition"? As written, the line gives me an error:

Error in relevel.factor(factor(n2$Condition), "Control1") :
'ref' must be an existing level
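The error is consistent with R column names being case-sensitive: n2$Condition does not exist, so factor() receives NULL and has no levels to relevel. Below is a minimal base R sketch of the lowercase version, assuming the sample table from the question; converting replicate to a factor is an added assumption (with only two batches a numeric 1/2 column spans the same model, but a factor makes the intent explicit).

```r
# Sample table rebuilt from the question (values copied from the post).
n2 <- data.frame(
  condition = rep(c("Control1", "Biological1", "Biological2",
                    "Biological3", "Biological4"), each = 2),
  replicate = rep(1:2, times = 5)
)

# The capitalised column does not exist, which is what triggers the error.
print(is.null(n2$Condition))

# Lowercase column name: make Control1 the reference level.
n2$condition <- relevel(factor(n2$condition), ref = "Control1")

# Assumption: treat batch as a categorical variable.
n2$replicate <- factor(n2$replicate)

design <- model.matrix(~condition + replicate, n2)
print(levels(n2$condition))
print(colnames(design))
```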

Written 9.0 years ago by svlachavas


Dear Mr MacDonald, one more naive but important question:

Since I applied a non-specific intensity filter before testing for statistical significance, should I also set trend=TRUE in eBayes? Or is that unrelated to the possible batch effect? [I tried it, and the t-statistics and p-values change, but not by much.]
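One way to look at this is to fit both versions and compare them directly. The sketch below uses simulated data in place of normalized2 (the matrix expr and all names in it are hypothetical), so the numbers say nothing about the real dataset. In limma, the trend argument of eBayes allows the prior variance to depend on average log-expression; it is set separately from the design matrix, which is where the batch term lives.

```r
library(limma)

set.seed(42)

# Hypothetical sample table mirroring the one in the question.
n2 <- data.frame(
  condition = factor(rep(c("Control1", paste0("Biological", 1:4)), each = 2),
                     levels = c("Control1", paste0("Biological", 1:4))),
  replicate = factor(rep(1:2, times = 5))
)
design <- model.matrix(~condition + replicate, n2)

# Simulated log-expression values: 500 made-up genes x 10 samples.
expr <- matrix(rnorm(500 * 10, mean = 8), nrow = 500,
               dimnames = list(paste0("gene", 1:500), paste0("s", 1:10)))

fit <- lmFit(expr, design)

# Same linear model, two choices for the empirical Bayes prior variance.
fit_const <- eBayes(fit)                # one prior variance for all genes
fit_trend <- eBayes(fit, trend = TRUE)  # prior variance may vary with intensity

# Rank agreement of the p-values for Biological1 vs Control1 (coefficient 2).
rank_cor <- cor(fit_const$p.value[, 2], fit_trend$p.value[, 2],
                method = "spearman")
print(rank_cor)
```

On real data, plotSA(fit_trend) draws the residual standard deviations against average log-expression, which helps judge whether a trend is present at all.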

Written 9.0 years ago by svlachavas