Question: Questions in limma about multifactorial design and possible batch effect correction
4.0 years ago by
svlachavas650
Greece/Athens/National Hellenic Research Foundation
svlachavas650 wrote:

Dear Bioconductor Community,

I have tried to fit a linear model to the expression set below, using the variable "condition", which has 5 levels: one control and 4 biological substances. The aim is to test for changes in gene expression in a specific cell line between the control and each of the 4 substances. My phenotype data and code are below:

pData(normalized2)

                     condition      replicate
dataset_603.dat      Control1       1
dataset_604.dat      Control1       2
dataset_605.dat      Biological1    1
dataset_606.dat      Biological1    2
dataset_607.dat      Biological2    1
dataset_608.dat      Biological2    2
dataset_609.dat      Biological3    1
dataset_610.dat      Biological3    2
dataset_611.dat      Biological4    1
dataset_612.dat      Biological4    2

f <- factor(normalized2$condition, levels=c("Control1","Biological1","Biological2", "Biological3", "Biological4"))
design <- model.matrix(~0+f)
fit <- lmFit(normalized2, design)
contrast.matrix <- makeContrasts(fBiological1-fControl1, fBiological2-fControl1, fBiological3-fControl1, fBiological4-fControl1, levels=design)
fit2 <- contrasts.fit(fit, contrast.matrix)
fit2 <- eBayes(fit2)

Unfortunately, as can be seen from the pData object, my dataset consists of two different batches: the control and each substance have two biological replicates, and each batch comprises 5 CEL files, labelled replicate 1 and replicate 2 respectively (i.e. the samples with replicate=2 were also preprocessed together, at a different time). My main concern is whether, and how, I can use the batch information for each CEL file (normalized2$replicate) to include it as a random effect in limma. Or is my approach completely wrong, and should I perform some batch effect correction instead? If so, in which direction should I go and which package should I use, given that I have never performed batch effect correction?

modified 4.0 years ago by James W. MacDonald49k • written 4.0 years ago by svlachavas650
Answer: Questions in limma about multifactorial design and possible batch effect correct
4.0 years ago by
United States
James W. MacDonald49k wrote:

You just add the batch to your design as a nuisance variable.

n2 <- pData(normalized2)
n2$Condition <- relevel(factor(n2$Condition), "Control1")
design <- model.matrix(~condition + replicate, n2)
fit <- lmFit(normalized2, design)
fit2 <- eBayes(fit)

Now the coefficients 2-5 compare each Biological vs Control after controlling for the batch effect.
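
To see which columns those coefficients refer to, here is a minimal base-R sketch that rebuilds the design from the phenotype table in the question. The data frame pheno is a hypothetical stand-in for pData(normalized2), and coding replicate as a factor is an assumption (with only two batches it should give the same fit as the numeric coding).

```r
# Hypothetical stand-in for pData(normalized2), copied from the question
pheno <- data.frame(
  condition = rep(c("Control1", "Biological1", "Biological2",
                    "Biological3", "Biological4"), each = 2),
  replicate = rep(1:2, times = 5),
  row.names = paste0("dataset_", 603:612, ".dat")
)

# Make Control1 the reference level so that every condition coefficient
# is a comparison against the control
pheno$condition <- relevel(factor(pheno$condition), ref = "Control1")

# Assumption: code the batch as a factor rather than as a number
pheno$replicate <- factor(pheno$replicate)

# One intercept, four condition columns, one batch column
design <- model.matrix(~ condition + replicate, data = pheno)
print(colnames(design))
```

With a design like this, columns 2 to 5 are the four substance-versus-control comparisons and the last column is the batch term, so a call such as topTable(fit2, coef = 2) would list genes for Biological1 vs Control1.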

Dear Mr MacDonald,

in the above code, do you mean

n2$condition <- relevel(factor(n2$condition), "Control1")

because as written it gives me an error:

Error in relevel.factor(factor(n2$Condition), "Control1") :
'ref' must be an existing level
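
The error is consistent with R being case-sensitive: the column is called condition, so n2$Condition is NULL and the resulting factor has no levels to relevel. A small sketch of that behaviour, using a hypothetical two-row data frame n2_demo in place of the real pData(normalized2):

```r
# Hypothetical miniature of the phenotype table
n2_demo <- data.frame(condition = c("Control1", "Biological1"))

# Wrong case: there is no such column, so `$` returns NULL
print(is.null(n2_demo$Condition))

# Releveling the resulting empty factor fails, because "Control1"
# is not one of its levels
res <- tryCatch(relevel(factor(n2_demo$Condition), "Control1"),
                error = function(e) e)
print(inherits(res, "error"))

# Correct case: this works, and Control1 becomes the reference level
fixed <- relevel(factor(n2_demo$condition), "Control1")
print(levels(fixed))
```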

Dear Mr MacDonald, one more naive but, for me, important question:

as I performed non-specific intensity filtering before the statistical testing, should I also include the argument trend=TRUE in eBayes? Or is it unrelated to the possible batch effect that is present? [I tried it, and the t-statistics and p-values change, but not to a great degree.]
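
On the trend=TRUE question: in limma, trend=TRUE lets the prior variance used by eBayes depend on average log-expression, which is a separate choice from the batch term in the design. A minimal sketch for comparing the two settings side by side, assuming limma is installed and using simulated noise in place of the real normalized2 (the matrix expr and all numbers are hypothetical):

```r
# Simulated stand-in for the real data: 200 genes x 10 arrays of pure noise
set.seed(1)
lev <- c("Control1", "Biological1", "Biological2", "Biological3", "Biological4")
pheno <- data.frame(
  condition = factor(rep(lev, each = 2), levels = lev),
  replicate = factor(rep(1:2, times = 5))
)
design <- model.matrix(~ condition + replicate, data = pheno)
expr <- matrix(rnorm(200 * 10, mean = 8), nrow = 200)

cmp <- NULL
if (requireNamespace("limma", quietly = TRUE)) {
  fit <- limma::lmFit(expr, design)
  fit_plain <- limma::eBayes(fit)                # single prior variance
  fit_trend <- limma::eBayes(fit, trend = TRUE)  # intensity-dependent prior
  # Moderated t-statistics for coefficient 2 (Biological1 vs Control1)
  cmp <- cbind(plain = fit_plain$t[, 2], trend = fit_trend$t[, 2])
  print(cor(cmp[, "plain"], cmp[, "trend"]))
} else {
  message("limma is not installed; skipping the comparison")
}
```

On real data, plotting the two columns of cmp against each other is one way to judge whether the change is as small as described above.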