Dear Bioconductor Community,
I have tried to apply linear modeling to the expression set below, using the variable "condition", which has 5 levels: one control and 4 biological substances. The goal is to test for changes in gene expression in a specific cell line between the control and each of the 4 substances. My code is below:
pData(normalized2)
                  condition    replicate
dataset_603.dat   Control1     1
dataset_604.dat   Control1     2
dataset_605.dat   Biological1  1
dataset_606.dat   Biological1  2
dataset_607.dat   Biological2  1
dataset_608.dat   Biological2  2
dataset_609.dat   Biological3  1
dataset_610.dat   Biological3  2
dataset_611.dat   Biological4  1
dataset_612.dat   Biological4  2

f <- factor(normalized2$condition, levels=c("Control1","Biological1","Biological2", "Biological3", "Biological4"))
design <- model.matrix(~0+f)
fit <- lmFit(normalized2, design)
contrast.matrix <- makeContrasts(fBiological1-fControl1, fBiological2-fControl1, fBiological3-fControl1, fBiological4-fControl1, levels=design)
fit2 <- contrasts.fit(fit, contrast.matrix)
fit2 <- eBayes(fit2)
Unfortunately, as the pData output shows, my dataset consists of two different batches: the control and each substance have two biological replicates, and each batch comprises 5 CEL files, marked by replicate 1 and replicate 2 respectively (i.e., the CEL files with replicate=2 were also processed together, at a different time). Thus, my main concern is whether and how I can use the batch that each CEL file belongs to (normalized2$replicate) in limma, in order to include it as a random effect. Or is my approach completely wrong, and should I perform some batch effect correction instead? If so, how should I proceed and which package should I use, given that I have never performed batch effect correction before?
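To make the question concrete, here is a minimal sketch of the two ways I understand the batch could enter the model. It is a sketch under assumptions, not a verified solution. The sample annotation is copied from the pData output; the expression matrix `expr` is simulated and only stands in for exprs(normalized2); and the names `batch`, `design.fixed`, `design.cond` and `corfit` are made up for illustration. Option 1 adds the batch as a fixed blocking factor in the design. Option 2 treats it as a random effect through duplicateCorrelation(), which also needs the statmod package.

```r
# Sample annotation copied from pData(normalized2).
condition <- rep(c("Control1", "Biological1", "Biological2",
                   "Biological3", "Biological4"), each = 2)
replicate <- rep(1:2, times = 5)

f <- factor(condition, levels = c("Control1", "Biological1", "Biological2",
                                  "Biological3", "Biological4"))
batch <- factor(replicate)

# Option 1: batch as an additive fixed effect (blocking factor).
# Columns: one per condition plus "batch2" (batch 2 relative to batch 1).
design.fixed <- model.matrix(~0 + f + batch)
print(colnames(design.fixed))

# The limma part runs only if limma and statmod are installed.
if (requireNamespace("limma", quietly = TRUE) &&
    requireNamespace("statmod", quietly = TRUE)) {
  library(limma)

  # Hypothetical stand-in for exprs(normalized2): 100 probes x 10 arrays,
  # with a probe-specific shift added to every array of batch 2.
  set.seed(1)
  expr <- matrix(rnorm(100 * 10), nrow = 100)
  expr[, batch == "2"] <- expr[, batch == "2"] + rnorm(100)

  # Same contrasts as in my code above; batch2 is adjusted for, not tested.
  fit <- lmFit(expr, design.fixed)
  contrast.matrix <- makeContrasts(fBiological1 - fControl1,
                                   fBiological2 - fControl1,
                                   fBiological3 - fControl1,
                                   fBiological4 - fControl1,
                                   levels = design.fixed)
  fit2 <- eBayes(contrasts.fit(fit, contrast.matrix))
  print(topTable(fit2, coef = 1, number = 5))

  # Option 2: batch as a random effect. The design holds only the
  # conditions; the within-batch correlation is estimated separately.
  design.cond <- model.matrix(~0 + f)
  corfit <- duplicateCorrelation(expr, design.cond, block = batch)
  fit.re <- lmFit(expr, design.cond, block = batch,
                  correlation = corfit$consensus.correlation)
}
```

With Option 1 the design has 6 columns for 10 arrays, so 4 residual degrees of freedom remain per probe. Since every condition appears exactly once in each batch, my understanding is that the fixed-effect version is the simpler choice here, but I would appreciate confirmation.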
Thank you in advance