I am working on a dataset with two time points (2hr and 5hr) and two treatments (Drug vs Mock). I was using a design based on section 3.3 of the manual, as Gordan suggested earlier. The MDS plot suggests a batch effect: each treatment group has one sample that sits somewhat apart from the others. I have been reading section 4.2 of the manual and "A guide to creating design matrices for gene expression experiments", but I am having difficulty working out how to correct for batch effects in my dataset. I added a separate Batch column to the grouping file (shown at the end of this post) and tried the following code, which ended with an error about the design matrix.
# make labels/grouping
targets <- read.csv("groups.csv", header=TRUE, row.names=1)

# grouping targets
group <- factor(paste(targets$Treatment, targets$Time, sep = "."))
cbind(targets,group=group)
Batch <- factor(paste(targets$Batch, sep = ","))

# Make DEG List
y <- DGEList(counts=rawdata,group=group)
nrow(y)

# TMM Normalization
y <- calcNormFactors(y)
y$samples

# plotsMDS
plotMDS(y, col=rep(1:12, each=3))

# checking batch effects
design <- model.matrix(~Batch+Batch:group)
logFC <- predFC(y,design,prior.count=1,dispersion=0.05)
I would really appreciate it if someone could help me correct my design and remove the batch effects, so that I can check which genes were upregulated at 2hr and 5hr in the treated samples. Thanks in advance.
                Treatment  Time  Batch
TM_2hr_1        TM         2h    1
TM_2hr_2        TM         2h    2
TM_2hr_3        TM         2h    3
TM_MOCK_2hr_1   Mock       2h    4
TM_MOCK_2hr_2   Mock       2h    5
TM_MOCK_2hr_3   Mock       2h    6
TM_5hr_1        TM         5h    7
TM_5hr_2        TM         5h    8
TM_5hr_3        TM         5h    9
TM_MOCK_5hr_1   Mock       5h    10
TM_MOCK_5hr_2   Mock       5h    11
TM_MOCK_5hr_3   Mock       5h    12
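
For reference, the design matrix problem can probably be reproduced with base R alone, without edgeR or the count data. The sketch below rebuilds the grouping table above by hand and checks the rank of the design from my code. The second half is only an assumption on my part: it supposes that the real batch is the replicate number (the trailing _1, _2, _3 in the sample names), which I have not confirmed, and the `Rep` variable is a hypothetical name introduced just for this illustration.

```r
# Rebuild the grouping table from the post by hand (no file needed).
targets <- data.frame(
  Treatment = rep(c("TM", "Mock", "TM", "Mock"), each = 3),
  Time      = rep(c("2h", "5h"), each = 6),
  Batch     = 1:12,
  row.names = c(paste0("TM_2hr_", 1:3), paste0("TM_MOCK_2hr_", 1:3),
                paste0("TM_5hr_", 1:3), paste0("TM_MOCK_5hr_", 1:3))
)

group <- factor(paste(targets$Treatment, targets$Time, sep = "."))
Batch <- factor(targets$Batch)

# Every batch label occurs exactly once, so Batch is confounded with sample.
one_sample_per_batch <- all(table(Batch) == 1)

# Design from the post: batch main effect plus batch-by-group interaction.
design <- model.matrix(~ Batch + Batch:group)

# Coefficients are estimable only if the rank equals the number of columns.
n_samples   <- nrow(design)
n_coefs     <- ncol(design)
design_rank <- qr(design)$rank

# ASSUMPTION (hypothetical): the batch is the replicate number taken from
# the end of each sample name, giving three batches shared by all groups.
Rep <- factor(sub(".*_", "", rownames(targets)))

# Additive design: batch is adjusted for, group effects remain estimable.
design_additive <- model.matrix(~ Rep + group)
additive_rank   <- qr(design_additive)$rank

print(c(samples = n_samples, coefs = n_coefs, rank = design_rank))
print(c(coefs = ncol(design_additive), rank = additive_rank))
```

If the first design has more columns than its rank, that would explain the error, since with one sample per batch there is nothing left to separate batch from group. Whether the additive form is appropriate depends on whether the replicates really were processed together, which I would need to check.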