Hi,
I am working on a dataset with two time points (2hr and 5hr) and two treatments (Drug vs Mock). I have been using a design based on section 3.3 of the manual, as Gordon suggested earlier. In the MDS plot I see what looks like a batch effect: each treatment group has one sample that sits somewhat apart from the others. I have looked at section 4.2 of the manual and at "a guide to creating design matrices for gene expression designs", but I am having difficulty figuring out how to correct for batch effects in my dataset. I tried adding a separate Batch column to the grouping file (shown below the code) and used the following code, which ended up giving an error about the design matrix.
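# Read the sample annotation (treatment, time, batch)
targets <- read.csv("groups.csv", header=TRUE, row.names=1)
# Combine treatment and time into a single grouping factor
group <- factor(paste(targets$Treatment, targets$Time, sep = "."))
targets <- cbind(targets, group=group)
# Batch as a factor (one level per sample in this grouping file)
Batch <- factor(targets$Batch)
# Make the DGEList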
y <- DGEList(counts=rawdata,group=group)
nrow(y)
# TMM Normalization
y <- calcNormFactors(y)
y$samples
# MDS plot, one colour per group (samples are ordered in groups of 3)
plotMDS(y, col=rep(1:4, each=3))
# Checking batch effects
design <- model.matrix(~Batch+Batch:group)
logFC <- predFC(y,design,prior.count=1,dispersion=0.05)
I would really appreciate it if someone could help me correct my design and remove the batch effects, so that I can identify the genes that were upregulated at 2hr and 5hr in the treated samples. Thanks in advance.
groups.csv:
               Treatment  Time  Batch
TM_2hr_1       TM         2h    1
TM_2hr_2       TM         2h    2
TM_2hr_3       TM         2h    3
TM_MOCK_2hr_1  Mock       2h    4
TM_MOCK_2hr_2  Mock       2h    5
TM_MOCK_2hr_3  Mock       2h    6
TM_5hr_1       TM         5h    7
TM_5hr_2       TM         5h    8
TM_5hr_3       TM         5h    9
TM_MOCK_5hr_1  Mock       5h    10
TM_MOCK_5hr_2  Mock       5h    11
TM_MOCK_5hr_3  Mock       5h    12
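A likely reason for the design matrix error can be reproduced with base R alone. The sketch below is only an illustration, not a diagnosis of the exact error message: it assumes the Batch column is exactly as listed above (a distinct value for every sample) and rebuilds the annotation by hand instead of reading groups.csv. The names design_batch, design_group and the two contrast vectors are made up for this example. It compares the number of columns and the rank of the ~Batch+Batch:group design with a group-only design in the style of section 3.3.

# Rebuild the sample annotation from the table above (base R only).
targets <- data.frame(
  Treatment = rep(c("TM", "Mock", "TM", "Mock"), each = 3),
  Time      = rep(c("2h", "5h"), each = 6),
  Batch     = 1:12
)
group <- factor(paste(targets$Treatment, targets$Time, sep = "."))
Batch <- factor(targets$Batch)

# Design from the post: Batch has 12 levels for 12 samples.
design_batch <- model.matrix(~ Batch + Batch:group)
rank_batch <- qr(design_batch)$rank
cat("Batch design: columns =", ncol(design_batch),
    ", rank =", rank_batch,
    ", residual df =", nrow(design_batch) - rank_batch, "\n")

# Group-only design: one column per treatment-by-time group.
design_group <- model.matrix(~ 0 + group)
colnames(design_group) <- levels(group)
rank_group <- qr(design_group)$rank
cat("Group design: columns =", ncol(design_group),
    ", rank =", rank_group,
    ", residual df =", nrow(design_group) - rank_group, "\n")

# Hypothetical hand-written contrasts for Drug vs Mock at each time point,
# ordered to match colnames(design_group).
drug_vs_mock_2h <- c(Mock.2h = -1, Mock.5h = 0, TM.2h = 1, TM.5h = 0)
drug_vs_mock_5h <- c(Mock.2h = 0, Mock.5h = -1, TM.2h = 0, TM.5h = 1)

With one Batch level per sample, the batch design has more columns than there are samples and leaves no residual degrees of freedom, so its coefficients cannot all be estimated. The group-only design keeps three replicates per group. In edgeR the two contrasts would more usually be built with makeContrasts() than written by hand.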
Thank you, James, for the clarification. As you pointed out, it is the latter: the variation comes from the biological samples. We do destructive sampling, so each sample is a different plant, and there is a considerable amount of variation even though we try to minimize it. I will check voomLmFit as you suggested. I had framed this as a batch effect because that was the closest match I could find for my issue.