edgeR: Analyzing multiple factors with batch effects
1
0
Entering edit mode
venura • 0
@venura-22066
Last seen 7 weeks ago
United States

Hi,

I am working on a dataset containing time series (2hr and 5hr) and treatment (Drug vs Mock). I was using the design based on section 3.3 of the manual as Gordan suggested earlier. When I look at the MDS plot I see that there is a batch effect. Each treatment has a somewhat distant sample. I was looking at section 4.2 of the manual and "a guide to creating design matrices for gene expression designs". I am having some difficulties figuring out how to correct batch effects in my dataset. I tried adding a separate column to the grouping file as follows; and tried to use the following code. It ended up giving an error about the design matrix.

# make labels/grouping

#grouping targets
group <- factor(paste(targets$Treatment, targets$Time, sep = "."))
cbind(targets,group=group)
Batch <- factor(paste(targets$Batch, sep = ",")) #Make DEG List y <- DGEList(counts=rawdata,group=group) nrow(y) # TMM Normalization y <- calcNormFactors(y) y$samples

#plotsMDS
plotMDS(y, col=rep(1:12, each=3))

#checking batch effects
design <- model.matrix(~Batch+Batch:group)
logFC <- predFC(y,design,prior.count=1,dispersion=0.05)


I really appreciate it if someone can help me to get my design corrected and to get the batch effects removed. So I can check genes those were upregulated at 2hr and 5hr in treated samples. Thanks in advance.

Treatment   Time    Batch
TM_2hr_1    TM  2h  1
TM_2hr_2    TM  2h  2
TM_2hr_3    TM  2h  3
TM_MOCK_2hr_1   Mock    2h  4
TM_MOCK_2hr_2   Mock    2h  5
TM_MOCK_2hr_3   Mock    2h  6
TM_5hr_1    TM  5h  7
TM_5hr_2    TM  5h  8
TM_5hr_3    TM  5h  9
TM_MOCK_5hr_1   Mock    5h  10
TM_MOCK_5hr_2   Mock    5h  11
TM_MOCK_5hr_3   Mock    5h  12

edgeR • 136 views
0
Entering edit mode
@james-w-macdonald-5106
Last seen 17 hours ago
United States

What you have there is a total of 12 batches, one per sample. That's not useful and doesn't reflect what you said (that one sample per time/treatment is in a different batch). You really need a factor with two levels, one for each batch. So as an example, if the _3 samples were all run at a different time, you could do

batch <- factor(rep(c(1,1,2), times = 4))


That's just an example and assumes that there is an actual batch effect rather than simply having one sample per group that seems off. If these samples weren't processed in different batches, then you might not want to assume a batch effect, and instead might want to assume that you have outliers that can be adjusted for by using a sample weight. Section 14.4 of the limma User's Guide covers when/why you should use array weights (same thing as a sample weight). I don't believe you can specify a sample weight using edgeR, but you could do so using voomLmFit.

Also, you don't want an interaction between batch and group. You would just do

design <- model.matrix(~ 0 + batch + group)

0
Entering edit mode

Thank you, James, for the clarification. As you pointed out it is the latter. It is caused by the biological sample. We do destructive sampling so each sample is a different plant. So there is a considerable amount of variation though we try to minimize it. I will check voomLmFit as you suggested. I thought about batch effects since it was the closest one I found for my issue.