Hi I have a quick question about a paired samples and batch effect correction. Especially I am seeking a solution for edgeR or limma/voom.
A few days ago, I fount the following post.
correction for batch effects and paired-design of RNA-seq samples
In his case, subject and batch are identical, so "There is no need to do any batch correction. The baseline differences between subjects, including any batch effect, is already accounted for in the paired analysis."
group <- factor(c("wt","wt","wt","ko","ko","ko")) subject <- as.factor(c(1,2,3,1,2,3)) batch <- as.factor(c(1,2,3,1,2,3))
But how about the following example? In this case, a given subject's wt/ko samples are from different batches.
group <- factor(c("wt","wt","wt","ko","ko","ko")) subject <- as.factor(c(1,2,3,1,2,3)) batch <- as.factor(c(1,1,1,2,2,1))
What is the best model design to correct batch effect and to predict DEG between WT and KO groups? Could you suggest the model fomula?
Thanks, Aaron.
I have another question about the order of coefficients in your formula. You placed the "batch" term in the last. Isn't it testing sig. differences between batches, not between groups. I found your prev. comment here (https://support.bioconductor.org/p/96627/)?
I want to compare two groups (KO -vs- WT) of paired/matched samples after correcting batch effect. How about (~batch + subject + group)? Does this formula work?
BTW, my example was just a toy example, not a real exp design.
Question 1: If you were to use the default
coef=ncol(design)
inglmLRT
orglmQLFTest
, then yes, you would be testing for DE between batches. But this is easy to change; just setcoef
orcontrasts
appropriately to perform the comparison you actually want to do. You shouldn't rely on the defaultcoef
anyway, it only works in limited scenarios.Question 2: See my answer to question 1. Design matrix formulation choices like the order of factors in an additive formula (and whether or not to use an intercept) don't affect the final result as long as you set the contrast correctly, via
coef
orcontrasts
. This should be straightforward to do if you really understand what your design matrix means.what if the design was:
ie: one set of individuals sequenced day 1, different set of individuals sequenced day 2
would the design still be:
design <- model.matrix(~group + subject + batch)