I have paired subjects from two different time points week1 and week2. In week1, half of the subjects are treated with condition 1 and half with condition 2. In week2, it is the other way around depending on the subjects' week1 treatment. A batch effect can be seen from the data between week1 and week2. I would like to use DESeq2 to perform DE analysis accounting for both batch and subject effect. Will a design = ~ batch + subject + condition makes sense ? If not, what will be the recommendation ? Thanks in advance.

It is a clinical trial with crossover design (https://www.eupati.eu/clinical-development-and-trials/clinical-trial-designs/)
The two conditions are placebo treated (control) and drug treated.
Half of the subjects (randomly selected) were treated with placebo pills in visit 1 and treated with drug pills in visit 2. The other half of the subjects were treated with drug pills in visit 1 and treated with placebo in visit2.
I would like to perform paired-subject DE analysis to check drug effect. I also see batch effect between the two visits so the design shall take batch into consideration.
Ok I see. This could be fit with a model such as:
Y = intercept + period + treatment + subject + error
And typically the subject effect is modeled with a random effect, leading to a mixed effects model. One option would be to use DESeq2 to scale and variance stabilize the counts with
vst(dds)and then to analyze each gene with a mixed effects model in R, and perform multiple test correction on the per-gene p-values.Another option would be to use limma and try to encode the within-subject correlations with
duplicateCorrelation, but you may want to post again to get recommendations for how to encode the within-subject correlations in limma.Thank you very much, Michael. As I understand from the vst function manual, vst(gene counts) produces a matrix one can directly compare across libraries, right? so no further library size normalisation needed ? The reason why I am asking is that, I tried to reverse the logarithm in the vst output by calculating 2^(vst output matrix), I see library (column) sums are quite different --- maybe I am doing the wrong thing but just want to be clear. Thanks.
Yes the VST data is library size corrected. The column sum is a notoriously bad estimator for whether there are global shifts. Use a boxplot on log scale data (VST output).
Perfect. Thank you very much.