Question

DEseq2 design matrix for paired samples with batch effects

0

Entering edit mode

cylsae • 0

@cylsae-22516

Last seen 6.0 years ago

I have paired subjects from two different time points week1 and week2. In week1, half of the subjects are treated with condition 1 and half with condition 2. In week2, it is the other way around depending on the subjects' week1 treatment. A batch effect can be seen from the data between week1 and week2. I would like to use DESeq2 to perform DE analysis accounting for both batch and subject effect. Will a design = ~ batch + subject + condition makes sense ? If not, what will be the recommendation ? Thanks in advance.

deseq2 • 1.2k views

ADD COMMENT • link updated 6.0 years ago by Michael Love 43k • written 6.0 years ago by cylsae • 0

score 1 · Answer 1 · 2019-12-05

1

Entering edit mode

Michael Love 43k

@mikelove

Last seen 2 hours ago

United States

So you have two samples per patient? Condition 1 treated and condition 2 treated but not control?

And then there is the week variable splitting the patients in two.

ADD COMMENT • link 6.0 years ago Michael Love 43k

0

Entering edit mode

It is a clinical trial with crossover design (https://www.eupati.eu/clinical-development-and-trials/clinical-trial-designs/)

The two conditions are placebo treated (control) and drug treated.

Half of the subjects (randomly selected) were treated with placebo pills in visit 1 and treated with drug pills in visit 2. The other half of the subjects were treated with drug pills in visit 1 and treated with placebo in visit2.

I would like to perform paired-subject DE analysis to check drug effect. I also see batch effect between the two visits so the design shall take batch into consideration.

ADD REPLY • link 6.0 years ago cylsae • 0

1

Entering edit mode

Ok I see. This could be fit with a model such as:

Y = intercept + period + treatment + subject + error

And typically the subject effect is modeled with a random effect, leading to a mixed effects model. One option would be to use DESeq2 to scale and variance stabilize the counts with vst(dds) and then to analyze each gene with a mixed effects model in R, and perform multiple test correction on the per-gene p-values.

Another option would be to use limma and try to encode the within-subject correlations with duplicateCorrelation, but you may want to post again to get recommendations for how to encode the within-subject correlations in limma.

ADD REPLY • link 6.0 years ago Michael Love 43k

0

Entering edit mode

Thank you very much, Michael. As I understand from the vst function manual, vst(gene counts) produces a matrix one can directly compare across libraries, right? so no further library size normalisation needed ? The reason why I am asking is that, I tried to reverse the logarithm in the vst output by calculating 2^(vst output matrix), I see library (column) sums are quite different --- maybe I am doing the wrong thing but just want to be clear. Thanks.

ADD REPLY • link 6.0 years ago cylsae • 0

0

Entering edit mode

Yes the VST data is library size corrected. The column sum is a notoriously bad estimator for whether there are global shifts. Use a boxplot on log scale data (VST output).