Question

Modelling a multiplicative batch effect with DESeq2

0

Sam ▴ 10

@sam-21502

Last seen 2 days ago

Jerusalem

As far as I understand, the way DESeq2 models a batch effect is close to subtracting the arithmetic mean of each batch's expression values from the genes, on a per-gene basis.

Is it possible in DESeq2, instead of modelling only an additive batch effect, to model a multiplicative one as well? For example, by dividing the per-gene expression levels by the batch-specific geometric mean?

This is the context for my question: I tried to visualize batch effect removal with ComBat. After the correction, the different conditions separated very nicely on a PCA plot. I then performed differential expression analysis (I do not pass the ComBat values into DESeq2, but rather model the batch effect using the formula "~ batch + condition" as the design). Despite the separation in the PCA, very few genes passed the FDR threshold (about 30). I suspect the reason is that ComBat estimates both additive and multiplicative batch effects, while DESeq2 models only additive ones. Judging by the low number of DE genes, I suspect that multiplicative batch effects exist in my data.

P.S. However, each of the compared conditions has only two samples; that might be an alternative explanation for the low number of DE genes.

deseq2 batch-effects ComBat • 1.3k views

ADD COMMENT • link updated 4.6 years ago by Michael Love 41k • written 4.6 years ago by Sam ▴ 10

0

Solarion • 0

@solarion-22030

Last seen 3.7 years ago

University Hospital Jena, Germany

I'm not an expert at all, so these are just my thoughts: for a model.matrix, if you don't know whether the effect is additive, you would write it like this: model.matrix(~ diet + sex + diet:sex) or model.matrix(~ diet*sex) (taken from http://genomicsclass.github.io/book/pages/expressingdesignformula.html)
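To make the two formulas concrete, here is a minimal base-R sketch. The diet and sex factors and their levels are made up for illustration; only model.matrix itself comes from the post. It compares the purely additive design with the two interaction spellings, which should expand to the same columns.

```r
# Made-up sample annotation: two diets crossed with two sexes (illustrative only).
diet <- factor(c("chow", "chow", "hf", "hf", "chow", "chow", "hf", "hf"))
sex  <- factor(c("F", "F", "F", "F", "M", "M", "M", "M"))

# Additive design: one coefficient per factor, no interaction.
additive <- model.matrix(~ diet + sex)

# Interaction design, written explicitly and with the * shorthand.
explicit  <- model.matrix(~ diet + sex + diet:sex)
shorthand <- model.matrix(~ diet * sex)

print(colnames(additive))
print(colnames(shorthand))

# The two interaction spellings describe the same design matrix.
print(isTRUE(all.equal(explicit, shorthand)))
```

The interaction design has one extra column, which lets the diet effect differ between the sexes.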

In DESeq2 the standard workflow doesn't have this kind of input, but edgeR does in the glmQLFit function.

ADD COMMENT • link 4.6 years ago Solarion • 0

score 2 · Accepted Answer · 2019-10-08

"The process by which DESeq2 models a batch effect is close to: Subtracting the arithmetic mean of the batches' expression values from the genes, on a per-gene basis."

The way that DESeq2 models batch effects is exactly the same as how it models differences due to condition.

See the third paragraph of the Results section of the DESeq2 paper:

https://genomebiology.biomedcentral.com/articles/10.1186/s13059-014-0550-8#Sec2

or the third line of the equation block here:

http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#the-deseq2-model

We assume that the log2 of the expected mean of the Negative Binomial is explained by a linear combination of the covariates. So there will be a beta associated with the batch differences and a beta associated with the condition differences, in a model ~batch + condition.
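Written out for the design ~batch + condition, and using the notation of the DESeq2 vignette (counts K_ij for gene i and sample j, size factor s_j, dispersion alpha_i), the model looks roughly as follows. The indicator variables x_j^batch and x_j^cond are an assumption made here for the simple case of two-level factors:

```latex
\begin{align}
K_{ij} &\sim \mathrm{NB}(\mu_{ij}, \alpha_i) \\
\mu_{ij} &= s_j \, q_{ij} \\
\log_2 q_{ij} &= \beta_{i0} + \beta_i^{\text{batch}} x_j^{\text{batch}} + \beta_i^{\text{cond}} x_j^{\text{cond}} \\
q_{ij} &= 2^{\beta_{i0}} \cdot 2^{\beta_i^{\text{batch}} x_j^{\text{batch}}} \cdot 2^{\beta_i^{\text{cond}} x_j^{\text{cond}}}
\end{align}
```

The last line is just the third line exponentiated: each term that is added on the log2 scale becomes a factor on the scale of the expected counts.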

So, given that we are modeling the log of the mean, we do have a multiplicative model for batch (or condition, or whatever covariate goes in the design).
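A small numeric sketch of this point, in base R only (no DESeq2 call is made). The coefficient values are hypothetical, and size factors and Negative Binomial noise are ignored, so this only illustrates the mean structure of a ~batch + condition design:

```r
# Hypothetical log2-scale coefficients for a single gene (made-up values).
beta <- c(intercept = 5, batchB = 1, conditiontrt = 2)

# One sample per batch/condition combination.
samples <- expand.grid(batch = factor(c("A", "B")),
                       condition = factor(c("ctrl", "trt")))
design <- model.matrix(~ batch + condition, data = samples)

# Additive on the log2 scale ...
log2_mean <- design %*% beta

# ... which is multiplicative on the scale of the expected counts.
mu <- 2^log2_mean
samples$expected <- as.vector(mu)
print(samples)

# Batch B over batch A, within each condition: the same fold change both times.
ratio_B_over_A <- samples$expected[samples$batch == "B"] /
  samples$expected[samples$batch == "A"]
print(ratio_B_over_A)
```

With a batch coefficient of 1 on the log2 scale, the expected value in batch B is twice that in batch A, in both conditions.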