Question

Using Effect Coding or Orthogonal Coding instead of dummy coding in DESeq2?


Ndimensional ▴ 20

@vlaufer-14169

United States

I have a balanced study design including two variables, genotype and location. I have a research question of the following form:

"Does genotype status (knockout vs. wildtype) affect gene expression in intestinal surface cells differently than it affects gene expression in intestinal crypt cells?"

For this, my proposed model is:

Count ~ Batch + Genotype + Location + Genotype:Location

and I am most interested in the genotype-by-location interaction effect. With dummy coding, the resulting model matrix looks something like the following:

model_matrix <- model.matrix(design_statement, as.data.frame(colData(dds_d11_GxL)))

model_matrix

`name`	`(Intercept)`	`batch:R2`	`num_male`	`genotype:ko`	`location:crypt`	`genotypeko:locationcrypt`
`crypt_d11_wt_1`	`1`	`0`	`2`	`0`	`1`	`0`
`surfc_d11_wt_1`	`1`	`0`	`2`	`0`	`0`	`0`
`crypt_d11_ko_1`	`1`	`0`	`3`	`1`	`1`	`1`
`surfc_d11_ko_1`	`1`	`0`	`3`	`1`	`0`	`0`
`crypt_d11_wt_2`	`1`	`1`	`0`	`0`	`1`	`0`
`surfc_d11_wt_2`	`1`	`1`	`0`	`0`	`0`	`0`
`crypt_d11_ko_2`	`1`	`1`	`0`	`1`	`1`	`1`
`surfc_d11_ko_2`	`1`	`1`	`0`	`1`	`0`	`0`

However, with dummy coding of this kind, the main-effect columns and the interaction column are correlated. This can be circumvented by using a scheme such as effect coding or orthogonal coding.
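To make the dependence concrete, here is a minimal base-R sketch. It assumes a hypothetical sample sheet (`coldata`) that mirrors the eight samples in the table above (two batches, two genotypes, two locations) and leaves out the `num_male` column; it is an illustration, not the poster's actual `colData`.

```r
# Hypothetical sample sheet: 2 batches x 2 genotypes x 2 locations (balanced)
coldata <- expand.grid(
  location = factor(c("surfc", "crypt"), levels = c("surfc", "crypt")),
  genotype = factor(c("wt", "ko"), levels = c("wt", "ko")),
  batch    = factor(c("R1", "R2"), levels = c("R1", "R2"))
)

# R's default treatment contrasts give the dummy (0/1) coding
mm_dummy <- model.matrix(~ batch + genotype + location + genotype:location,
                         data = coldata)

# Correlation of each main-effect column with the interaction column
cor_geno_int <- cor(mm_dummy[, "genotypeko"],
                    mm_dummy[, "genotypeko:locationcrypt"])
cor_loc_int  <- cor(mm_dummy[, "locationcrypt"],
                    mm_dummy[, "genotypeko:locationcrypt"])

print(c(genotype = cor_geno_int, location = cor_loc_int))  # both 1/sqrt(3), about 0.577
```

In this balanced layout both correlations come out at 1/sqrt(3), because the interaction column is 1 only where both main-effect columns are 1.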

However, when I searched the DESeq2 vignette, the manual, and the literature, I could not find anyone else who had used such a coding. So I am writing to ask whether it would be OK to respecify the model matrix in the following form:

`name`	`(Intercept)`	`batch:R2`	`genotype:ko`	`location:crypt`	`genotypeko:locationcrypt`
`crypt_d11_wt_1`	`1`	`-1`	`-1`	`1`	`-1`
`surfc_d11_wt_1`	`1`	`-1`	`-1`	`-1`	`1`
`crypt_d11_ko_1`	`1`	`-1`	`1`	`1`	`1`
`surfc_d11_ko_1`	`1`	`-1`	`1`	`-1`	`-1`
`crypt_d11_wt_2`	`1`	`1`	`-1`	`1`	`-1`
`surfc_d11_wt_2`	`1`	`1`	`-1`	`-1`	`1`
`crypt_d11_ko_2`	`1`	`1`	`1`	`1`	`1`
`surfc_d11_ko_2`	`1`	`1`	`1`	`-1`	`-1`

Here, the correlation between each main-effect column and the interaction column is 0, but I do not know whether this coding is for some reason unacceptable for DESeq2. Thank you very much.
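A minimal base-R sketch of how such a -1/+1 matrix could be built and checked for orthogonality follows. The sample sheet and the numeric recoding are hypothetical stand-ins for the real `colData`. The commented-out last line reflects my understanding that DESeq2 accepts a user-built model matrix through the `full` argument of `DESeq()`; treat that as an assumption to verify against the documentation for your installed version.

```r
# Hypothetical sample sheet: 2 batches x 2 genotypes x 2 locations (balanced)
coldata <- expand.grid(
  location = c("surfc", "crypt"),
  genotype = c("wt", "ko"),
  batch    = c("R1", "R2"),
  stringsAsFactors = FALSE
)

# Recode each two-level variable as -1 / +1 (effect coding)
eff <- data.frame(
  batch    = ifelse(coldata$batch == "R2", 1, -1),
  genotype = ifelse(coldata$genotype == "ko", 1, -1),
  location = ifelse(coldata$location == "crypt", 1, -1)
)

# With numeric predictors, the interaction column is the product genotype * location
mm_effect <- model.matrix(~ batch + genotype + location + genotype:location,
                          data = eff)

# X'X is diagonal when all columns are mutually orthogonal
xtx <- crossprod(mm_effect)
off_diag <- xtx[upper.tri(xtx)]
print(all(off_diag == 0))  # TRUE for this balanced design

# Assumption, not run here: something like
# dds <- DESeq(dds, full = mm_effect)
```

The orthogonality relies on the design being balanced; with unequal cell counts the off-diagonal entries would generally be nonzero even under this coding.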

deseq2 multiple factor design interactions orthogonal coding effect coding • 1.3k views

updated 6.4 years ago by Peter Langfelder ★ 3.0k • written 6.4 years ago by Ndimensional ▴ 20

score 0 · Answer 1 · 2017-11-30

There should be no problem with your coding; just be aware that the reported log2FoldChange is the log2 fold change per one-unit change in each variable. Since your variables are coded so that the two levels correspond to -1 and 1, i.e., they differ by 2 units, the reported log2 fold changes will be one half of the actual log2 fold change between levels (e.g., if DESeq2 reports a log2 fold change of 0.5 for gene X between locations, the actual log2 fold change is 1).
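The scaling can be checked on a toy example. The sketch below uses an ordinary linear model (`lm`) on made-up, noise-free log2 expression values rather than DESeq2 itself, so it only illustrates how the coefficients relate between the two codings; all numbers are hypothetical.

```r
# One hypothetical gene, one value per genotype x location cell, no noise
cells <- expand.grid(location = c("surfc", "crypt"),
                     genotype = c("wt", "ko"),
                     stringsAsFactors = FALSE)
ko    <- as.numeric(cells$genotype == "ko")     # dummy coding: 0 / 1
crypt <- as.numeric(cells$location == "crypt")

# Made-up log2 expression: cell values 5 (wt surfc), 7 (wt crypt),
# 6 (ko surfc), 11 (ko crypt)
log2expr <- 5 + 1 * ko + 2 * crypt + 3 * ko * crypt

fit_dummy <- lm(log2expr ~ ko * crypt)

# Effect coding: -1 / +1
g <- 2 * ko - 1
l <- 2 * crypt - 1
fit_effect <- lm(log2expr ~ g * l)

print(coef(fit_dummy))   # crypt = 2 (location effect in wt), ko:crypt = 3
print(coef(fit_effect))  # l = 1.75, g:l = 0.75
```

Here the effect-coded location coefficient (1.75) is half of the location difference averaged over genotypes ((2 + 5) / 2 = 3.5), in line with the factor of one half described above. The effect-coded interaction coefficient (0.75) is one quarter of the dummy-coded interaction (3), since the difference of differences spans four units of the product column's coefficient.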

The number of samples in your experiment is really small, though: you have 1 sample for each combination of batch, genotype and location... I am not sure how meaningful your results will be.