Question: Using Effect Coding or Orthogonal Coding instead of dummy coding in DESeq2?
18 months ago by
vlaufer0
vlaufer0 wrote:

I have a balanced study design including two variables, genotype and location. I have a research question of the following form:

"Does genotype status (knockout vs wildtype) affect gene expression in intestinal surface cells differently than it affects gene expression in intestinal crypt cells?"

For this, my proposed model is:

Count ~ Batch + Genotype + Location + Genotype:Location

and I am most interested in the genotype by location interaction effect. When this is coded into the model matrix, if we use dummy coding we will obtain something like the following:

model_matrix <- model.matrix(design_statement, as.data.frame(colData(dds_d11_GxL)))

model_matrix

name            (Intercept)  batch:R2  num_male  genotype:ko  location:crypt  genotypeko:locationcrypt
crypt_d11_wt_1            1         0         2            0               1                         0
surfc_d11_wt_1            1         0         2            0               0                         0
crypt_d11_ko_1            1         0         3            1               1                         1
surfc_d11_ko_1            1         0         3            1               0                         0
crypt_d11_wt_2            1         1         0            0               1                         0
surfc_d11_wt_2            1         1         0            0               0                         0
crypt_d11_ko_2            1         1         0            1               1                         1
surfc_d11_ko_2            1         1         0            1               0                         0

However, dummy coding of this kind produces a dependence between the main-effect and interaction-effect columns. This can be circumvented by using something like effect coding or orthogonal coding.

But when I searched through the DESeq2 vignette, manual, and literature, I could not find anyone else who had used this. So I am writing to ask whether it would be OK to respecify the model matrix in the following form:

name            (Intercept)  batch:R2  genotype:ko  location:crypt  genotypeko:locationcrypt
crypt_d11_wt_1            1        -1           -1               1                        -1
surfc_d11_wt_1            1        -1           -1              -1                         1
crypt_d11_ko_1            1        -1            1               1                         1
surfc_d11_ko_1            1        -1            1              -1                        -1
crypt_d11_wt_2            1         1           -1               1                        -1
surfc_d11_wt_2            1         1           -1              -1                         1
crypt_d11_ko_2            1         1            1               1                         1
surfc_d11_ko_2            1         1            1              -1                        -1

Here, the correlation between the main effect and interaction effect is 0, but I do not know if for some reason this coding is unacceptable for DESeq2. Thank you very much.
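
For anyone wanting to reproduce the two codings and check the correlation claim, here is a minimal base-R sketch. It assumes a toy sample table with the same 2 x 2 x 2 layout as above; `coldata`, `pm1` and the level names are illustrative and not taken from the real `dds_d11_GxL` object, and the `num_male` covariate is left out.

```r
# Toy sample table mirroring the 2 x 2 x 2 layout in the question.
# expand.grid() returns factors whose levels follow the order given here,
# so the first (reference) levels are surfc, wt and R1.
coldata <- expand.grid(
  location = c("surfc", "crypt"),
  genotype = c("wt", "ko"),
  batch    = c("R1", "R2")
)

# Hypothetical helper: code a two-level factor as -1 (first level) / +1 (second level).
pm1 <- function(f) {
  contrasts(f) <- matrix(c(-1, 1), ncol = 1,
                         dimnames = list(levels(f), levels(f)[2]))
  f
}
coldata_pm1 <- coldata
coldata_pm1[] <- lapply(coldata, pm1)

design_formula <- ~ batch + genotype * location
mm_dummy  <- model.matrix(design_formula, coldata)      # default treatment (0/1) coding
mm_effect <- model.matrix(design_formula, coldata_pm1)  # -1/+1 coding

# Correlation between the genotype main-effect column and the interaction column
cor_dummy  <- cor(mm_dummy[, "genotypeko"],  mm_dummy[, "genotypeko:locationcrypt"])
cor_effect <- cor(mm_effect[, "genotypeko"], mm_effect[, "genotypeko:locationcrypt"])
print(c(dummy = cor_dummy, effect = cor_effect))
```

In this balanced layout the two columns are clearly correlated under 0/1 coding and uncorrelated under -1/+1 coding. As far as I know, `DESeq()` has a `full` argument that accepts a user-supplied model matrix, so something like `DESeq(dds, full = mm_effect)` may be one way to pass such a matrix in, but I have not tested that here.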

Answer: Using Effect Coding or Orthogonal Coding instead of dummy coding in DESeq2?
18 months ago by
United States
Peter Langfelder2.1k wrote:

There should be no problems with your coding; just be aware that the reported log2FoldChange represents the fold change per one unit of change in each variable. Since your variables are defined so that the two levels correspond to -1 and 1, i.e., the difference is 2, the reported log2 fold changes will be one half of the physical log2 fold change (e.g., if DESeq2 reports a log2 fold change of 0.5 for gene X between locations, the physical log2 fold change is 1).

The sample number in your experiment is really small, though: you have 1 sample for each combination of batch, genotype and location, so I am not sure how meaningful your results will be.
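
The scaling can be checked on a noise-free toy example. The sketch below uses ordinary `lm()` on made-up log2 expression values for a single hypothetical gene, not DESeq2 itself; it assumes only that the same linear-predictor arithmetic carries over to the log2-scale coefficients. It also suggests two things worth keeping in mind: under -1/+1 coding the interaction coefficient comes out as one quarter (not one half) of the dummy-coded difference of differences, and the genotype main effect is averaged over both locations rather than taken at the reference location.

```r
# Four cell means for one hypothetical gene (values are made up):
# wt/surface = 5, wt/crypt = 6, ko/surface = 6, ko/crypt = 9 (log2 scale).
d <- expand.grid(location = c(-1, 1), genotype = c(-1, 1))  # -1/+1 coding
d$y <- c(5, 6, 6, 9)

# The same design in 0/1 (dummy) coding
d$g01 <- (d$genotype + 1) / 2
d$l01 <- (d$location + 1) / 2

b_effect <- coef(lm(y ~ genotype * location, data = d))
b_dummy  <- coef(lm(y ~ g01 * l01, data = d))

# Main effect: ko vs wt averaged over the two locations is (1 + 3) / 2 = 2,
# and the -1/+1 coefficient is half of that.
avg_genotype_diff <- mean(c(6 - 5, 9 - 6))

# Interaction: the difference of differences is (9 - 6) - (6 - 5) = 2,
# and the -1/+1 coefficient is a quarter of that.
diff_of_diffs <- (9 - 6) - (6 - 5)

print(rbind(
  genotype    = c(effect_coef = unname(b_effect["genotype"]),
                  physical    = avg_genotype_diff),
  interaction = c(effect_coef = unname(b_effect["genotype:location"]),
                  physical    = diff_of_diffs)
))
```

So, when converting the -1/+1 coefficients back to physical log2 fold changes, it seems the main effects need a factor of 2 and the interaction term a factor of 4.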
