Using Effect Coding or Orthogonal Coding instead of dummy coding in DESeq2?
Ndimensional ▴ 20
@vlaufer-14169

I have a balanced study design including two variables, genotype and location. I have a research question of the following form:

"Does genotype status (knockout vs. wildtype) affect gene expression in intestinal surface cells differently than it affects gene expression in intestinal crypt cells?"

For this, my proposed model is:

Count ~ Batch + Genotype + Location + Genotype:Location 

and I am most interested in the genotype by location interaction effect. When this is coded into the model matrix, if we use dummy coding we will obtain something like the following:

model_matrix <- model.matrix(design_statement, as.data.frame(colData(dds_d11_GxL)))

model_matrix

name (Intercept) batch:R2 num_male genotype:ko location:crypt genotypeko:locationcrypt
crypt_d11_wt_1 1 0 2 0 1 0
surfc_d11_wt_1 1 0 2 0 0 0
crypt_d11_ko_1 1 0 3 1 1 1
surfc_d11_ko_1 1 0 3 1 0 0
crypt_d11_wt_2 1 1 0 0 1 0
surfc_d11_wt_2 1 1 0 0 0 0
crypt_d11_ko_2 1 1 0 1 1 1
surfc_d11_ko_2 1 1 0 1 0 0

However, dummy coding of this kind produces a dependence (a nonzero correlation) between the main-effect columns and the interaction column. This can be circumvented by using effect coding or orthogonal coding instead.
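To make the dependence concrete, here is a minimal base-R sketch. It assumes a toy balanced design with the same three factors as in the question; the `samples` data frame and its level names are made up for illustration and are not the actual `colData`, and the `num_male` covariate is left out.

```r
# Toy balanced design: 2 locations x 2 genotypes x 2 batches (hypothetical data).
samples <- expand.grid(
  location = factor(c("surfc", "crypt"), levels = c("surfc", "crypt")),
  genotype = factor(c("wt", "ko"), levels = c("wt", "ko")),
  batch    = factor(c("R1", "R2"), levels = c("R1", "R2"))
)

# Default treatment (dummy) coding: 0/1 indicator columns.
mm_dummy <- model.matrix(~ batch + genotype + location + genotype:location,
                         data = samples)

# Correlation between the genotype main-effect column and the interaction column.
cor_dummy <- cor(mm_dummy[, "genotypeko"],
                 mm_dummy[, "genotypeko:locationcrypt"])
print(cor_dummy)
```

In this toy design the interaction column is 1 only where the genotype column is also 1, which is why the two columns are correlated even though the design is balanced.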

But when I searched through the DESeq2 vignette, manual, and literature, I could not find anyone else who had used this. So I am writing to ask whether it would be OK to respecify the model matrix in the following form:

 

name (Intercept) batch:R2 genotype:ko location:crypt genotypeko:locationcrypt
crypt_d11_wt_1 1 -1 -1 1 -1
surfc_d11_wt_1 1 -1 -1 -1 1
crypt_d11_ko_1 1 -1 1 1 1
surfc_d11_ko_1 1 -1 1 -1 -1
crypt_d11_wt_2 1 1 -1 1 -1
surfc_d11_wt_2 1 1 -1 -1 1
crypt_d11_ko_2 1 1 1 1 1
surfc_d11_ko_2 1 1 1 -1 -1

 

Here, the correlation between the main effect and interaction effect is 0, but I do not know if for some reason this coding is unacceptable for DESeq2. Thank you very much.
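One way to build such a matrix and check the zero-correlation claim is sketched below in base R. It assumes the same sample ordering as the table above and codes each two-level factor by hand as a numeric -1/+1 covariate; the `samples` and `coded` objects are hypothetical stand-ins for the real `colData`.

```r
# Hypothetical sample sheet in the same order as the table above.
samples <- expand.grid(
  location = c("crypt", "surfc"),
  genotype = c("wt", "ko"),
  batch    = c("R1", "R2"),
  stringsAsFactors = FALSE
)

# Hand-coded -1/+1 covariates (ko, crypt and R2 are +1, as in the table).
coded <- data.frame(
  batch    = ifelse(samples$batch == "R2", 1, -1),
  genotype = ifelse(samples$genotype == "ko", 1, -1),
  location = ifelse(samples$location == "crypt", 1, -1)
)

mm_effect <- model.matrix(~ batch + genotype + location + genotype:location,
                          data = coded)

# Pairwise correlations of the non-intercept columns.
cor_effect <- cor(mm_effect[, -1])
print(round(cor_effect, 3))

# Not run here: as far as I understand the DESeq2 documentation, a user-built
# matrix can be supplied via the `full` argument, e.g.
#   DESeq(dds, full = mm_effect, betaPrior = FALSE)
# Please check ?DESeq for your installed version before relying on this.
```

With a balanced design all off-diagonal correlations come out as zero, so the columns are mutually orthogonal. Note that this only holds as long as the design stays balanced.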

deseq2 multiple factor design interactions orthogonal coding effect coding • 1.3k views
@peter-langfelder-4469

There should be no problems with your coding; just be aware that each reported log2FoldChange represents the fold change per one-unit change in the corresponding variable. Since your variables are defined so that the two levels correspond to -1 and 1, i.e., the difference is 2, the reported log2 fold changes will be one half of the physical log2 fold change (e.g., if DESeq2 reports a log2 fold change of 0.5 for gene X between locations, the physical log2 fold change is 1).
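The scaling can be checked with a small noiseless example in base R. This is only a sketch: the log2 means and the dummy-coded coefficients in `beta_dummy` are invented numbers, and ordinary least squares stands in for the DESeq2 fit. For the main effects it reproduces the factor of one half described above, with the caveat that the ±1-coded main effect is averaged over the levels of the other factor. For the interaction term the sketch suggests a factor of one quarter relative to the dummy-coded interaction, since the latter is a difference of differences.

```r
# 0/1 design for 2 locations x 2 genotypes x 2 batches (toy example).
design <- expand.grid(location = c(0, 1), genotype = c(0, 1), batch = c(0, 1))
mm_dummy <- model.matrix(~ batch + genotype + location + genotype:location,
                         data = design)

# Invented dummy-coded log2 effects:
# intercept, batch, genotype, location, genotype:location
beta_dummy <- c(5, 0.3, 1, 2, 0.8)
log2_mean  <- drop(mm_dummy %*% beta_dummy)

# Recode every 0/1 covariate as -1/+1 and refit by least squares.
coded <- as.data.frame(2 * design - 1)
mm_effect <- model.matrix(~ batch + genotype + location + genotype:location,
                          data = coded)
beta_effect <- qr.solve(mm_effect, log2_mean)

print(round(beta_effect, 3))

# Average genotype effect across locations is 1 + 0.8 / 2 = 1.4; half of it is 0.7.
# The dummy-coded interaction is 0.8; a quarter of it is 0.2.
```

So, under these assumptions, doubling a ±1-coded main-effect coefficient gives the average log2 fold change for that factor, and multiplying the interaction coefficient by four recovers the dummy-coded interaction.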

The number of samples in your experiment is really small, though: you have 1 sample for each combination of batch, genotype and location, so I am not sure how meaningful your results will be.
 
