Multi factor design and removing the effect of sex on main factor
1
0
Entering edit mode
Mahendra • 0
@2e799a6e
Last seen 6 weeks ago
United States

Hi Michael Love,

I am pretty new in this. I have "wt" and "ko" RNA-seq samples, each three replicates for differential gene expression analysis. sample description as follows:

wt group = 2 males and 1 female ko group = 3 female

I noticed that my samples are grouping based on "sex" more than "genotype". I do not have more mice to repeat this experiment but I want to make some sense out of it.

Could you please help me with how I can correct effect of sex in my analysis or how I can remove sex-based influence using DEseq analysis?

Thank you, Mahendra

DESeq2 • 275 views
0
Entering edit mode
@james-w-macdonald-5106
Last seen 2 days ago
United States

The short answer is that you can't. You have almost completely aliased sex and KO status, and there is no method that will fix that situation. You have unfortunately learned a lesson the hard way, but hopefully in the future you will keep this in mind when doing a new experiment.

0
Entering edit mode

Hi James,

Thanks for your reply. Does ANOVA solve this problem since it provides p-values for both comparisons- genotype and sex? Then I can remove genes those have significantly differentially expressed in sex comparison to just interpret differentially expressed genes in genotype comparison.

Thanks, Mahendra

0
Entering edit mode

No, ANOVA doesn't solve the problem. Your WT samples are all male, and all but one female is KO. If you fit a model like ~ genotype + sex, then you will be apportioning the variability into two buckets that are almost the same and are likely to lose any significant results. Here's an example

> set.seed(12345)
> d.f <- data.frame(sex = rep(c("male","female"), c(2,4)), gt = rep(c("wt","ko"), each = 3), vals = rnorm(6) + c(2,2,2,0,0,0))
> d.f
sex gt       vals
1   male wt  2.5855288
2   male wt  2.7094660
3 female wt  1.8906967
4 female ko -0.4534972
5 female ko  0.6058875
6 female ko -1.8179560
> summary(lm(vals~gt + sex, d.f))

Call:
lm(formula = vals ~ gt + sex, data = d.f)

Residuals:
1          2          3          4          5          6
-6.197e-02  6.197e-02  1.388e-17  1.017e-01  1.161e+00 -1.263e+00

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.5552     0.5736  -0.968    0.404
gtwt          2.4459     1.1471   2.132    0.123
sexmale       0.7568     1.2167   0.622    0.578

Residual standard error: 0.9934 on 3 degrees of freedom
Multiple R-squared:  0.8195,    Adjusted R-squared:  0.6991
F-statistic: 6.809 on 2 and 3 DF,  p-value: 0.07671

> summary(lm(vals~gt, d.f))

Call:
lm(formula = vals ~ gt, data = d.f)

Residuals:
1       2       3       4       5       6
0.1903  0.3142 -0.5045  0.1017  1.1611 -1.2628

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.5552     0.5278  -1.052   0.3522
gtwt          2.9504     0.7464   3.953   0.0168 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.9141 on 4 degrees of freedom
Multiple R-squared:  0.7962,    Adjusted R-squared:  0.7452
F-statistic: 15.63 on 1 and 4 DF,  p-value: 0.01678

> summary(lm(vals~sex, d.f))

Call:
lm(formula = vals ~ sex, data = d.f)

Residuals:
1        2        3        4        5        6
-0.06197  0.06197  1.83441 -0.50978  0.54960 -1.87424

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.05628    0.68225   0.082   0.9382
sexmale      2.59121    1.18169   2.193   0.0934 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.365 on 4 degrees of freedom
Multiple R-squared:  0.5459,    Adjusted R-squared:  0.4324
F-statistic: 4.808 on 1 and 4 DF,  p-value: 0.0934


Now let's make the females all really similar.

> d.f$vals[3] <- d.f$vals[3] - 1
> d.f
sex gt       vals
1   male wt  2.5855288
2   male wt  2.7094660
3 female wt  0.8906967
4 female ko -0.4534972
5 female ko  0.6058875
6 female ko -1.8179560
> summary(lm(vals~gt + sex, d.f))

Call:
lm(formula = vals ~ gt + sex, data = d.f)

Residuals:
1          2          3          4          5          6
-6.197e-02  6.197e-02  1.735e-17  1.017e-01  1.161e+00 -1.263e+00

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.5552     0.5736  -0.968    0.404
gtwt          1.4459     1.1471   1.260    0.297
sexmale       1.7568     1.2167   1.444    0.244

Residual standard error: 0.9934 on 3 degrees of freedom
Multiple R-squared:  0.8064,    Adjusted R-squared:  0.6773
F-statistic: 6.247 on 2 and 3 DF,  p-value: 0.08519

> summary(lm(vals~gt, d.f))

Call:
lm(formula = vals ~ gt, data = d.f)

Residuals:
1       2       3       4       5       6
0.5236  0.6476 -1.1712  0.1017  1.1611 -1.2628

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.5552     0.6467  -0.859   0.4390
gtwt          2.6171     0.9145   2.862   0.0459 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.12 on 4 degrees of freedom
Multiple R-squared:  0.6718,    Adjusted R-squared:  0.5898
F-statistic: 8.189 on 1 and 4 DF,  p-value: 0.04585

> summary(lm(vals~sex, d.f))

Call:
lm(formula = vals ~ sex, data = d.f)

Residuals:
1        2        3        4        5        6
-0.06197  0.06197  1.08441 -0.25978  0.79960 -1.62424

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.1937     0.5320  -0.364   0.7342
sexmale       2.8412     0.9215   3.083   0.0368 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.064 on 4 degrees of freedom
Multiple R-squared:  0.7039,    Adjusted R-squared:  0.6298
F-statistic: 9.507 on 1 and 4 DF,  p-value: 0.03681


Because sex and genotype are almost completely aliased, each one separately looks significant, but together they are not. And it's not possible to say if the difference is due to the genotype or to sex differences, or some combination of the two.

0
Entering edit mode

That makes sense. Thank you so much for your detailed explanation.