Design Formula, PCA, and Sex Effect
When I initially performed PCA on my data, I noticed that samples segregated mostly by sex (the effect of interest is drug treatment). I therefore changed my design formula from design = ~age to design = ~ sex + drugtx. This did not change the PCA plot relative to my original design formula: samples still cluster by sex (although the significant DEGs did change). Is this to be expected? Why does the PCA plot not change when the design formula is altered in this case?

DESeq2
Hi, yes, this is expected behaviour. The size factors used for normalisation are estimated from the counts alone, so adding terms to your design formula does not change them; consequently, neither the normalised counts nor the transformed expression levels used for PCA will change. The one caveat is transforming with blind = FALSE, in which case the design is used when estimating dispersions and the transformed values can differ.
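To see this for yourself, here is a minimal sketch on simulated data. It assumes DESeq2 is installed; the drugtx and sex columns are hypothetical stand-ins for your own colData, and I call varianceStabilizingTransformation() directly because the simulated data set is small.

```r
library(DESeq2)

set.seed(1)
# Simulated counts: 2000 genes x 12 samples; the helper creates a 'condition' factor (A/B)
dds <- makeExampleDESeqDataSet(n = 2000, m = 12)
dds$drugtx <- dds$condition                      # hypothetical treatment column
dds$sex    <- factor(rep(c("F", "M"), times = 6)) # hypothetical sex column

# Same counts, two different designs
dds_a <- dds; design(dds_a) <- ~ drugtx
dds_b <- dds; design(dds_b) <- ~ sex + drugtx

# Size factors are estimated from the counts, not from the design
sf_a <- sizeFactors(estimateSizeFactors(dds_a))
sf_b <- sizeFactors(estimateSizeFactors(dds_b))
all.equal(sf_a, sf_b)

# With blind = TRUE the transformation ignores the design as well
vsd_a <- varianceStabilizingTransformation(dds_a, blind = TRUE)
vsd_b <- varianceStabilizingTransformation(dds_b, blind = TRUE)
all.equal(assay(vsd_a), assay(vsd_b))
```

Both comparisons should come back equal, which is why plotPCA() on either object would draw the same picture.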

Instead, with a formula such as ~ sex + drugtx, the test statistics that you derive for drugtx are adjusted for the effect of sex (and vice versa). The same idea is used in, e.g., epidemiology, where we may want to adjust for, say, cockroach exposure, smoking status, and BMI when deriving p-values for our variant of interest:

outcome ~ SNP + Cockroach + Smoking + BMI
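In DESeq2 terms, the adjustment happens at the testing step. A rough sketch on simulated data follows; the drugtx and sex columns and their levels (A/B, F/M) are assumptions made for illustration, not your actual sample sheet.

```r
library(DESeq2)

set.seed(1)
# Simulated counts with a two-level 'condition' factor (A/B)
dds <- makeExampleDESeqDataSet(n = 2000, m = 12)
dds$drugtx <- dds$condition                      # hypothetical treatment column
dds$sex    <- factor(rep(c("F", "M"), times = 6)) # hypothetical sex column

# Sex enters the model as a covariate; drug treatment is the effect of interest
design(dds) <- ~ sex + drugtx
dds <- DESeq(dds)

# Lists the fitted coefficients (one for sex, one for drugtx, plus the intercept)
resultsNames(dds)

# Drug effect (B vs A), estimated while accounting for sex
res_drug <- results(dds, contrast = c("drugtx", "B", "A"))
head(res_drug[order(res_drug$padj), ])
```

This is why the list of significant DEGs changes when sex is added to the design, even though the PCA plot does not.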


If you wish to directly remove the sex effect from your transformed expression levels, then use limma::removeBatchEffect().

This is mentioned in the vignette: Why after VST are there still batches in the PCA plot?
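Along the lines of that vignette section, a minimal sketch would look like the following. It assumes DESeq2 and limma are installed and again uses simulated data with hypothetical drugtx and sex columns; the corrected matrix is meant for visualisation, while the differential expression testing still uses the raw counts with sex in the design.

```r
library(DESeq2)
library(limma)

set.seed(1)
dds <- makeExampleDESeqDataSet(n = 2000, m = 12)
dds$drugtx <- dds$condition                      # hypothetical treatment column
dds$sex    <- factor(rep(c("F", "M"), times = 6)) # hypothetical sex column
design(dds) <- ~ sex + drugtx

# Transform the counts; blind = FALSE lets the design inform the dispersion estimates
vsd <- varianceStabilizingTransformation(dds, blind = FALSE)
plotPCA(vsd, intgroup = c("drugtx", "sex"))      # PCA before removing the sex effect

# Protect the treatment effect via the design matrix, then remove sex as the "batch"
mm <- model.matrix(~ drugtx, data = as.data.frame(colData(vsd)))
assay(vsd) <- limma::removeBatchEffect(assay(vsd), batch = vsd$sex, design = mm)
plotPCA(vsd, intgroup = c("drugtx", "sex"))      # PCA after removing the sex effect
```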

PS - how did your design move from ~age to ~ sex + drugtx?

Thank you, Kevin, for your explanation, as well as your advice related to using limma::removeBatchEffect() if I desire to modify my transformed expression levels for sex.

Also, sorry about the design move from ~age to ~sex + drugtx. That was a typo on my part, in which age should've been drugtx (actually separately working on aging, too).

Thanks again!