We have a set of RNA-seq samples that differ in batch, age, gender and cell type. We want to get normalized read counts using the variance stabilizing transformation (VST) in DESeq2. Ideally we want to keep the differences between cell types but remove the effects of the other covariates (e.g. batch, age, gender).
Two questions regarding this:
1. Should we use blind=TRUE for transformation?
2. If not, how do we design the formula?
I’ve read your vignettes and some relevant posts, such as
For question #1, I am fairly sure that I should use blind=FALSE (in order to keep the differences between cell types).
But then, which formula should I use for the design?
~ batch + age + sex + cellType
What difference does the choice of design make to the VST result?
If I use the full formula "~ batch + age + sex + cellType", will the VST internally remove the effects of batch + age + sex, but keep the differences between cell types?
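To make the question concrete, here is a minimal sketch of the two calls I am comparing. It runs on simulated counts, and the sample table (sample names, batches, ages, sexes, cell types) is made up purely for illustration; it is not our real data. It only shows how the calls are set up and does not by itself tell me what the transformation does with the covariates.

```r
library(DESeq2)

set.seed(1)

# Hypothetical sample table: 16 samples, covariates not confounded with each other
n_samples <- 16
col_data <- data.frame(
  batch    = factor(rep(c("b1", "b2"), times = 8)),
  age      = c(34, 51, 28, 45, 60, 39, 47, 55, 31, 58, 42, 36, 49, 27, 53, 44),
  sex      = factor(rep(c("F", "F", "M", "M"), times = 4)),
  cellType = factor(rep(c("A", "B"), each = 8)),
  row.names = paste0("sample", seq_len(n_samples))
)

# Simulated negative binomial counts (illustrative only, no real signal)
n_genes <- 1000
mu   <- 2^runif(n_genes, 4, 10)   # per-gene mean
disp <- 4 / mu + 0.1              # per-gene dispersion
count_mat <- matrix(
  rnbinom(n_genes * n_samples, mu = rep(mu, n_samples), size = rep(1 / disp, n_samples)),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)), rownames(col_data))
)

dds <- DESeqDataSetFromMatrix(
  countData = count_mat,
  colData   = col_data,
  design    = ~ batch + age + sex + cellType
)

# Estimate size factors and dispersions under the full design first,
# so that blind = FALSE can reuse the fitted dispersion trend
dds <- estimateSizeFactors(dds)
dds <- estimateDispersions(dds)

# Option 1: blind = TRUE, dispersions re-estimated ignoring the design
vsd_blind <- varianceStabilizingTransformation(dds, blind = TRUE)

# Option 2: blind = FALSE, uses the dispersions estimated with the design above
vsd_design <- varianceStabilizingTransformation(dds, blind = FALSE)

# Both results are genes x samples matrices of transformed values
vst_blind_mat  <- assay(vsd_blind)
vst_design_mat <- assay(vsd_design)
```

My question is essentially how vst_blind_mat and vst_design_mat should be expected to differ, and whether either of them has the batch, age and sex effects taken out.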
R version 3.0.2 (2013-09-25)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
 LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
 LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
 LC_PAPER=en_US.UTF-8 LC_NAME=C
 LC_ADDRESS=C LC_TELEPHONE=C
 LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
 parallel stats graphics grDevices utils datasets methods
other attached packages:
 DESeq2_1.2.10 RcppArmadillo_0.4.450.1.0
 Rcpp_0.11.1 GenomicRanges_1.14.4
 XVector_0.2.0 IRanges_1.20.7
loaded via a namespace (and not attached):
 annotate_1.40.1 AnnotationDbi_1.24.0 Biobase_2.22.0
 DBI_0.3.1 genefilter_1.44.0 grid_3.0.2
 lattice_0.20-23 locfit_1.5-9.1 RColorBrewer_1.0-5
 RSQLite_0.11.4 splines_3.0.2 stats4_3.0.2
 survival_2.37-7 XML_3.98-1.1 xtable_1.7-4