We have RNA-seq data on controls and cases, along with known covariates: race, age, sex, RIN and library batch. In DESeq2 I could correct for these covariates by including them in the design formula.
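A minimal sketch of such a design, assuming the sample table has columns named `race`, `age`, `sex`, `RIN`, `batch` and `condition` (these column names, and the objects `counts` and `coldata` in the commented call, are placeholders rather than the actual names in the data):

```r
# Known covariates first, variable of interest (condition) last.
# Column names are assumed from the covariates listed above.
design_known <- ~ race + age + sex + RIN + batch + condition

# With a raw count matrix `counts` and a sample table `coldata` (placeholders):
# dds <- DESeq2::DESeqDataSetFromMatrix(countData = counts,
#                                       colData   = coldata,
#                                       design    = design_known)

# Variables referenced by the design
all.vars(design_known)
```

Putting `condition` last matches the DESeq2 convention that `results()` by default reports the last variable in the design.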
On the other hand, svaseq, run on the DESeq2-normalized counts, identified 13 surrogate variables for the same data.
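For reference, a sketch of how surrogate variables are commonly estimated from DESeq2-normalized counts. The helper name `estimate_svs`, the `condition` column name, and the low-count filter threshold are assumptions for illustration, not necessarily what was run here:

```r
library(DESeq2)
library(sva)

# Hypothetical helper: estimate surrogate variables from a DESeqDataSet `dds`
# whose colData contains a `condition` column (assumed name).
estimate_svs <- function(dds, n_sv = NULL) {
  dds  <- estimateSizeFactors(dds)
  dat  <- counts(dds, normalized = TRUE)
  dat  <- dat[rowMeans(dat) > 1, ]                  # drop very low-count genes
  mod  <- model.matrix(~ condition, colData(dds))   # full model
  mod0 <- model.matrix(~ 1, colData(dds))           # null model
  # n.sv = NULL lets sva estimate the number of surrogate variables
  svaseq(dat, mod, mod0, n.sv = n_sv)
}

# Usage, given an existing DESeqDataSet `dds`:
# svseq <- estimate_svs(dds)
# svseq$n.sv   # number of surrogate variables found
```

In this sketch only `condition` is protected in `mod`, so the surrogate variables are free to absorb variation from the known covariates as well. Adding the known covariates to both `mod` and `mod0` would instead restrict svaseq to variation not already explained by them.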
My question is about which design to use to correct for both the known covariates and the unknown (surrogate) variables:
1) design=~race+age+sex+RIN+batch+SV1+...+SV13+condition, i.e. the five known covariates plus the 13 surrogate variables.
2) design=~SV1+...+SV13+condition, assuming svaseq also accounts for the differences due to the known covariates.
Design 1 corrects for 18 variables and design 2 for 13. With a sample size of ~100, are we over-correcting?
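One way to make the comparison concrete is to count the coefficients each design actually estimates and the residual degrees of freedom left at n = 100. The sketch below uses only base R and a simulated placeholder sample table. It assumes design 1 is the five known covariates plus the 13 surrogate variables (5 + 13 = 18); the number of race groups and batches is invented for illustration, and the `SV` columns are random stand-ins rather than real svaseq output:

```r
set.seed(1)
n <- 100  # roughly the sample size in question

# Placeholder sample table; the level counts (4 race groups, 6 batches) are made up.
coldata <- data.frame(
  condition = factor(rep(c("control", "case"), each = n / 2),
                     levels = c("control", "case")),
  race  = factor(sample(paste0("r", 1:4), n, replace = TRUE)),
  age   = runif(n, 20, 80),
  sex   = factor(sample(c("F", "M"), n, replace = TRUE)),
  RIN   = runif(n, 6, 10),
  batch = factor(sample(paste0("b", 1:6), n, replace = TRUE))
)

# Random stand-ins for the 13 surrogate variables
sv <- matrix(rnorm(n * 13), nrow = n,
             dimnames = list(NULL, paste0("SV", 1:13)))
coldata <- cbind(coldata, sv)

sv_terms <- paste(colnames(sv), collapse = " + ")
design1 <- as.formula(paste("~ race + age + sex + RIN + batch +", sv_terms, "+ condition"))
design2 <- as.formula(paste("~", sv_terms, "+ condition"))

# Coefficients estimated and residual degrees of freedom for each design
for (d in list(design1, design2)) {
  X <- model.matrix(d, coldata)
  cat(ncol(X), "coefficients,", n - qr(X)$rank, "residual degrees of freedom\n")
}
```

Note that a factor with k levels contributes k - 1 coefficients, so the number of "variables" understates the number of parameters when race or batch has more than two levels.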