I'm analyzing DNA methylation microarray data from whole blood. My goal is to fit a linear model that tests for differences between two groups, young and old individuals (meth ~ age.group, where age.group is categorical).
I know that blood cell type composition in my samples is almost completely confounded with age.group: the differences in cell type proportions between old and young individuals are much larger than the differences within each group. Because of how SVA works (it first regresses out the variables in my model and then looks for surrogate variables in the residual variation), my questions are:
1) Does this imply that the algorithm won't detect or correct this confounding (or at least most of it)?
2) Could the solution be to use a reference-based cell type deconvolution method (such as Houseman et al., 2012) to account for this confounder, and then run SVA afterwards?
2.1) If this approach is correct, is it better to correct my DNA methylation values first and give the corrected values to SVA, or to give the uncorrected values to SVA and include the estimated cell type proportions as adjustment covariates in the null model (and therefore also in the full model, since the null model has to be nested in it)?
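To make question 1 concrete, here is a toy simulation of the simplified view of SVA described above: fit the model of interest, then take the leading singular vectors of the residuals. All data are simulated, and the names (`residualize`, `residual_directions`) and effect sizes are hypothetical. This is not the sva package's implementation; as far as I understand, the default iteratively re-weighted algorithm in `sva::sva` estimates the final surrogate variables from reweighted original data rather than from the residuals, so it is not strictly confined to directions orthogonal to age.group.

```python
import numpy as np

def residualize(Y, X):
    """Residuals of each probe (row of Y, probes x samples) after OLS on design X (samples x p)."""
    B, *_ = np.linalg.lstsq(X, Y.T, rcond=None)  # B: p x probes
    return Y - (X @ B).T

def residual_directions(Y, X, k):
    """Heavily simplified residual step: top-k right singular vectors of the
    residuals left after fitting the model of interest (no probe reweighting)."""
    R = residualize(Y, X)
    _, _, Vt = np.linalg.svd(R, full_matrices=False)
    return Vt[:k].T  # samples x k

rng = np.random.default_rng(0)
n_per_group, n_probes = 20, 500
group = np.repeat([0.0, 1.0], n_per_group)          # 0 = young, 1 = old
# Hypothetical cell proportion: large between-group shift, small within-group spread
cell = 0.3 * group + rng.normal(0, 0.03, group.size)
loadings = rng.normal(0, 1, n_probes)               # hypothetical per-probe sensitivity to cell type
# No true age effect is simulated: all structured signal comes from cell composition
Y = np.outer(loadings, cell) + rng.normal(0, 0.05, (n_probes, group.size))

X = np.column_stack([np.ones_like(group), group])   # full model: meth ~ age.group
sv = residual_directions(Y, X, k=1)

# Share of the cell-proportion variance that survives regressing out age.group
cell_resid = cell - X @ np.linalg.lstsq(X, cell, rcond=None)[0]
frac_left = cell_resid.var() / cell.var()
# How closely the residual direction tracks that within-group remainder
sv_corr = abs(np.corrcoef(sv[:, 0], cell_resid)[0, 1])
print(f"cell variance left after removing group: {frac_left:.3f}")
print(f"|corr(SV1, within-group cell variation)|: {sv_corr:.2f}")
```

In this setup the between-group part of the cell-type shift is absorbed by the age.group coefficient, and the residual-based direction only tracks the small within-group remainder, which is exactly the situation question 1 is worried about.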
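For question 2, this is my rough understanding of what a reference-based method does for each sample: given reference methylation profiles of purified cell types at a set of informative CpGs, find the non-negative mixing proportions that best reproduce the sample. The sketch uses SciPy's `nnls` on simulated data; the reference matrix, cell type names and proportions are hypothetical. Houseman et al. (2012) additionally constrain the proportions to sum to at most 1, which this simplified version does not do, and in practice I would use an existing implementation such as `minfi::estimateCellCounts` rather than this.

```python
import numpy as np
from scipy.optimize import nnls

def estimate_proportions(beta, reference):
    """Simplified reference-based deconvolution for one sample.

    beta:      methylation values (0-1) at the selected CpGs, shape (n_cpgs,)
    reference: mean methylation of each purified cell type at those CpGs,
               shape (n_cpgs, n_cell_types)

    Only non-negativity is enforced (NNLS); the sum-to-at-most-1 constraint
    of the constrained projection in Houseman et al. (2012) is omitted.
    """
    props, _residual_norm = nnls(reference, beta)
    return props

rng = np.random.default_rng(1)
n_cpgs = 50
cell_types = ["cellA", "cellB", "cellC"]                   # hypothetical names
reference = rng.uniform(0, 1, (n_cpgs, len(cell_types)))   # hypothetical reference profiles
true_props = np.array([0.6, 0.3, 0.1])
# Simulated whole-blood sample: mixture of the reference profiles plus small noise
sample = reference @ true_props + rng.normal(0, 0.005, n_cpgs)

est = estimate_proportions(sample, reference)
for name, p in zip(cell_types, est):
    print(f"{name}: {p:.3f}")
```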
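For question 2.1, a toy single-probe simulation of the two options, assuming that "correcting first" means regressing the cell proportions out without age.group in that model. The effect sizes and variable names are hypothetical, and this only illustrates the underlying linear algebra, not SVA itself.

```python
import numpy as np

def ols_coef(y, X):
    """OLS coefficients of y on design X (X already contains the intercept column)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(2)
n = 40
group = np.repeat([0.0, 1.0], n // 2)                # 0 = young, 1 = old
cell = 0.3 * group + rng.normal(0, 0.03, n)          # hypothetical, strongly confounded with group
true_age_effect, cell_effect = 1.0, 2.0              # hypothetical effects on one probe
y = true_age_effect * group + cell_effect * cell + rng.normal(0, 0.05, n)
ones = np.ones(n)

# Option A: regress cell proportions out first (without age.group), then test age.group
X_cell = np.column_stack([ones, cell])
y_corrected = y - X_cell @ ols_coef(y, X_cell)
est_A = ols_coef(y_corrected, np.column_stack([ones, group]))[1]

# Option B: keep raw values; age.group and cell proportions in the same model
# (in sva terms: mod = ~ age.group + cells, mod0 = ~ cells)
est_B = ols_coef(y, np.column_stack([ones, group, cell]))[1]

print(f"Option A age.group estimate: {est_A:.2f}")
print(f"Option B age.group estimate: {est_B:.2f}")
```

With cell composition this strongly tied to age.group, option A removes most of the age.group signal before SVA or the final test ever see it, whereas option B keeps it but pays with a wider standard error; with complete confounding, the option B model would not be estimable at all.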