We are working with a large population of more than 1200 samples. For those samples we obtained both gene expression and methylation data, using Affymetrix HTA 2.0 and Illumina 450k arrays respectively. Both datasets were pre-processed, the expression data with Affymetrix Expression Console (GCCN-SST-RMA normalization pipeline).
During QC we detected a strong batch effect in the gene expression data. No batch effect was detected in the methylation data at that step. After some test analyses, we found that the gene expression batch effect can be handled by adjusting the models for a few technical variables. However, we also discovered that the methylation data has some kind of batch effect.
Hence, we tried to adjust the methylation models using SVA, but it took more than three days to estimate 61 surrogate variables. Once the model was adjusted for these 61 surrogate variables, the results were acceptable.
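To make the adjustment step concrete, here is a minimal Python/NumPy sketch of the general idea. It is not the SVA algorithm and not our actual pipeline: it only takes the top singular vectors of the residuals as latent factors and appends them to the design matrix, without the iterative probe weighting that SVA performs. The function names, the simulated data and the choice of `k = 2` are assumptions made purely for illustration.

```python
import numpy as np

def residual_factors(Y, X, k):
    """Estimate k latent factors from the residuals of Y ~ X.

    Y: probes x samples matrix (e.g. methylation M-values).
    X: samples x covariates design matrix, including the intercept.
    Returns a samples x k matrix of latent factors.
    NOTE: a simplified stand-in for surrogate variables, not SVA itself.
    """
    # Per-probe ordinary least squares fit, all probes at once.
    beta, *_ = np.linalg.lstsq(X, Y.T, rcond=None)
    residuals = Y.T - X @ beta                      # samples x probes
    # Left singular vectors capture the dominant unexplained variation.
    U, _, _ = np.linalg.svd(residuals, full_matrices=False)
    return U[:, :k]

def adjusted_design(X, F):
    """Append the latent factors to the design matrix as extra covariates."""
    return np.hstack([X, F])

# Simulated toy data (hypothetical): 500 probes, 40 samples, one hidden batch.
rng = np.random.default_rng(0)
n_samples, n_probes = 40, 500
phenotype = np.tile([0.0, 1.0], n_samples // 2)     # variable of interest
batch = np.repeat([0.0, 1.0], n_samples // 2)       # unobserved batch
batch_effect = rng.normal(0.0, 2.0, size=n_probes)  # per-probe batch shift
Y = rng.normal(size=(n_probes, n_samples)) + np.outer(batch_effect, batch)

X = np.column_stack([np.ones(n_samples), phenotype])
F = residual_factors(Y, X, k=2)
X_adj = adjusted_design(X, F)
print("adjusted design shape:", X_adj.shape)
print("|cor(factor 1, hidden batch)|:", abs(np.corrcoef(F[:, 0], batch)[0, 1]))
```

In this toy setting the first factor tracks the hidden batch, so fitting each probe against `X_adj` instead of `X` adjusts for it. In our real data the expensive part is estimating the factors, not fitting the adjusted models.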
The questions are:
- Is there a way to speed up SVA? We need to test more than 100 models.
- We are considering using PEER or RUV instead. Would you recommend them for methylation data?