I have WGBS data from normal (n=6) and disease-state (n=6) tissue. Because the samples were processed in multiple batches, I believe the data needs to be corrected for confounding variables. I also don't expect many large differences between normal and disease-state tissue, so surrogate variable analysis (SVA) seems to be the best available correction method, since it should account for unknown confounders as well as known ones.
However, SVA was designed for normally distributed data, and WGBS data is far from normally distributed. I can't find any literature on using SVA on WGBS, so I'm not sure if it would be appropriate. SVA has been performed on M-values from methylation chip data though.
So I guess my question is: Are there any papers/software that perform SVA on WGBS data and use the surrogate variables in some type of linear model? I know DSS.general and dmrseq allow for covariates to be used in their models, but again, I am not sure if using surrogate variables on ratio data is appropriate.
Possible routes for analysis:
Plan 1 (if possible): Use DSS.general and/or dmrseq and/or any other software with surrogate variables as covariates.
Pros: uses count-based statistics and accounts for both known and unknown confounders
Plan 2: Use DSS.general and/or dmrseq (I believe the modeling in these packages is similar, but they use different approaches to control FDR) with known confounders (like batch and age) as covariates.
Pros: uses count-based statistics
Cons: can't remove the effects of unknown confounders
Plan 3: Smooth data and obtain methylation levels for high-coverage CpGs (BSmooth). Transform to M-values and use limma (with surrogate variables from M-values as covariates) to find differentially methylated loci.
Pros: can implement SVA
Cons: can't use count-based statistics; results from M-value analysis are not always biologically relevant
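To make Plan 3 concrete, here is a rough Python sketch of the two steps I have in mind: converting counts to M-values, then deriving an extra covariate from what is left after regressing out the known design. It is only a stand-in for SVA, not the algorithm in the sva package: it simply takes the leading singular vector of the residuals. The function names, the 0.5 offset, and the simulated counts (including the hidden batch) are all assumptions made up for illustration, and the smoothing step is skipped.

```python
import numpy as np


def m_values(meth, total, offset=0.5):
    """M-value = log2((methylated + offset) / (unmethylated + offset)).

    The offset (an assumed value) keeps the ratio finite at 0% or 100% methylation.
    """
    meth = np.asarray(meth, dtype=float)
    total = np.asarray(total, dtype=float)
    unmeth = total - meth
    return np.log2((meth + offset) / (unmeth + offset))


def residual_components(m, design, n_sv=1):
    """Crude surrogate-variable stand-in (NOT the sva algorithm).

    m:      (n_cpgs, n_samples) matrix of M-values
    design: (n_samples, n_covariates) known design matrix
    Returns the top n_sv left singular vectors of the residual matrix,
    shape (n_samples, n_sv), which are orthogonal to the design columns.
    """
    m = np.asarray(m, dtype=float)
    design = np.asarray(design, dtype=float)
    # Fit every CpG on the known design by ordinary least squares.
    beta, *_ = np.linalg.lstsq(design, m.T, rcond=None)
    resid = m.T - design @ beta  # (n_samples, n_cpgs)
    # Leading directions of residual variation across samples.
    u, _, _ = np.linalg.svd(resid, full_matrices=False)
    return u[:, :n_sv]


# Simulated example: 6 normal + 6 disease samples, plus an unrecorded batch.
rng = np.random.default_rng(0)
n_cpgs, n_samples = 200, 12
group = np.repeat([0, 1], 6)   # 0 = normal, 1 = disease
hidden = np.tile([0, 1], 6)    # hypothetical unknown confounder

total = rng.integers(10, 40, size=(n_cpgs, n_samples))  # coverage per CpG
baseline = rng.normal(0.0, 1.0, size=(n_cpgs, 1))       # per-CpG logit level
batch_shift = 0.8 * rng.normal(0.0, 1.0, size=(n_cpgs, 1)) * hidden
prob = 1.0 / (1.0 + np.exp(-(baseline + batch_shift)))
meth = rng.binomial(total, prob)                        # methylated reads

design = np.column_stack([np.ones(n_samples), group])   # intercept + group
m = m_values(meth, total)
sv = residual_components(m, design, n_sv=1)

# Design matrix with the extra covariate appended, as it would be passed
# to a per-CpG linear model.
augmented = np.column_stack([design, sv])
```

The `augmented` matrix is what I would hand to the linear model in place of the known-covariates-only design. Whether a covariate estimated this way from M-values is also valid to plug into a count-based model (Plan 1) is exactly the part I'm unsure about.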
Thanks in advance