Im currently using limma to analyse methylation array data. The typical workflow is:
fit <- lmFit(methylation_data, design, method = "robust")
fit_2 <- eBayes(fit)
table <- limma::topTable(fit_2, coef = 2, number = Inf)
Where the design is:
# NOT RUN - code simplified
mod <- model.matrix((~ variable_of_interest+covariate1+covariate2...), data=p)
This is effectively running regressions with methylation as the outcome and the variable of interest as the predictor - is there any way to run this so methylation is the predictor and the variable of interest the outcome?
You can't run the model with methylation status as the predictor using limma. And it wouldn't make much sense to fit the inverse model in a univariate fashion like that anyway, either statistically or biologically. You wouldn't expect a single CpG to have a discernable phenotypic effect. Instead, you should expect that consistent changes in methylation status might have an effect on a phenotype, where 'consistent changes' could mean many different things. It could be that the average methylation status (over all of the CpGs, or some relevant subset based on proximity to a set of genes or LINE repeats or whatever) is increased in people with a given phenotype, so you could hypothetically subset to the CpGs you think are interesting and use those.
But the information from those CpGs may be highly correlated, and there are too many predictors anyway, so conventionally you would use some sort of penalized regression where you select the 'best' CpGs from a set of inputs, or just project the CpG information into a lower space and use the projected values instead. You could use the ropls package for that sort of thing.