I haven't worked with methylation data, but from what I understand, you should be applying limma to M-values. These are closer to normally distributed than beta-values, at least with regard to the range of values (unbounded, rather than confined to between 0 and 1), the mean-variance relationship, etc. Some people seem to do all the linear modelling with M-values, and then report the fold-changes, etc. for significant probes in terms of beta-values, which are easier to interpret.
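As a rough illustration of how the two scales relate, here is a minimal Python sketch of the usual logit2 conversion, M = log2(beta / (1 - beta)). The function names are made up for this example, and it ignores any offset that a particular package might add when computing beta-values from intensities.

```python
import math

def beta_to_m(beta):
    """Convert a beta-value (strictly between 0 and 1) to an M-value."""
    return math.log2(beta / (1.0 - beta))

def m_to_beta(m):
    """Convert an M-value back to a beta-value."""
    return 2.0 ** m / (2.0 ** m + 1.0)

# Beta-values are squeezed into (0, 1); M-values are unbounded and
# symmetric around 0, where 0 corresponds to beta = 0.5.
for beta in (0.1, 0.5, 0.9):
    m = beta_to_m(beta)
    print(f"beta={beta:.2f} -> M={m:+.3f} -> beta={m_to_beta(m):.2f}")
```

This is only meant to show why results fitted on the M-value scale can be translated back to beta-values for reporting.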
Anyway, the logFC field should represent the change in the average M-value between conditions, which - I think - is interpretable as a change in the log-odds of methylation. For example, because M-values are on a log2 scale, a logFC of 1 would indicate that in one condition, the odds of being methylated versus unmethylated are twice as high as in the other condition. Or, in the simplest terms: larger absolute logFC = stronger differential methylation. The AveExpr field would be the average M-value across all samples, which gives you a measure of the overall amount of methylation for each probe. The B-statistic is the log-odds of differential methylation to constant methylation (note, not the log-odds of methylation to non-methylation, which is the M-value itself). I tend not to use the B-statistic much for DE analyses as I find it a bit unintuitive, but to each his own.
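To make the odds interpretation concrete, here is a small Python sketch, assuming M-values are base-2 log-odds. The helper names are hypothetical, and the calculation is for a single illustrative beta-value rather than for the group averages that limma actually compares.

```python
def odds_ratio_from_logfc(logfc):
    """Fold-change in the odds of methylation implied by a logFC on the M-value scale."""
    return 2.0 ** logfc

def shifted_beta(beta, logfc):
    """Beta-value obtained after shifting the corresponding M-value by logfc."""
    odds = beta / (1.0 - beta) * 2.0 ** logfc
    return odds / (1.0 + odds)

# A logFC of 1 always doubles the odds, but the change on the beta scale
# depends on where the probe starts.
for start in (0.1, 0.5, 0.9):
    end = shifted_beta(start, 1.0)
    print(f"logFC=1: beta {start:.2f} -> {end:.3f} (odds x{odds_ratio_from_logfc(1.0):.0f})")
```

The same logFC therefore corresponds to different beta-value differences at different baseline methylation levels, which is one reason to report both.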
Finally, the chosen reference depends on the parametrization of the design matrix. If you have a one-way layout and you construct a design with an intercept via model.matrix, the alphabetically first group will be the reference.
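The effect of that parametrization can be sketched in plain Python. This is not how model.matrix or limma are implemented; it is a hypothetical helper that mimics the default behaviour for a one-way layout with an intercept, where the intercept is the mean of the first group in sorted order and every other coefficient is a difference from that group. The group labels and M-values are invented for the example.

```python
from statistics import mean

def treatment_coefficients(groups, values):
    """Least-squares coefficients for a one-way layout with an intercept,
    using the first group in sorted order as the reference."""
    levels = sorted(set(groups))
    means = {
        lvl: mean(v for g, v in zip(groups, values) if g == lvl)
        for lvl in levels
    }
    reference = levels[0]
    coefs = {"(Intercept)": means[reference]}
    for lvl in levels[1:]:
        # Each remaining coefficient is that group's mean minus the reference mean.
        coefs["group" + lvl] = means[lvl] - means[reference]
    return reference, coefs

groups = ["treated", "control", "treated", "control"]
m_values = [2.0, 0.5, 3.0, 1.5]  # made-up M-values for one probe
reference, coefs = treatment_coefficients(groups, m_values)
print(reference)  # control
print(coefs)      # {'(Intercept)': 1.0, 'grouptreated': 1.5}
```

Here "control" sorts before "treated", so the reported difference is treated minus control. In R, the reference can be changed by reordering the factor levels, e.g. with relevel, before building the design.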
To add further to Aaron's comments: with methylation data, the individual CpG sites may not be as informative as the local methylation status of all CpGs in a region. Using something like bumphunter to detect regions that appear to be consistently differentially methylated and then fitting models based on a regional measure of methylation may be a more appropriate way to proceed. See the minfi vignette for more information.