Question: What are logFC and "B" in methylation analysis?
2.9 years ago by
AST50
INDIA
AST50 wrote:

Hi all,

Being new to 450k data analysis, there are certain output terms that I don't understand.

Can someone please explain to me what we can infer from logFC, AveExpr and B in the 450k methylation output from limma, and how these values are calculated from Beta/M-values?

Do these values have any significance when interpreting 450k results?

I understand that these terms are important for expression data, but they don't make sense to me for 450k data.

Also, how does limma decide which group to use as the reference when fitting contrasts between groups?

modified 2.9 years ago by Aaron Lun21k • written 2.9 years ago by AST50
2.9 years ago by
Aaron Lun21k
Cambridge, United Kingdom
Aaron Lun21k wrote:

I haven't worked with methylation data, but from what I understand, you should be applying limma to M-values. These are more accurately modelled under normality than beta-values, at least with regard to the range of values, the mean-variance relationship, etc. Some people seem to do all the linear modelling with M-values, and then report the fold-changes, etc. for significant probes in terms of beta-values, which are easier to interpret.
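To make the relationship between the two scales concrete, here is a minimal Python sketch of the usual logit (base 2) conversion between beta-values and M-values. The function names are made up for illustration, and it assumes the simple definition M = log2(beta / (1 - beta)) with no offsets; real pipelines often compute M-values from the raw intensities and may add an offset, so treat this as arithmetic only.

```python
import math

def beta_to_m(beta):
    """Convert a beta-value (strictly between 0 and 1) to an M-value.

    Assumes the plain logit2 definition, M = log2(beta / (1 - beta)),
    with no intensity offset.
    """
    return math.log2(beta / (1.0 - beta))

def m_to_beta(m):
    """Inverse transformation: convert an M-value back to a beta-value."""
    return 2.0 ** m / (2.0 ** m + 1.0)

# Hypothetical beta-values for three probes
betas = [0.1, 0.5, 0.8]
m_values = [beta_to_m(b) for b in betas]

for b, m in zip(betas, m_values):
    print(f"beta = {b:.2f}  ->  M = {m:+.3f}  ->  back to beta = {m_to_beta(m):.2f}")
```

Beta-values are squeezed into [0, 1], whereas M-values are unbounded and symmetric around 0 (beta = 0.5), which is part of why they suit a normal linear model better.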

Anyway, the logFC field should represent the change in the average M-value between conditions, which - I think - is interpretable as a change in the log2-odds of methylation. For example, a logFC of 1 would indicate that in one condition, the odds of being methylated versus nonmethylated are twice as high as in the other condition. Or, in the simplest terms: larger absolute logFC = stronger differential methylation. The AveExpr field would be the average M-value across all samples, which gives you a measure of the overall amount of methylation for each probe. The B-statistic is the log-odds of differential methylation to constant methylation (note, not the log-odds of methylation to nonmethylation, which is the M-value itself). I tend not to use the B-statistic much for DE analyses as I find it a bit unintuitive, but to each his own.
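As a rough worked example of these three quantities, here is a small Python sketch for a single probe. The M-values are invented, and it assumes a plain two-group design with no other covariates, where the fitted logFC reduces to the difference in group means. It does not reproduce limma's empirical Bayes moderation, so it cannot compute B itself; it only shows how a given B (a natural-log odds) maps to a probability.

```python
import math
from statistics import mean

# Hypothetical M-values for one probe (not real data)
control = [-1.2, -0.8, -1.0]
treated = [0.1, -0.1, 0.0]

# logFC: difference in average M-value between the two conditions
log_fc = mean(treated) - mean(control)

# AveExpr: average M-value across all samples, regardless of group
ave_expr = mean(control + treated)

# M-values are log2-odds, so 2**logFC is the fold-change in the odds
# of being methylated versus nonmethylated
odds_ratio = 2 ** log_fc

def b_to_probability(b):
    """Turn a B-statistic (natural-log odds of differential methylation)
    into the corresponding probability."""
    return 1.0 / (1.0 + math.exp(-b))

print(f"logFC = {log_fc:.2f}, AveExpr = {ave_expr:.2f}, odds ratio = {odds_ratio:.2f}")
print(f"B = 0   -> probability {b_to_probability(0.0):.2f}")
print(f"B = 1.5 -> probability {b_to_probability(1.5):.2f}")
```

So B = 0 corresponds to a 50:50 chance that the probe is differentially methylated, and the probability rises towards 1 as B increases. The value of B also depends on the prior proportion of changing probes assumed by the model, which is one reason adjusted p-values are often preferred for ranking.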

Finally, the chosen reference depends on the parametrization of the design matrix. If you have a one-way layout and you construct a design with an intercept via model.matrix, the first level of the group factor will be the reference; by default, that is the alphabetically first group.
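To illustrate what "reference" means here, the following Python sketch mimics treatment coding for a one-way layout with an intercept. It is not model.matrix itself: `treatment_design` is a hypothetical helper, the group labels are invented, and it assumes levels are ordered by a simple alphabetical sort (R's default ordering can depend on locale).

```python
def treatment_design(groups, reference=None):
    """Build an intercept + treatment-coded design for a one-way layout.

    Levels are sorted alphabetically; the first level becomes the
    reference unless another one is named explicitly.
    """
    levels = sorted(set(groups))
    if reference is not None:
        levels.remove(reference)
        levels.insert(0, reference)
    # One intercept column plus one indicator column per non-reference level
    columns = ["(Intercept)"] + [f"group{lv}" for lv in levels[1:]]
    rows = [[1] + [1 if g == lv else 0 for lv in levels[1:]] for g in groups]
    return columns, rows

# Hypothetical sample labels
groups = ["tumour", "normal", "tumour", "normal"]

columns, rows = treatment_design(groups)
print(columns)  # "normal" sorts first, so it is absorbed into the intercept
for g, r in zip(groups, rows):
    print(g, r)

# Choosing the reference explicitly instead
columns_alt, rows_alt = treatment_design(groups, reference="tumour")
print(columns_alt)
```

With this coding, the intercept is the average M-value of the reference group and each remaining coefficient is the logFC of that group relative to the reference. In R, the reference can be changed by setting the factor levels explicitly or with relevel() before calling model.matrix.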


To add further to Aaron's comments, with methylation data the individual CpG sites may not be as informative as the local methylation status of all CpGs in a region. Using something like bumphunter to detect regions that appear to be consistently differentially methylated, and then fitting models based on a regional measure of methylation may be a more appropriate way to proceed. See the minfi vignette for more information.