Limma on Methylation Data - Fold Change Question
1
2
Entering edit mode
@andrewjskelton73-7074
Last seen 5 weeks ago
United Kingdom

Hi,

I've got a simple A vs B model design in Limma for 450K data (M-Values to avoid Heteroscedasticity), but I'm struggling with figuring out a fold change cutoff in topTable. If I had beta values, I could simply set a cutoff of 0.1 to see things with at least a 10% difference in methylation between conditions, but it doesn't seem to be that simple using M values. - How would I figure out a 10% difference in methylation in M-value space for a cutoff? - (This may just be log space doing a number of my brain on a friday afternoon!)

Has anyone got any advice?

limma 450k • 2.3k views
3
Entering edit mode
Aaron Lun ★ 27k
@alun
Last seen 13 hours ago
The city by the bay

I don't think there's a one-to-one mapping between changes in the beta values and M-values. Consider a situation where you have m1, the intensity of methylation in sample 1; u1, the intensity of unmethylation in sample 1; and m2 and u2, the corresponding variables for sample 2. The M-value change between samples is defined as:

dM = log(m1/u1 * u2/m2)

while the beta value change between samples is defined as:

dB = m1/(m1 + u1) - m2/(m2 + u2)

I'm ignoring prior counts here, for simplicity. Now, consider an example where m1 = 2 * u1 and m2 = u2. This gives:

dM = log(2/1 * 1/1) = log(2)
dB = 2/3 - 1/2 = 1/6


Let's change it to m1 = 4 * u1 and m2 = 2 * u2. This gives:

dM = log(4/1 * 1/2) = log(2)
dB = 4/5 - 2/3 = 2/15

So, you can see that the same dM can correspond to different dB. This means that setting a threshold on the M-value change won't give you any single threshold on the beta value change. The relationship between the two depends on the individual observations, which means that any M-value threshold would need to be recomputed for each gene. If you're going to do that, you might as well compute the beta value change directly for each gene.

As an aside, if you want to account for the size of the log-fold changes in limma, you should use treat and topTreat rather than manually filtering on the log-fold changes in the output from topTable. Manual filtering can result in loss of FDR control if genes with large variances are preferentially selected.

0
Entering edit mode

ok, that makes a lot of sense, and proves that I was taking the wrong approach by trying to translate the beta change into M-Value space, so thank you for that breakdown, very much appreciate it. So I guess my followup is this, if I use limma and M-values to work out a p-value, then go back to the beta values, manually work out the beta difference, then that should give me essentially a 'beta cutoff' to accompany the p-value, does that make sense? My issue then comes if I'm doing a more complex model, i.e. controlling for technical variation in the model design, for example. Manually working that out algebraically will be difficult - I could however do two lmfits, one using m-values (to get a p value), and one using beta-values (to get a 'fold change cutoff'), both using the same model design - would that make sense?

So treat and topTreat are specifically designed to work with fold-change cutoffs?

1
Entering edit mode

I guess you could re-fit the linear model, to get a change in the beta values to accompany the p-values. A caution, though: your "beta cutoff" phrasing suggests that you want to filter on the computed beta change. I'd advise against this, as the problems with manual filtering and large variances are still relevant here.

As for your second question; yes, treat and company are designed to work with a log-fold change threshold; check out the documentation. In this case, though, you'll be computing p-values with the M-values. Thus, the treat threshold will be referring to the change in the M-values, not that of the beta value.