I don't think there's a one-to-one mapping between changes in the beta values and M-values. Consider a situation where you have m1
, the intensity of methylation in sample 1; u1
, the intensity of unmethylation in sample 1; and m2
and u2
, the corresponding variables for sample 2. The M-value change between samples is defined as:
dM = log(m1/u1 * u2/m2)
while the beta value change between samples is defined as:
dB = m1/(m1 + u1) - m2/(m2 + u2)
I'm ignoring prior counts here, for simplicity. Now, consider an example where m1 = 2 * u1
and m2 = u2
. This gives:
dM = log(2/1 * 1/1) = log(2)
dB = 2/3 - 1/2 = 1/6
Let's change it to m1 = 4 * u1
and m2 = 2 * u2
. This gives:
dM = log(4/1 * 1/2) = log(2)
dB = 4/5 - 2/3 = 2/15
So, you can see that the same dM
can correspond to different dB
. This means that setting a threshold on the M-value change won't give you any single threshold on the beta value change. The relationship between the two depends on the individual observations, which means that any M-value threshold would need to be recomputed for each gene. If you're going to do that, you might as well compute the beta value change directly for each gene.
As an aside, if you want to account for the size of the log-fold changes in limma
, you should use treat
and topTreat
rather than manually filtering on the log-fold changes in the output from topTable
. Manual filtering can result in loss of FDR control if genes with large variances are preferentially selected.
ok, that makes a lot of sense, and proves that I was taking the wrong approach by trying to translate the beta change into M-Value space, so thank you for that breakdown, very much appreciate it. So I guess my followup is this, if I use limma and M-values to work out a p-value, then go back to the beta values, manually work out the beta difference, then that should give me essentially a 'beta cutoff' to accompany the p-value, does that make sense? My issue then comes if I'm doing a more complex model, i.e. controlling for technical variation in the model design, for example. Manually working that out algebraically will be difficult - I could however do two
lmfits
, one using m-values (to get a p value), and one using beta-values (to get a 'fold change cutoff'), both using the same model design - would that make sense?So
treat
andtopTreat
are specifically designed to work with fold-change cutoffs?I guess you could re-fit the linear model, to get a change in the beta values to accompany the p-values. A caution, though: your "beta cutoff" phrasing suggests that you want to filter on the computed beta change. I'd advise against this, as the problems with manual filtering and large variances are still relevant here.
As for your second question; yes,
treat
and company are designed to work with a log-fold change threshold; check out the documentation. In this case, though, you'll be computing p-values with the M-values. Thus, thetreat
threshold will be referring to the change in the M-values, not that of the beta value.