3.0 years ago by

Cambridge, United Kingdom

I don't think there's a one-to-one mapping between changes in the beta values and M-values. Consider a situation where you have `m1`

, the intensity of methylation in sample 1; `u1`

, the intensity of unmethylation in sample 1; and `m2`

and `u2`

, the corresponding variables for sample 2. The M-value change between samples is defined as:

dM = log(m1/u1 * u2/m2)

while the beta value change between samples is defined as:

dB = m1/(m1 + u1) - m2/(m2 + u2)

I'm ignoring prior counts here, for simplicity. Now, consider an example where `m1 = 2 * u1`

and `m2 = u2`

. This gives:

dM = log(2/1 * 1/1) = log(2)
dB = 2/3 - 1/2 = 1/6

Let's change it to `m1 = 4 * u1`

and `m2 = 2 * u2`

. This gives:

dM = log(4/1 * 1/2) = log(2)
dB = 4/5 - 2/3 = 2/15

So, you can see that the same `dM`

can correspond to different `dB`

. This means that setting a threshold on the M-value change won't give you any single threshold on the beta value change. The relationship between the two depends on the individual observations, which means that any M-value threshold would need to be recomputed for each gene. If you're going to do that, you might as well compute the beta value change directly for each gene.

As an aside, if you want to account for the size of the log-fold changes in `limma`

, you should use `treat`

and `topTreat`

rather than manually filtering on the log-fold changes in the output from `topTable`

. Manual filtering can result in loss of FDR control if genes with large variances are preferentially selected.

•

link
modified 3.0 years ago
•
written
3.0 years ago by
Aaron Lun • **20k**