Question: Significant DMRs in genes with no single significant CpG using limma
anne-kristin.stavrum wrote (3 months ago):

I have analysed my data (from the Illumina EPIC array) using limma to find single differentially methylated CpGs (DMPs), and DMRcate to find DMRs. The data and the model are exactly the same for both analyses: a model with no intercept and a specified contrast of interest.

When I compare the lists of genes that the significant DMRs and DMPs map to, there is of course an overlap, but more than half of the genes with a significant DMR do not show up on the list I get using limma. When I check all CpGs mapping to these genes (not just the ones in the DMR), none of them show up on the list of significant DMPs. I understand that, due to the smoothing function, some DMRs may stretch into the promoter region of a gene, so it is possible to get genes on the list of DMRs that do not appear on the list from limma. But I have examples where the DMR lies within an exon of a gene. In this case I would expect at least one of the CpGs to show up on the list of significant DMPs from limma, since I thought at least one CpG within the DMR would have to be individually significant. Is this not the case?

My question is therefore: why does this happen? Does the smoothing function of DMRcate give me lots of false positives?

Kind regards, Anne-Kristin

Answer: Significant DMRs in genes with no single significant CpG using limma
Tim Peters (Australia) wrote (3 months ago):

Hi Anne-Kristin,

> In this case I would expect that at least one of the CpGs would show up on the list of significant DMPs from limma, since I thought at least one CpG within the DMR would have to be individually significant. Is this not the case?

No, this is not necessarily the case. DMRcate does not define DMRs on the basis of the DMPs themselves; the only link is that the threshold used to define DMRs is indexed to the DMP rate, at whatever FDR you specify in cpg.annotate(). Depending on how the limma t-statistics are spatially distributed, it is very likely that you'll get at least some DMRs that contain no DMPs, and some DMPs that are not part of any DMR.

> My question is therefore: why does this happen?

DMRcate considers all CpGs when smoothing, not just the DMPs. So a contiguous critical mass of CpGs, each with a modest effect that falls just short of significance at the DMP FDR threshold, will be aggregated into something more significant than, say, a group of CpGs where only one or two are significant and the rest not at all. In fact, the former type of DMR will be reported at the expense of the latter.
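To make the "critical mass" argument concrete, here is a toy Python sketch. It assumes, as a simplification, that the smoothed statistic at each CpG is a Gaussian-kernel-weighted sum of squared limma t-statistics with bandwidth sigma = lambda / C, using lambda = 1000 and C = 2 (assumed here to match dmrcate()'s array defaults). The positions and t-statistics are made up, and the sketch skips the step that turns smoothed values into p-values and FDRs, so it illustrates the idea rather than reproducing DMRcate.

```python
import math


def gaussian_weight(distance_bp, sigma):
    """Unnormalised Gaussian kernel weight for a genomic distance in bp."""
    return math.exp(-0.5 * (distance_bp / sigma) ** 2)


def smooth_squared_t(positions, t_stats, lam=1000.0, C=2.0):
    """Kernel-weighted sum of squared t-statistics evaluated at each CpG.

    Simplified illustration only: sigma = lam / C follows the description of
    DMRcate's lambda and C arguments, but no p-value/FDR step is performed.
    """
    sigma = lam / C
    smoothed = []
    for x in positions:
        total = sum(
            gaussian_weight(x - xj, sigma) * tj ** 2
            for xj, tj in zip(positions, t_stats)
        )
        smoothed.append(total)
    return smoothed


# Hypothetical region A: ten CpGs 50 bp apart, each with a modest t-statistic.
pos_a = [i * 50 for i in range(10)]
t_a = [2.5] * 10

# Hypothetical region B: same layout, one strong CpG among near-null neighbours.
pos_b = [i * 50 for i in range(10)]
t_b = [0.2] * 10
t_b[4] = 5.0

peak_a = max(smooth_squared_t(pos_a, t_a))
peak_b = max(smooth_squared_t(pos_b, t_b))
print(f"region A peak: {peak_a:.1f}, region B peak: {peak_b:.1f}")
```

No CpG in region A is as extreme as the single strong CpG in region B, yet region A's smoothed peak comes out more than twice as high because every neighbour contributes. That is the mechanism by which a DMR can contain no DMPs.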

> Does the smoothing function of DMRcate give me lots of false positives?

Great question, and this is the point at which the user has to make a judgement call. The post-smoothing per-CpG FDRs (the minimum of which is reported as minfdr in your results GRanges object) are much more permissive than those from limma. So rather than applying a static default threshold to them, the recommended default (pcutoff = "fdr" in dmrcate()) adjusts the final threshold dynamically, so that the number of CpGs passing it equals the number of DMPs found by limma at that FDR (as I alluded to in the first paragraph). In effect, this gives you two equally sized lists of CpGs that are (most likely) not identical. DMRs are then aggregated from the post-smoothing list, not from the DMPs.

This is an inherently conservative approach, since all CpGs are assumed to be independent (even though we know they are not), so false positives shouldn't be a concern if you are using the default settings. However, if you feel this is too conservative, you can relax the fdr argument in cpg.annotate() to your liking. To offset that, you could then use the Stouffer value for each DMR to further refine your list, but again this is up to you.
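The threshold indexing and aggregation described above can be sketched in Python as follows. This is a simplified reading of the description, not DMRcate's code: the cutoff is taken as the k-th smallest post-smoothing FDR, where k is the number of limma DMPs at the chosen fdr, and the selected CpGs are then grouped into regions whenever consecutive ones are less than max_gap bp apart, keeping regions with at least min_cpgs CpGs. The names max_gap and min_cpgs are hypothetical, loosely modelled on dmrcate()'s lambda and min.cpgs arguments, and all FDR values and positions are made up.

```python
def indexed_cutoff(limma_fdr, smoothed_fdr, fdr=0.05):
    """Return a post-smoothing cutoff that passes as many CpGs as there are DMPs."""
    n_dmps = sum(1 for q in limma_fdr if q < fdr)
    if n_dmps == 0:
        return None
    # Ties at the cutoff could let slightly more than n_dmps CpGs through.
    return sorted(smoothed_fdr)[n_dmps - 1]


def aggregate_regions(positions, selected, max_gap=1000, min_cpgs=2):
    """Group selected CpG indices (in genomic order) into (start, end, n_cpgs)."""
    runs, run = [], []
    for i in selected:
        # A gap of max_gap bp or more between selected CpGs starts a new region.
        if run and positions[i] - positions[run[-1]] >= max_gap:
            runs.append(run)
            run = []
        run.append(i)
    if run:
        runs.append(run)
    return [
        (positions[r[0]], positions[r[-1]], len(r))
        for r in runs
        if len(r) >= min_cpgs
    ]


# Hypothetical CpGs on one chromosome, sorted by position.
positions = [100, 150, 200, 3000, 3050, 6000, 9000, 12000]
limma_fdr = [0.08, 0.07, 0.09, 0.01, 0.40, 0.03, 0.50, 0.04]
smoothed_fdr = [1e-4, 5e-5, 2e-4, 0.01, 0.03, 0.02, 0.6, 0.05]

dmps = [i for i, q in enumerate(limma_fdr) if q < 0.05]
cutoff = indexed_cutoff(limma_fdr, smoothed_fdr, fdr=0.05)
selected = [i for i, q in enumerate(smoothed_fdr) if q <= cutoff]
regions = aggregate_regions(positions, selected)
print("DMPs:", dmps, "post-smoothing CpGs:", selected, "regions:", regions)
```

Both lists contain three CpGs but do not overlap, and the single region formed contains none of the DMPs, which mirrors the situation described in the question.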

Best, Tim

Thank you very much for your reply. This makes sense. I used the default settings for DMRcate. Do you consider all reported DMRs significant, or do you recommend using the Stouffer value (<0.05?) to select the significant DMRs?

Thanks again, Anne-Kristin

Hi Anne-Kristin,

Yes, I would, at the rate you specified for fdr. Although I think summarising the significance of a DMR with a single value can be misleading, the Stouffer value is there to placate those who want one. I see it as an extra level of control if you're still concerned about false positives. But beware: if one constituent CpG has an FDR of 1, the whole DMR will have a Stouffer value of 1 as well. An upcoming version of DMRcate will have Fisher's combined summary and the harmonic mean of the FDRs as extra options to choose from.
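To see why a single FDR of 1 drags the Stouffer value to 1, here is a Python sketch of three textbook ways to combine the FDRs of a DMR's constituent CpGs: Stouffer's method, Fisher's method and the harmonic mean. It assumes each FDR q is converted to a z-score as z = -Φ⁻¹(q) (the inverse standard normal CDF) for Stouffer, and treats the FDRs like p-values for Fisher. DMRcate's own implementations may differ in detail, and the input values are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def _check(fdrs):
    if not fdrs or any(not 0.0 < q <= 1.0 for q in fdrs):
        raise ValueError("FDRs must be in (0, 1]")


def stouffer(fdrs):
    """Stouffer's method: sum of z-scores divided by sqrt(k)."""
    _check(fdrs)
    # An FDR of exactly 1 maps to z = -infinity, so the combined value is 1.
    if any(q == 1.0 for q in fdrs):
        return 1.0
    z = sum(-STD_NORMAL.inv_cdf(q) for q in fdrs) / math.sqrt(len(fdrs))
    return STD_NORMAL.cdf(-z)


def fisher(fdrs):
    """Fisher's method: -2 * sum(ln q) ~ chi-squared with 2k degrees of freedom."""
    _check(fdrs)
    half_x = -sum(math.log(q) for q in fdrs)  # half of the chi-squared statistic
    k = len(fdrs)
    # Closed-form chi-squared survival function for even degrees of freedom.
    return math.exp(-half_x) * sum(half_x ** i / math.factorial(i) for i in range(k))


def harmonic_mean(fdrs):
    """Plain harmonic mean of the FDRs."""
    _check(fdrs)
    return len(fdrs) / sum(1.0 / q for q in fdrs)


dmr_fdrs = [0.01, 0.01, 1.0]  # hypothetical constituent CpG FDRs
for name, fn in [("Stouffer", stouffer), ("Fisher", fisher), ("harmonic mean", harmonic_mean)]:
    print(f"{name}: {fn(dmr_fdrs):.4f}")
```

With one CpG at FDR 1, the Stouffer combination returns exactly 1, while the Fisher and harmonic-mean summaries still return small values, which illustrates why such alternatives can behave quite differently from Stouffer on DMRs like this.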