Question

In the sesame package, what does the "estimate" output from testEnrichment() mean?

ramiro.barrantes

United States

Hello, I have a set of tumors from patients with and without a particular condition, and I am using the sesame package to investigate their methylation patterns. So far it has been really helpful, but there is one thing I don't quite understand and one thing I don't know how to do.

I would like to compare methylation at the gene level between samples with and without the condition. For this, I run a differential methylation analysis, for example:

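```r
library(sesame)
library(SummarizedExperiment)
# betas: probes x samples matrix of beta values; metaData: one row per sample (column of betas)
se <- SummarizedExperiment(betas, colData = metaData)
smry <- DML(se, ~condition, BPPARAM = BiocParallel::MulticoreParam(4))
test_result <- summaryExtractTest(smry)
```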

Next, I would like to look at what happened at the gene level. One way is to do something like this:

```r
# Column names follow the model coefficients; check colnames(test_result)
df <- test_result[test_result$Pval_Condition < 0.01 & abs(test_result$Est_Condition) > 0.1, ]
result <- testEnrichment(df$Probe_ID,
  KYCG_buildGeneDBs(df$Probe_ID, max_distance = 100000, platform = "EPIC"), platform = "EPIC")
```

But then I get something like this:

```
  estimate      p.value log10.p.value     test   nQ  nD overlap  cf_Jaccard cf_overlap
  3.284153 7.438132e-16     -15.12854 Log2(OR) 5618 403      24 0.004002001 0.05955335
   cf_NPMI cf_SorensenDice          FDR                   group            dbname
 0.2113224     0.007972098 7.138797e-12 KYCG.EPIC.gene.00000000 ENSG00000278341.1
 gene_name
AC138028.6
```

My first question is: what are "estimate" and "overlap"? Is the former something like an effect size? Is the latter a number of probes? I can't find this information anywhere.

My second question: I would like to make a heatmap of some of these genes and color them by something akin to their difference in methylation (as one would do in a differential expression analysis, for example). Would it be valid to use the "estimate" above for this?


score 1 · Answer 1 · 2024-09-12

It's right there in the output: the `test` column says Log2(OR). The function runs a Fisher's exact test, which yields an odds ratio, and `estimate` is the log2 of that odds ratio. I can see how the other columns are confusing. They are, in order:

- nQ = number of significant probes
- nD = number of probes in the term
- overlap = number of significant probes that are also in the term

Note that nQ is the same in every row of the table you extracted that row from, because the number of significant probes is constant for a given comparison.
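The sketch below shows how these columns fit together by rebuilding the 2x2 table that a Fisher's exact enrichment test works on from nQ, nD, overlap and the size of the probe universe. It rests on several assumptions. The output does not report the universe size, so `nU` here is a hypothetical value. The helper names are made up for illustration. Treating `estimate` as the log2 of the sample (cross-product) odds ratio is an assumption about the implementation. The overlap coefficients need only nQ, nD and overlap, and with the numbers from the question they reproduce the cf_Jaccard, cf_overlap and cf_SorensenDice values shown there.

```r
# Rebuild the 2x2 table behind a Fisher's exact enrichment test.
# Rows: probe is in the query (significant) or not;
# columns: probe is in the term (database) or not.
enrichment_table <- function(nQ, nD, overlap, nU) {
  matrix(c(overlap,      nQ - overlap,
           nD - overlap, nU - nQ - nD + overlap),
         nrow = 2, byrow = TRUE,
         dimnames = list(query = c("in", "out"), term = c("in", "out")))
}

# Log2 of the sample (cross-product) odds ratio; ASSUMED to correspond to 'estimate'
log2_odds_ratio <- function(tab) {
  log2((tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1]))
}

# Overlap coefficients; these depend only on nQ, nD and overlap
overlap_metrics <- function(nQ, nD, overlap) {
  c(jaccard       = overlap / (nQ + nD - overlap),
    overlap_coef  = overlap / min(nQ, nD),
    sorensen_dice = 2 * overlap / (nQ + nD))
}

# Values from the row shown in the question
nQ <- 5618; nD <- 403; overlap <- 24
nU <- 850000  # HYPOTHETICAL universe size; use the number of probes actually tested

tab <- enrichment_table(nQ, nD, overlap, nU)
tab
log2_odds_ratio(tab)
fisher.test(tab, alternative = "greater")$p.value
overlap_metrics(nQ, nD, overlap)
```

`fisher.test()` reports a conditional maximum-likelihood odds ratio in `$estimate`, which can differ slightly from the cross-product ratio computed here.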

There is a supplemental vignette that shows how to plot things.
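On the heatmap question: a log2 odds ratio measures how over-represented significant probes are among the probes assigned to a gene. It does not capture the direction or size of the methylation change, so it is not a direct analogue of a logFC. A more direct quantity to color by is the difference in mean beta value between groups. Below is a minimal base-R sketch under these assumptions: `betas` is a probes x samples matrix, `group` is a two-level factor, and `probe2gene` is a hypothetical named vector mapping probe IDs to genes. The helper names and the toy numbers are made up for illustration, and averaging probes per gene is only one of several reasonable choices.

```r
# Per-probe difference in mean beta value between two groups
# (second factor level minus first).
mean_beta_diff <- function(betas, group) {
  group <- factor(group)
  stopifnot(nlevels(group) == 2, ncol(betas) == length(group))
  lv <- levels(group)
  rowMeans(betas[, group == lv[2], drop = FALSE], na.rm = TRUE) -
    rowMeans(betas[, group == lv[1], drop = FALSE], na.rm = TRUE)
}

# Collapse probe-level differences to genes by averaging.
# probe2gene: HYPOTHETICAL named character vector, Probe_ID -> gene
gene_level_diff <- function(diff, probe2gene) {
  genes <- probe2gene[names(diff)]
  keep <- !is.na(genes)
  vapply(split(diff[keep], genes[keep]), mean, numeric(1))
}

# Toy, made-up data only to show the shapes involved
betas <- matrix(c(0.10, 0.20, 0.60, 0.70,
                  0.80, 0.90, 0.30, 0.20,
                  0.50, 0.50, 0.50, 0.50),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("cg1", "cg2", "cg3"), paste0("s", 1:4)))
group <- factor(c("no", "no", "yes", "yes"), levels = c("no", "yes"))
probe2gene <- c(cg1 = "GENE_A", cg2 = "GENE_A", cg3 = "GENE_B")

d <- mean_beta_diff(betas, group)
d
gene_level_diff(d, probe2gene)
```

Averaging can cancel probes that move in opposite directions. GENE_A in the toy data shows this: +0.5 and -0.6 average to -0.05. You may therefore prefer to restrict to probes in one region, such as promoters, before averaging. With a single two-level factor, a per-probe linear model coefficient equals this group-mean difference. So if DML fits an ordinary linear model on beta values, its Est_ column for that factor should carry the same information.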

Also note that filtering your results on both a p-value and an effect-size cutoff (here abs(Est) > 0.1, which plays the role of a logFC cutoff) is suboptimal. The p-value is based on the null hypothesis that the difference is zero, and adding an extra criterion such as logFC > X invalidates the meaning of the p-value for the probes you keep. There are ways to build the threshold into the test statistic itself (for example, the treat function in limma), and you might be able to use that with sesame, but you would have to explore that possibility yourself.
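To make the threshold idea concrete, here is a minimal base-R sketch of the TREAT-style p-value that limma's treat() is built on. It applies to a single coefficient with estimate `est`, standard error `se` and `df` residual degrees of freedom, and it tests H0: |beta| <= tau instead of H0: beta = 0. The helper name `treat_pvalue` is hypothetical. Unlike limma, it uses the plain (unmoderated) standard error, so it illustrates the idea rather than replacing limma.

```r
# TREAT-style p-value for one coefficient: tests H0: |beta| <= tau
# against H1: |beta| > tau. est = estimate, se = standard error,
# df = residual degrees of freedom, tau = threshold (same units as est).
# Unmoderated sketch: limma::treat() uses empirical Bayes moderated
# variances, which this hypothetical helper does not.
treat_pvalue <- function(est, se, df, tau) {
  t_right <- (abs(est) - tau) / se
  t_left  <- (abs(est) + tau) / se
  pt(t_right, df, lower.tail = FALSE) + pt(t_left, df, lower.tail = FALSE)
}

# With tau = 0 this is the ordinary two-sided t-test p-value
treat_pvalue(est = 0.2, se = 0.05, df = 10, tau = 0)
2 * pt(0.2 / 0.05, df = 10, lower.tail = FALSE)

# Demanding a change larger than 0.1 (in beta units) raises the p-value
treat_pvalue(est = 0.2, se = 0.05, df = 10, tau = 0.1)
```

Because the threshold is part of the null hypothesis, these p-values can be adjusted (for example with `p.adjust(method = "BH")`) and filtered on their own, without a separate effect-size cutoff.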