Question

Bias correction for enrichment analysis of 450k methylation data => feed bias adjusted genes into IPA

Ed Schwalbe ▴ 10

@ed-schwalbe-6024

Last seen 9.2 years ago

United Kingdom

It has been reported that one of the problems with looking for enrichment of functional themes in 450k data is the bias introduced by the differing number of probes per gene (see Geeleher et al., http://www.ncbi.nlm.nih.gov/pubmed/23732277).

The paper linked above describes correcting for this bias by adopting the probability weighting function (pwf) from the goseq package, which is more commonly used to correct for gene-length bias in RNA-seq data.

It is straightforward to combine this probability weighting function with downstream gene ontology / KEGG enrichment analysis, but it is not obvious to me how to use it to adjust the p-values from a 450k comparison of different groups and then pass the adjusted results to an external tool such as Ingenuity Pathway Analysis.

Is there a code snippet that I could use or could help me understand how to use the pwf to adjust 450k p values in this way?

Thanks,

Ed

goseq 450k enrichment • 2.3k views

updated 9.2 years ago by belinda.phipson ▴ 40 • written 9.2 years ago by Ed Schwalbe ▴ 10

score 1 · Answer 1 · 2015-02-23

Hi Ed,

Do you mean that you want to use the output from the pwf function to adjust the p-values obtained from a differential methylation analysis? That would be circular: you first need to perform the differential methylation analysis and impose some cut-off (usually based on the p-values) to specify which probes are significant. That list is then fed into the pwf function, which estimates the probability of significant differential methylation given the number of CpGs per gene. To me, it wouldn't make sense to apply that estimate back to the original p-values, since those p-values were used to estimate the prior probabilities in the first place.
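To make that order of operations concrete, here is a minimal Python sketch. It is not goseq's pwf; it is a crude binned stand-in that I am using only for illustration, and the function name, the `cutoff` and `n_bins` parameters, and the dictionary inputs are all assumptions. It shows that the probe-level p-values are consumed (via the cut-off) before any probability of significance per probe count can be estimated.

```python
from collections import defaultdict


def binned_significance_rate(probe_pvalues, probe_to_gene, cutoff=0.05, n_bins=3):
    """Proportion of genes flagged significant, by bins of probes-per-gene.

    Hypothetical helper: a binned approximation, NOT the goseq pwf.
    probe_pvalues: dict mapping probe ID -> p-value
    probe_to_gene: dict mapping probe ID -> gene symbol
    Returns a list of (min_probes, max_probes, proportion_significant).
    """
    probes_per_gene = defaultdict(int)
    gene_is_sig = defaultdict(bool)

    # Step 1: apply the cut-off to probe-level p-values and flag genes
    # that contain at least one significant probe.
    for probe, p in probe_pvalues.items():
        gene = probe_to_gene[probe]
        probes_per_gene[gene] += 1
        if p < cutoff:
            gene_is_sig[gene] = True

    # Step 2: order genes by number of probes and split into bins.
    genes = sorted(probes_per_gene, key=lambda g: (probes_per_gene[g], g))
    bin_size = max(1, -(-len(genes) // n_bins))  # ceiling division

    # Step 3: within each bin, estimate P(gene significant | probe count).
    rates = []
    for start in range(0, len(genes), bin_size):
        chunk = genes[start:start + bin_size]
        counts = [probes_per_gene[g] for g in chunk]
        rate = sum(gene_is_sig[g] for g in chunk) / len(chunk)
        rates.append((min(counts), max(counts), rate))
    return rates
```

If the proportion rises with the number of probes per gene, that is the bias the enrichment test needs to account for. The estimate depends on the cut-off applied in step 1, which is why feeding it back into the same p-values would be circular.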

I don't use Ingenuity Pathway Analysis, so I am not entirely sure what it takes as input. I have written a function called gometh in the missMethyl package (in the development version of the package) that simply takes a list of significant CpGs, estimates the prior probabilities using the pwf function, and outputs a data frame with the GO categories and associated output (p-values, FDRs etc.), taking the bias into account. However, based on your question, I don't think this is what you want to do.

My only other thought is that you could be more selective in how genes are chosen based on differential methylation of CpG sites. You might want to consider combining CpG site-level p-values at the gene level using Simes' method (for example), which is thought to be robust to positively correlated statistics. Imposing a multiple testing adjustment within genes would force the p-values for a gene with lots of probes to be more heavily adjusted than those for a gene with only a handful of probes. However, I have not tested this out, and I am not aware of any R functions that do this specifically for the 450K array.
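As a rough illustration of the Simes idea, here is a minimal, untested-on-real-data Python sketch. The function names and the dictionary inputs are assumptions made for the example, and it is not part of missMethyl or goseq. For a gene with n probes and sorted p-values p(1) <= ... <= p(n), the Simes combined p-value is the minimum over i of n * p(i) / i.

```python
from collections import defaultdict


def simes(pvalues):
    """Combine p-values with Simes' method: min over i of n * p_(i) / i."""
    ps = sorted(pvalues)
    n = len(ps)
    return min(n * p / rank for rank, p in enumerate(ps, start=1))


def gene_level_simes(probe_pvalues, probe_to_gene):
    """Hypothetical helper: one Simes p-value per gene.

    probe_pvalues: dict mapping probe ID -> p-value
    probe_to_gene: dict mapping probe ID -> gene symbol
    """
    by_gene = defaultdict(list)
    for probe, p in probe_pvalues.items():
        by_gene[probe_to_gene[probe]].append(p)
    return {gene: simes(ps) for gene, ps in by_gene.items()}
```

With this, a single probe at p = 0.01 gives a gene-level p-value of 0.01, whereas a gene with three probes whose smallest p-value is 0.01 is pushed up to as much as 0.03, so genes with many probes need stronger evidence to be selected. The resulting gene-level p-values could then be thresholded to build a gene list for a downstream tool.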

Cheers,

Belinda