I am using the Illumina Infinium HumanMethylation450 assay and want to do gene ontology testing using the gsameth function in the missMethyl package.
For the generated plots that show the bias resulting from the differing numbers of CpG probes per gene, what is the significance of the fitted curve? Why isn't a straight best-fit line used?
I'm not 100% sure I understand your question. Setting plot.bias=TRUE produces a plot that shows the proportion of significantly differentially methylated (DM) genes in bins of ~200 genes. The bins are allocated based on the number of CpGs annotated to each gene, so each point in the plot represents the proportion of DM genes out of the ~200 genes assigned to that bin. The blue line is a lowess fit through the points, which is a robust local smoother that can take any shape (i.e. it is not constrained to be a straight line, or a polynomial of order 2, etc.).
It is meant to help you eyeball the relationship between the number of CpGs associated with each gene and the proportion of genes called "significant". The expectation is that the more CpGs are associated with a gene, the more likely you are to call that gene "significant", and hence you would expect the blue line to increase from left to right. If you find that the line looks flat, then it is not as important to account for the bias in the data, although it won't hurt to use prior.prob=TRUE, as the relationship is empirically determined from the data.
It is not meant to have a significance measure associated with it; it is more of an aid to understanding the bias in your data.
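If it helps to see the binning spelled out, here is a rough sketch in Python of how a plot like this could be built. It is not the missMethyl code: the data are simulated, the function names (`binned_proportions`, `local_linear_smooth`) are made up for illustration, and the smoother is a plain tricube-weighted local linear fit, i.e. lowess without its robustness iterations.

```python
import random

def binned_proportions(cpg_counts, is_dm, bin_size=200):
    """Sort genes by CpG count, cut into bins of `bin_size` genes, and
    return one (mean CpG count, proportion of DM genes) point per bin."""
    order = sorted(range(len(cpg_counts)), key=lambda i: cpg_counts[i])
    points = []
    for start in range(0, len(order), bin_size):
        idx = order[start:start + bin_size]
        mean_count = sum(cpg_counts[i] for i in idx) / len(idx)
        prop_dm = sum(is_dm[i] for i in idx) / len(idx)
        points.append((mean_count, prop_dm))
    return points

def local_linear_smooth(xs, ys, frac=0.5):
    """Tricube-weighted local linear fit evaluated at each x.
    A simplified stand-in for lowess (no robustness iterations)."""
    n = len(xs)
    k = max(2, int(round(frac * n)))  # number of neighbours in each window
    fitted = []
    for x0 in xs:
        dists = sorted(abs(x - x0) for x in xs)
        h = dists[k - 1] or 1e-12  # window half-width; guard against ties
        w = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted

# Simulated example (assumption): 4000 genes where the chance of being
# called DM rises with the number of CpGs annotated to the gene.
random.seed(1)
cpg_counts = [1 + int(random.expovariate(1 / 15)) for _ in range(4000)]
is_dm = [random.random() < min(0.9, 0.05 + 0.01 * c) for c in cpg_counts]

points = binned_proportions(cpg_counts, is_dm, bin_size=200)
xs = [p[0] for p in points]
ys = [p[1] for p in points]
smooth = local_linear_smooth(xs, ys)

for x, y, s in zip(xs, ys, smooth):
    print(f"mean CpGs = {x:6.1f}  prop DM = {y:.3f}  smoothed = {s:.3f}")
```

Because the simulation builds the bias in on purpose, the smoothed values should rise from the low-CpG bins to the high-CpG bins, which is the left-to-right increase described above. With no bias, the points would scatter around a flat line.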