I want to make mCG calls at single CpG sites and have been evaluating methylPipe for this purpose. I have a question about how BSprepare works.
When BSprepare loads the #C/#T counts at a locus and the coverage at that nucleotide is <50, it appears to take the p-value for the site being methylated from a lookup table. That table is populated from binomial tests for #C+#T values from 1 to 50, using a supplied error probability.
The table has no entries for #C=0, nor does BSprepare run a separate binomial test for #C=0 when #T<=50; it returns NA instead.
Is it an intentional design feature that BSprepare returns NA as the p-value for cases where #C=0 and 0<=#T<50? If so, is there a sound biological or statistical basis for this?
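To make the question concrete, here is a minimal sketch of the behaviour as I understand it. It is a Python illustration, not methylPipe's actual R code: the function names (`binom_upper_tail`, `build_lookup`, `site_pvalue`), the one-sided upper-tail test, the error probability of 0.01 and the exact coverage boundary are all assumptions on my part.

```python
from math import comb


def binom_upper_tail(c, n, p_err):
    """P(X >= c) for X ~ Binomial(n, p_err): a one-sided binomial test."""
    return sum(
        comb(n, k) * p_err**k * (1 - p_err) ** (n - k)
        for k in range(c, n + 1)
    )


def build_lookup(max_cov=50, p_err=0.01):
    """Precompute p-values for every (#C, coverage) pair with #C >= 1.

    Note that no (0, n) key is ever created, which mirrors the
    missing #C=0 entries described above.
    """
    return {
        (c, n): binom_upper_tail(c, n, p_err)
        for n in range(1, max_cov + 1)
        for c in range(1, n + 1)
    }


def site_pvalue(c, t, table, p_err=0.01, max_cov=50):
    """Hypothetical per-site lookup: None stands in for R's NA."""
    n = c + t
    if n == 0:
        return None  # no reads at this site
    if n <= max_cov:
        # Low coverage: table lookup only, so #C=0 falls through to None.
        return table.get((c, n))
    # High coverage: run the test directly (assumed behaviour).
    return binom_upper_tail(c, n, p_err)


table = build_lookup()
print(site_pvalue(3, 7, table))   # looked up from the table
print(site_pvalue(0, 10, table))  # None, i.e. NA
```

Under that one-sided assumption, a direct test for #C=0 would give a p-value of exactly 1, since P(X >= 0) = 1, so such a site could never be called methylated at any coverage.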
Many thanks for any help
Jay Moore
Hi Mattia,
That makes great sense, thank you. It sounds like a useful performance optimisation.
Best wishes
Jay