Question: methylPipe: how BSprepare handles zeros
8 months ago
jonathan.moore wrote:

I want to make mCG calls at single CpG sites, and have been evaluating methylPipe for this purpose. I have a query about how BSprepare works.

When BSprepare loads the #C/#T values at a locus, if coverage<50 at a particular nucleotide, it seems to look up the p-value for the site being methylated in a lookup table. That table has been populated from binomial tests for #C+#T values from 1 to 50, using a supplied error probability.

The table has no entries for #C=0, nor does BSprepare run a separate binomial test for #C=0 if #T<=50; it returns NA instead.
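
To make the behaviour described above concrete, here is a minimal Python sketch of such a lookup table. It is not methylPipe's actual code (which is R); the function names, the table layout, the one-sided test and the error rate of 0.01 are all assumptions made for illustration.

```python
from math import comb

def binom_upper_tail(c, n, err):
    """P(X >= c) for X ~ Binomial(n, err): the chance of seeing at least
    c unconverted (#C) reads out of n from error alone."""
    return sum(comb(n, k) * err**k * (1 - err)**(n - k) for k in range(c, n + 1))

def build_table(max_cov=50, err=0.01):
    # Hypothetical layout: one entry per (#C, coverage) pair, with #C starting at 1,
    # so no (0, coverage) key is ever created.
    return {(c, n): binom_upper_tail(c, n, err)
            for n in range(1, max_cov + 1)
            for c in range(1, n + 1)}

def lookup_pvalue(table, c, t):
    # None stands in for NA: returned for any pair missing from the table,
    # which includes every #C = 0 case.
    return table.get((c, c + t))

table = build_table()
print(lookup_pvalue(table, 3, 7))   # 3 C reads out of 10: a small p-value
print(lookup_pvalue(table, 0, 10))  # 0 C reads out of 10: None
```

In this sketch the missing value for #C=0 is simply a consequence of the table being built for #C>=1 only.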

Is it an intentional design feature that BSprepare returns NA as the p-value for cases where #C=0 and 0<=#T<50?  If so, is there a sound biological or statistical basis for this?

Many thanks for any help

Jay Moore

8 months ago
mattia pelizzola wrote:

Hi Jay,

We considered that in the absence of #C there is no evidence supporting an mC call at that position, so we reasoned that testing for it would be pointless. You can think of this as a pre-processing filter, similar to skipping the differential expression test for genes that have either zero or very low expression.
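
A short Python sketch of why such a test would carry no information, assuming a one-sided binomial test for an excess of #C over an error rate (the function name and the error rate of 0.01 are illustrative assumptions, not methylPipe's code): with #C=0 the p-value is P(X >= 0), which is 1 whatever the coverage.

```python
from math import comb

def binom_upper_tail(c, n, err):
    # P(X >= c) for X ~ Binomial(n, err)
    return sum(comb(n, k) * err**k * (1 - err)**(n - k) for k in range(c, n + 1))

# With zero C reads, the upper tail covers every possible outcome,
# so the p-value equals 1 (up to floating-point rounding) for any #T.
pvalues = {t: binom_upper_tail(0, t, 0.01) for t in (0, 1, 10, 49)}
for t, p in pvalues.items():
    print(f"#C=0, #T={t}: p = {p:.6f}")
```

Under that assumption, such a site could never be called methylated, so returning NA and skipping the test loses nothing.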

jonathan.moore wrote:

Hi Mattia,

That makes perfect sense, thank you. It sounds like a useful performance optimisation.

Best wishes


jonathan.moore
