Question

statistical tests to show the specificity of a phenomenon (eg increase in H3K27me3 mark)

Bogdan ▴ 670

@bogdan-2367

Palo Alto, CA, USA

Dear all,

Although this may not be a question specifically for BioC, I thought I could still post it (if you do not mind), in case any packages for ChIP-seq analysis or statistical analysis are available to address it.

The question concerns statistical tests to show the specificity of a phenomenon. Let's consider an example: someone did a ChIP-seq for H3K27me3 and wants to show that the histone mark increases on the genes involved in a particular biological process (e.g. 300 autophagy-related genes, out of a total of 1000 genes with increased H3K27me3) after cell treatment.

What type of analysis would you recommend in order to show that the phenomenon (i.e. the increase in H3K27me3) is specific to a set of genes (i.e. the autophagy genes):

-- taking random sets of non-autophagy genes (in practice, drawn from the rest of the genes in the genome) and using parametric and non-parametric tests to compare SET 1 (autophagy genes) with SET 2 (non-autophagy genes)

or

-- using hypergeometric / Fisher tests on a 2x2 matrix (autophagy / non-autophagy genes vs. increase / no increase in H3K27me3)?

thanks a lot, and happy weekend ;) !

bogdan

chip-seq • 892 views

updated 7.2 years ago by Wolfgang Huber ★ 13k • written 7.2 years ago by Bogdan ▴ 670

score 3 · Accepted Answer · 2017-02-26

Bogdan

The main point is: don't use a test, or the language and concepts of testing, here. Rejecting a null hypothesis of non-specificity is close to uninformative (boring, beside the point, ridiculous, ...) with regard to the strength of specificity, since such a hypothesis test confounds effect size and sample size: with enough genes, even a weak enrichment gives a very small p-value.
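To illustrate the confounding of effect size and sample size, here is a minimal sketch in Python (standard library only). The counts are made up for illustration and are not from any real experiment; the function names are hypothetical. Rows are autophagy / non-autophagy genes, columns are increased / not increased H3K27me3. Both tables have the same odds ratio, but the one-sided Fisher (hypergeometric) p-value changes dramatically once every cell is multiplied by 50.

```python
from math import comb

def odds_ratio(a, b, c, d):
    """Odds ratio of the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact p-value: P(X >= a) under the hypergeometric null
    with all margins of the table [[a, b], [c, d]] held fixed."""
    n_total = a + b + c + d
    row1 = a + b   # e.g. number of autophagy genes
    col1 = a + c   # e.g. number of genes with increased mark
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n_total - row1, col1 - k)
               for k in range(a, upper + 1))
    return tail / comb(n_total, col1)

# Hypothetical counts: same proportions, 50-fold difference in sample size.
small = (6, 4, 4, 6)
large = tuple(50 * x for x in small)

or_small, or_large = odds_ratio(*small), odds_ratio(*large)
p_small, p_large = fisher_one_sided(*small), fisher_one_sided(*large)

print(f"small table: odds ratio = {or_small:.2f}, p = {p_small:.3g}")
print(f"large table: odds ratio = {or_large:.2f}, p = {p_large:.3g}")
```

The effect size (odds ratio) is identical in both cases; only the p-value moves, which is why it says little about how strong the specificity is.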

Instead, choose a reasonable quantitative summary statistic (e.g. odds ratio, or other measures of enrichment) and, in addition to its point estimate, get information about the associated distribution or confidence region by resampling, e.g. the bootstrap. The choice of which summary statistic to use is less a statistical question than a biological one, and presumably you can consider several.
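One way this could look in practice is sketched below in Python (standard library only): an odds ratio as the summary statistic, with a percentile bootstrap interval obtained by resampling genes with replacement. Everything specific here is an assumption for illustration: the counts are invented, the function names are hypothetical, and the 0.5 correction for empty cells (Haldane-Anscombe) and the percentile interval are just one common choice among several.

```python
import random
from collections import Counter

def odds_ratio(a, b, c, d):
    """Odds ratio of the 2x2 table [[a, b], [c, d]].
    If any cell is empty, add 0.5 to every cell (Haldane-Anscombe correction)."""
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    return (a * d) / (b * c)

def bootstrap_ci(counts, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the odds ratio.
    Resampling genes with replacement is done by drawing each gene's cell
    (0..3) with probability proportional to the observed cell counts."""
    rng = random.Random(seed)
    cells = range(4)
    n_genes = sum(counts)
    stats = []
    for _ in range(n_boot):
        draw = Counter(rng.choices(cells, weights=counts, k=n_genes))
        stats.append(odds_ratio(*(draw[i] for i in cells)))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Hypothetical counts (not real data):
# autophagy & increased, autophagy & not increased,
# other & increased,     other & not increased
counts = (60, 140, 240, 1560)

point = odds_ratio(*counts)
lo, hi = bootstrap_ci(counts)
print(f"odds ratio = {point:.2f}, 95% bootstrap interval = [{lo:.2f}, {hi:.2f}]")
```

The same resampling loop works for any other enrichment measure: swap `odds_ratio` for the statistic that makes most biological sense and compare the resulting intervals.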

Wolfgang