Hello,

I'm trying to perform an analysis similar to those in the ENCODE and FANTOM publications, where the enrichment of GWAS SNPs in regulatory regions (e.g., DHSs, CAGE-defined enhancers) is calculated [1,2].

So, I would like to test whether a set of GWAS SNPs associated with a disease of interest is enriched in my set of regulatory regions compared to a background distribution of SNPs (i.e., the 1000 Genomes data).

**How am I supposed to set up my contingency table for Fisher's exact test?**

My guess would be something like:

Number of GWAS SNPs in regulatory regions | Number of 1000 Genomes SNPs in regulatory regions |
---|---|
Total number of GWAS SNPs | Total number of 1000 Genomes SNPs |

And then simply use the `fisher.test()` function on the matrix.
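To make the question concrete, here is a minimal sketch of what I have in mind. All counts are made up for illustration, the variable names are hypothetical, and the layout simply mirrors my guess above, so it may well be the wrong table. The one-sided `alternative = "greater"` is also an assumption on my part, since I only care about enrichment.

```r
# Hypothetical counts, for illustration only (not real data).
gwas_in_reg <- 120      # GWAS SNPs overlapping my regulatory regions
gwas_total  <- 1000     # all GWAS SNPs for the disease of interest
kg_in_reg   <- 50000    # 1000 Genomes SNPs overlapping my regulatory regions
kg_total    <- 1000000  # all 1000 Genomes SNPs considered as background

# 2x2 matrix laid out exactly as in my guess above:
# columns = SNP set, rows = "in regulatory regions" vs "total".
tab <- matrix(
  c(gwas_in_reg, kg_in_reg,
    gwas_total,  kg_total),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    c("in_regulatory_regions", "total"),
    c("GWAS", "1000G")
  )
)

# One-sided test (assumption): is the odds ratio greater than 1,
# i.e. are GWAS SNPs over-represented in the regulatory regions?
res <- fisher.test(tab, alternative = "greater")

print(tab)
print(res$p.value)   # p-value of the test
print(res$estimate)  # conditional maximum-likelihood estimate of the odds ratio
```

Is this the right way to build `tab`, or should the second row hold something other than the totals?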

There's also the fact that the GWAS SNPs are a subset of the 1000 Genomes SNPs: should I subtract them from the superset before performing the test?

Thanks!

[1] Maurano et al., 2012: https://www.ncbi.nlm.nih.gov/pubmed/22955828

[2] Andersson et al., 2014: https://www.ncbi.nlm.nih.gov/pubmed/24670763