I'm trying to perform an analysis similar to those in the ENCODE and FANTOM publications, where the enrichment of GWAS SNPs in regulatory regions (e.g., DHSs, CAGE-defined enhancers) is calculated [1,2].
Specifically, I would like to test whether a set of GWAS SNPs associated with a disease of interest is enriched in my set of regulatory regions compared to a background distribution of SNPs (i.e., the 1000 Genomes data).
How am I supposed to set up my contingency table for Fisher's exact test?
My guess would be something like:
| Number of GWAS SNPs in regulatory regions | Number of 1000 Genomes SNPs in regulatory regions |
| Total number of GWAS SNPs | Total number of 1000 Genomes SNPs |
And then simply use the fisher.test() function on the matrix.
There's also the fact that the GWAS SNPs are a subset of the 1000 Genomes SNPs: should I subtract them from the superset before performing the test?
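To make the question concrete, here is a minimal R sketch of the layout guessed above, together with the "subtracted" variant. All counts and variable names are made up for illustration and are not real data. Since Fisher's exact test treats the four cells as disjoint counts, I'm not sure the "total" row is valid as written, which is part of what I'm asking.

```r
# Made-up counts, for illustration only (not real data).
gwas_in_reg <- 40      # GWAS SNPs overlapping regulatory regions
gwas_total  <- 200     # all GWAS SNPs for the disease
kg_in_reg   <- 5000    # 1000 Genomes SNPs overlapping regulatory regions
kg_total    <- 100000  # all 1000 Genomes SNPs considered

# Layout exactly as guessed above: row 1 = in regions, row 2 = totals.
# matrix() fills column by column, so each pair below is one column.
tab_guess <- matrix(
  c(gwas_in_reg, gwas_total, kg_in_reg, kg_total),
  nrow = 2,
  dimnames = list(c("in_regions", "total"), c("GWAS", "1000G"))
)

# Same layout after removing the GWAS SNPs from the 1000 Genomes counts,
# so that the two columns no longer share SNPs.
tab_subtracted <- tab_guess
tab_subtracted[, "1000G"] <- tab_guess[, "1000G"] - tab_guess[, "GWAS"]

res_guess      <- fisher.test(tab_guess)
res_subtracted <- fisher.test(tab_subtracted)

print(res_guess$p.value)
print(res_subtracted$p.value)
```

Both calls run, but I don't know which table (if either) is the right way to frame the comparison.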
[1] Maurano et al., 2012: https://www.ncbi.nlm.nih.gov/pubmed/22955828
[2] Andersson et al., 2014: https://www.ncbi.nlm.nih.gov/pubmed/24670763