Hello,
I'm trying to perform an analysis similar to those in the ENCODE and FANTOM publications, where the enrichment of GWAS SNPs in regulatory regions (e.g., DHSs or CAGE-defined enhancers) is calculated [1,2].
Specifically, I would like to test whether a set of GWAS SNPs associated with a disease of interest is enriched in my set of regulatory regions compared to a background distribution of SNPs (i.e., the 1000 Genomes data).
How am I supposed to set up my contingency table for Fisher's exact test?
My guess would be something like:
Number of GWAS SNPs in regulatory regions | Number of 1000 Genomes SNPs in regulatory regions |
---|---|
Total number of GWAS SNPs | Total number of 1000 Genomes SNPs |
And then simply use the `fisher.test()` function on the matrix.
There's also the fact that the GWAS SNPs are a subset of the 1000 Genomes SNPs: should I subtract them from the superset before performing the test?
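To make the question concrete, here is a minimal R sketch of one possible setup. It assumes the GWAS SNPs are removed from the background, and that the second row counts SNPs outside regulatory regions rather than totals, so that the four cells are disjoint. All counts and variable names are made-up placeholders, and I am not sure this matches what was done in [1,2].

```r
# Hypothetical counts, for illustration only (not real data).
gwas_in  <- 120      # GWAS SNPs overlapping regulatory regions
gwas_tot <- 1000     # all GWAS SNPs for the disease
bg_in    <- 50000    # 1000 Genomes SNPs overlapping regulatory regions (GWAS SNPs included)
bg_tot   <- 1000000  # all 1000 Genomes SNPs (GWAS SNPs included)

# Assumption: make the two columns disjoint by removing the GWAS SNPs
# from the 1000 Genomes background.
bg_only_in  <- bg_in - gwas_in
bg_only_tot <- bg_tot - gwas_tot

# 2x2 table: rows = inside / outside regulatory regions,
# columns = GWAS SNPs / remaining background SNPs.
tab <- matrix(
  c(gwas_in,            bg_only_in,
    gwas_tot - gwas_in, bg_only_tot - bg_only_in),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    c("in_regulatory", "not_in_regulatory"),
    c("GWAS", "background")
  )
)

# One-sided test: is the odds ratio greater than 1 (enrichment)?
res <- fisher.test(tab, alternative = "greater")
res$estimate  # conditional maximum-likelihood estimate of the odds ratio
res$p.value
```

With this layout the table sums to the total number of 1000 Genomes SNPs, and an odds ratio above 1 would indicate that the GWAS SNPs fall in regulatory regions more often than the remaining background SNPs do.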
Thanks!
[1] Maurano et al., 2012: https://www.ncbi.nlm.nih.gov/pubmed/22955828
[2] Andersson et al., 2014: https://www.ncbi.nlm.nih.gov/pubmed/24670763