27 days ago by
CRUK, Cambridge, UK
We currently don't have reliable methods for mapping ChIP-seq peaks to the genes they regulate (except possibly when the peaks happen to fall inside an annotated promoter). Some of the peaks may be non-functional, and without more data we don't know which of the "nearby" genes an intergenic binding site regulates. Even with corresponding RNA-seq data, this is difficult to untangle.
Basically, we hypothesize that the set of genes "near" differentially bound sites is enriched for genes regulated by the binding factor. That is, these genes are more likely than a set of background genes to change their transcription levels. You can verify that by comparing the distributions of read counts for those genes between the two sample groups (you can even test whether they differ "significantly"). You should include the genes that are in both sets (the 537 overlapping) in both of these distributions; the sets are not meant to be definitive, only to show enrichment for differential regulation.
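As a rough illustration of testing whether read counts for a gene set differ between the two sample groups, here is a minimal sketch using a paired sign-flip permutation test on per-gene log2 ratios. This is just one possible choice of test, not necessarily the one you should use. The function name `sign_flip_test`, the pseudocount of 1, and the input format (one normalized mean count per gene per sample group, in the same gene order) are all assumptions made for the example.

```python
import math
import random


def sign_flip_test(counts_a, counts_b, n_perm=10000, seed=0):
    """Paired sign-flip permutation test on per-gene log2 ratios.

    counts_a, counts_b: hypothetical per-gene mean normalized read counts
    for sample groups A and B, listed in the same gene order.
    Returns (absolute mean log2 ratio, two-sided permutation p-value).
    """
    # Per-gene log2 ratio; the pseudocount of 1 (an assumption) avoids log(0).
    diffs = [math.log2((b + 1) / (a + 1)) for a, b in zip(counts_a, counts_b)]
    n = len(diffs)
    observed = abs(sum(diffs) / n)

    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        # Under the null, each gene's ratio is equally likely to have either sign.
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped / n) >= observed:
            hits += 1

    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Made-up counts for 12 genes near differentially bound sites.
group_a = [10, 25, 40, 55, 80, 120, 15, 33, 61, 90, 200, 45]
group_b = [22, 51, 85, 100, 170, 230, 31, 70, 118, 185, 410, 96]
effect, p_value = sign_flip_test(group_a, group_b)
print(f"mean |log2 ratio| = {effect:.2f}, p = {p_value:.4f}")
```

Running the same test on a background gene set gives you something to compare against: the hypothesis is about enrichment, so what matters is that the "near" set shifts more than the background does.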
The gene sets can be further refined by looking at differential expression and the direction of the fold change. So you can consider the set of genes that are proximal to differential binding sites and exhibit differential expression. You can refine those further by keeping only the genes that gain expression in the sample group with increased binding (the sign of the fold change is the same).
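The two refinement steps can be sketched as a simple filter. This is only an illustration: the function `refine_gene_set`, the dictionary layouts, the gene names, and the 0.05 adjusted p-value cutoff are hypothetical, and in practice the fold changes and p-values would come from your differential binding and differential expression analyses.

```python
def refine_gene_set(binding_fc, expression, padj_cutoff=0.05):
    """Refine genes near differentially bound sites using expression data.

    binding_fc: gene -> log2 fold change of the nearby binding site.
    expression: gene -> (expression log2 fold change, adjusted p-value).
    Returns (differentially expressed genes, sign-concordant subset).
    """
    # Step 1: keep genes near a differential site that are also
    # differentially expressed (genes without expression data are dropped).
    diff_expressed = {
        gene for gene in binding_fc
        if gene in expression and expression[gene][1] < padj_cutoff
    }
    # Step 2: keep only genes whose expression changes in the same
    # direction as binding (product of the two fold changes is positive).
    concordant = {
        gene for gene in diff_expressed
        if binding_fc[gene] * expression[gene][0] > 0
    }
    return diff_expressed, concordant


# Made-up values for four hypothetical genes.
binding_fc = {"GENE_A": 1.8, "GENE_B": -1.2, "GENE_C": 2.1, "GENE_D": 0.9}
expression = {
    "GENE_A": (1.1, 0.001),   # up in both binding and expression
    "GENE_B": (0.7, 0.01),    # binding down, expression up
    "GENE_C": (0.2, 0.6),     # not differentially expressed
}
diff_expressed, concordant = refine_gene_set(binding_fc, expression)
print(sorted(diff_expressed))
print(sorted(concordant))
```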