I have some ATAC-seq data from a published paper. It comes as BED files produced by MACS peak calling, so I only have region coordinates: no p-values and no enrichment scores. There are two biological conditions with 8 replicates each. I've been tasked with finding transcription factor binding sites (TFBS) for a small set of 5 genes and comparing these sites between the conditions.
My approach to finding TFBS would be to:
- find consensus peaks, e.g. regions where peaks overlap in at least 3 replicates in each condition,
- identify promoter regions of the genes (e.g. -2 kbp / +500 bp around the TSS),
- find the overlap between the promoter regions and the consensus peaks,
- extract the sequences of the selected peaks,
- get position frequency matrices from JASPAR,
- use `motifmatchr::matchMotifs` to match motifs.
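
To make the first three steps concrete, here is a minimal sketch in plain Python (no genomics libraries) of what I mean. It assumes peaks are half-open BED-style `(chrom, start, end)` tuples, that the consensus is computed separately for each condition, and that the TSS position and strand of each gene are known. The function names (`consensus_regions`, `promoter_window`, `overlapping`) are my own placeholders, not from any package.

```python
from collections import defaultdict
from itertools import groupby


def merge_intervals(peaks):
    """Merge overlapping or touching peaks so one replicate counts once per base."""
    merged = []
    for chrom, start, end in sorted(peaks):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def consensus_regions(replicates, min_support=3):
    """Return regions covered by a peak in at least `min_support` replicates.

    `replicates` is a list with one list of (chrom, start, end) per replicate.
    """
    events = defaultdict(list)
    for peaks in replicates:
        for chrom, start, end in merge_intervals(peaks):
            events[chrom].append((start, 1))   # coverage goes up at a peak start
            events[chrom].append((end, -1))    # and down at a peak end
    regions = []
    for chrom in sorted(events):
        depth = 0
        region_start = None
        # Apply all events at the same position at once, then check the depth.
        for pos, group in groupby(sorted(events[chrom]), key=lambda e: e[0]):
            depth += sum(delta for _, delta in group)
            if depth >= min_support and region_start is None:
                region_start = pos
            elif depth < min_support and region_start is not None:
                regions.append((chrom, region_start, pos))
                region_start = None
    return regions


def promoter_window(chrom, tss, strand, upstream=2000, downstream=500):
    """Half-open promoter window around a 0-based TSS, strand-aware."""
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    else:
        start, end = tss - downstream + 1, tss + upstream + 1
    return (chrom, max(0, start), end)


def overlapping(regions, window):
    """Regions that share at least one base with the window."""
    chrom, w_start, w_end = window
    return [r for r in regions if r[0] == chrom and r[1] < w_end and r[2] > w_start]
```

For each condition I would call `consensus_regions` on its 8 replicate peak lists, then pass the result and each gene's `promoter_window` to `overlapping`. In practice I would probably do the same thing with an interval library rather than hand-rolled code.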
However, I'm not sure how to do differential motif analysis, that is, how to find TFBS present in one condition but not the other on a sound statistical basis. I'd appreciate any suggestions on how to proceed.
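
One idea I have considered, and I am not sure it is appropriate: since the motif sequence itself is the same in both conditions, the only thing I can compare is whether the motif site is covered by a peak. For each motif site I could count in how many of the 8 replicates per condition a peak covers it, and compare the two counts with Fisher's exact test. Below is a minimal sketch of that test using only the Python standard library. The function name and the table layout are my own assumptions, and with many motifs the p-values would still need a multiple-testing correction.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Assumed layout: rows are the conditions, columns are the number of
    replicates with / without a peak covering the motif site.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x covered replicates in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


# Hypothetical counts: site covered in 7 of 8 replicates in condition A,
# and in 1 of 8 replicates in condition B.
p = fisher_exact_two_sided(7, 1, 1, 7)
print(f"p = {p:.4f}")
```

With only 8 replicates per condition and presence/absence data, I expect the power of such a test to be limited, which is part of why I am asking.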