I'm new to Bioconductor and Biostrings and would greatly appreciate any help with this task. I'd like to identify potential binding sites for a few transcription factors (TFs), using a position weight matrix (PWM) that represents each TF's binding motif. I want to run this analysis on genes in the human genome, but only consider the region from 1000 bp upstream to 500 bp downstream of each transcription start site (TSS).
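The window depends on strand. For a minus-strand transcript the TSS is its rightmost coordinate, and "upstream" points toward higher coordinates. Below is a small Python sketch of that arithmetic, assuming 1-based inclusive coordinates; the `Transcript` type and `promoter_window` function are hypothetical names for illustration. As far as I understand, `promoters()` (GenomicFeatures/GenomicRanges) follows the same rule when called as `promoters(txdb, upstream = 1000, downstream = 500)`, giving 1500 bp ranges that include the TSS.

```python
from typing import NamedTuple, Tuple


class Transcript(NamedTuple):
    """Hypothetical minimal transcript record (1-based, inclusive coordinates)."""
    chrom: str
    start: int   # leftmost genomic coordinate
    end: int     # rightmost genomic coordinate
    strand: str  # "+" or "-"


def promoter_window(tx: Transcript, upstream: int = 1000,
                    downstream: int = 500) -> Tuple[str, int, int, str]:
    """Return (chrom, start, end, strand) of the region around the TSS.

    The TSS itself counts toward `downstream`, so the window is
    upstream + downstream bases wide.
    """
    if tx.strand == "+":
        tss = tx.start                 # plus strand: TSS is the leftmost base
        start = tss - upstream
        end = tss + downstream - 1
    else:
        tss = tx.end                   # minus strand: TSS is the rightmost base
        start = tss - downstream + 1
        end = tss + upstream           # "upstream" lies at higher coordinates
    return (tx.chrom, start, end, tx.strand)


print(promoter_window(Transcript("chr1", 5000, 9000, "+")))  # ('chr1', 4000, 5499, '+')
print(promoter_window(Transcript("chr1", 5000, 9000, "-")))  # ('chr1', 8501, 10000, '-')
```

Windows near chromosome ends can extend past the chromosome; this sketch does not clip them. In R, `trim()` on the resulting GRanges can clip such ranges when chromosome lengths are known.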
I've been trying to use the matchPWM function, but I'm not sure how to subset the genome so that it only contains the sequences I'm interested in. I installed and loaded TxDb.Hsapiens.UCSC.hg19.knownGene because I think it contains the annotation I need, but I'm not sure how to proceed from here to narrow it down to just these regions.
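Note that TxDb.Hsapiens.UCSC.hg19.knownGene stores transcript coordinates, not DNA sequence. The sequence for each window would come from a genome package such as BSgenome.Hsapiens.UCSC.hg19 (for example via `getSeq()`), and `matchPWM()` is then run on each sequence. The Python sketch below illustrates what such a scan does conceptually. It slides the matrix along a sequence, sums the per-position weights, and keeps windows scoring at least a chosen fraction of the maximum possible score. That threshold is roughly analogous to passing a percentage such as `min.score = "80%"` to `matchPWM()`, which the Biostrings documentation describes as a percentage of the highest possible score. The sketch also scans the reverse-complemented matrix to catch minus-strand sites, because a single `matchPWM()` call searches only the strand it is given. The toy GATA matrix and all function names here are hypothetical.

```python
from typing import Dict, List, Tuple

# A PWM as one dict per motif position, mapping base -> log-odds weight.
PWM = List[Dict[str, float]]

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement_pwm(pwm: PWM) -> PWM:
    """Reverse the column order and complement the bases in each column."""
    return [{COMPLEMENT[b]: w for b, w in col.items()} for col in reversed(pwm)]


def max_score(pwm: PWM) -> float:
    """Highest attainable score: best base at every position."""
    return sum(max(col.values()) for col in pwm)


def score_window(pwm: PWM, window: str) -> float:
    # Simplification: bases absent from the matrix (e.g. N) contribute 0.
    return sum(col.get(base, 0.0) for col, base in zip(pwm, window))


def scan(pwm: PWM, seq: str, min_fraction: float = 0.8) -> List[Tuple[int, str, float]]:
    """Return (0-based start on the given sequence, strand, score) for each hit."""
    seq = seq.upper()
    k = len(pwm)
    hits = []
    for strand, p in (("+", pwm), ("-", reverse_complement_pwm(pwm))):
        threshold = min_fraction * max_score(p)
        for i in range(len(seq) - k + 1):
            s = score_window(p, seq[i:i + k])
            if s >= threshold:
                hits.append((i, strand, s))
    return hits


# Toy matrix favouring "GATA" (hypothetical weights, for illustration only).
GATA_PWM: PWM = [
    {"A": -1.0, "C": -1.0, "G": 2.0, "T": -1.0},
    {"A": 2.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": -1.0, "T": 2.0},
    {"A": 2.0, "C": -1.0, "G": -1.0, "T": -1.0},
]

print(scan(GATA_PWM, "CCGATACCTATCCC"))  # [(2, '+', 8.0), (8, '-', 8.0)]
```

The hit at position 8 is "TATC", the reverse complement of GATA, so it is only found by the minus-strand pass.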