I am looking to analyze my histone modification ChIP-seq data in promoter regions using Repitools, but I am running into some issues with how to properly define promoter regions for genes that have multiple isoforms with different transcription start sites (TSSs). I'm working with the human hg38 Ensembl transcript annotations. Suppose a particular gene has 3 isoforms, A, B, and C. Isoforms A and B have the same TSS and differ only in the splicing of exons farther downstream. Isoform C has a different TSS. The point of Repitools is to characterize ChIP-seq coverage at different offsets relative to the position of the gene's TSS, but this gene has more than one TSS. I can think of a few possible ways to resolve this issue, all of which seem to have some merit:
- Handle each isoform independently, generating 3 separate promoter coverage profiles for this gene, based on the offset from each isoform's TSS. (The profiles for isoforms A and B will be identical, of course.)
- Handle each unique TSS independently as above, generating 2 separate coverage profiles, one for isoforms A&B and the other for isoform C.
- Choose one TSS as the "representative" TSS for this gene and ignore the others, generating a single coverage profile for the gene. This might be the farthest upstream TSS, the TSS with the most nearby ChIP-seq reads, the TSS of the most highly expressed isoform, or one chosen by some other criterion.
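To make the second option concrete, here is a minimal Python sketch of how I imagine collapsing isoforms into unique TSSs before building any profiles. The transcript records, coordinates, and helper names (`tss_position`, `group_by_unique_tss`) are made up for illustration; none of this is part of Repitools, and it assumes 1-based inclusive coordinates as in an Ensembl GTF.

```python
from collections import defaultdict

# Hypothetical records: (gene_id, transcript_id, chrom, strand, start, end).
# Coordinates are invented and assumed 1-based inclusive (GTF-style).
transcripts = [
    ("GENE1", "A", "chr1", "+", 1000, 5000),
    ("GENE1", "B", "chr1", "+", 1000, 4500),
    ("GENE1", "C", "chr1", "+", 2200, 5000),
]


def tss_position(strand, start, end):
    """Return the TSS: the start on the + strand, the end on the - strand."""
    return start if strand == "+" else end


def group_by_unique_tss(records):
    """Map each (gene, chrom, strand, tss) to the isoforms that share it."""
    groups = defaultdict(list)
    for gene, tx, chrom, strand, start, end in records:
        key = (gene, chrom, strand, tss_position(strand, start, end))
        groups[key].append(tx)
    return dict(groups)


unique_tss = group_by_unique_tss(transcripts)
for (gene, chrom, strand, tss), txs in sorted(unique_tss.items()):
    print(gene, chrom, strand, tss, ",".join(txs))
```

With the example gene above, isoforms A and B end up under one TSS and isoform C under another, so each unique TSS would get one coverage profile.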
Can anyone provide some guidance as to whether one of these is the most correct or reasonable way to handle things, or whether each one is potentially valid depending on what I want to achieve? Is there a 4th alternative that I haven't thought of that works better than any of the 3 I've suggested here? Does the answer change if the distance between TSS positions is 10 bp or 1 kbp or 10 kbp?
In case it matters, I also have RNA-seq data for the same samples, but the coverage is only deep enough to measure gene expression, not individual isoform expression.