I am analyzing mutations in the proximity of transcription factor (TF) binding sites in an ICGC cancer cohort, and I would like to show that the mutation signature profile changes within TF binding regions.
When I analysed the TF proximity mutations with mutation signature analysis packages (such as SomaticSignatures), I got the "famous" signature plot of the 96 trinucleotide proportions. Within my TF binding regions, I see that T>A mutations are enriched. However, I also see that the binding region sequence itself is rich in A and T bases. I think I need to show that the enrichment is independent of the sequence context.
1) I have seen that mutation signature analysis is mostly done per tumor (genome-wide) rather than for specific regions. Could you tell me whether some kind of normalisation is applied when this analysis is restricted to specific regions?
2) What statistical test should I use to show that total cancer mutations and TF proximity mutations are different (for example Kullback-Leibler divergence)? So far I could not work out how to incorporate the sequence context.
3) Referring to the posts where KLD is suggested as one of the methods to show similarity between motifs: assuming I successfully find my motifs, which specific output of the KLD object am I interested in for further interpretation?
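To make questions 1) and 2) concrete, here is a minimal Python sketch of what I have in mind. It is not an established method. It divides each of the 96 channel counts by how often its trinucleotide context occurs in the region, rescales to proportions, and then compares the region profile with the background profile using KLD. All function names (`context_opportunities`, `normalised_profile`, `kl_divergence`), the `epsilon` smoothing, and the toy numbers are my own assumptions for illustration.

```python
import math
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def pyrimidine_context(trinuc):
    """Fold a trinucleotide so that its central base is C or T."""
    return trinuc if trinuc[1] in "CT" else revcomp(trinuc)


def all_channels():
    """The 96 (context, alt) channels: 6 substitution types x 16 flanking pairs."""
    return [(five + ref + three, alt)
            for ref, alts in (("C", "AGT"), ("T", "ACG"))
            for five in "ACGT" for three in "ACGT" for alt in alts]


def context_opportunities(sequence):
    """Count each pyrimidine-centred trinucleotide in a region sequence."""
    seq = sequence.upper()
    counts = Counter()
    for i in range(len(seq) - 2):
        tri = seq[i:i + 3]
        if set(tri) <= set("ACGT"):  # skip N and other ambiguous bases
            counts[pyrimidine_context(tri)] += 1
    return counts


def normalised_profile(mutation_counts, opportunities):
    """Mutations per available context, rescaled so the 96 channels sum to 1."""
    rates = {}
    for context, alt in all_channels():
        n = mutation_counts.get((context, alt), 0)
        available = opportunities.get(context, 0)
        rates[(context, alt)] = n / available if available else 0.0
    total = sum(rates.values())
    if total == 0:
        raise ValueError("no mutations in any available context")
    return {channel: rate / total for channel, rate in rates.items()}


def kl_divergence(p, q, epsilon=1e-6):
    """KL(p || q) in bits; epsilon smoothing (an assumption) avoids log(0)."""
    p_total = sum(p[k] + epsilon for k in p)
    q_total = sum(q[k] + epsilon for k in p)
    result = 0.0
    for k in p:
        pk = (p[k] + epsilon) / p_total
        qk = (q[k] + epsilon) / q_total
        result += pk * math.log2(pk / qk)
    return result


# Toy numbers, invented for illustration: the region has three times as many
# ATA contexts as the background, but the same mutation rate per context.
background_opps = {"ATA": 100, "ACA": 100}
region_opps = {"ATA": 300, "ACA": 100}
background_muts = {("ATA", "A"): 10, ("ACA", "T"): 10}
region_muts = {("ATA", "A"): 30, ("ACA", "T"): 10}

region_profile = normalised_profile(region_muts, region_opps)
background_profile = normalised_profile(background_muts, background_opps)
print(kl_divergence(region_profile, background_profile))
```

In this toy case the raw T>A proportion is higher in the region only because the region contains more ATA contexts, which is exactly the confounding I am worried about. Note that KLD only gives a distance, not a p-value, so I would still need a proper test on top of it.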
Edit: I now understand how to interpret KLD after reading a couple of wiki articles, so please don't bother explaining that part.
(Dear Julian, if you are there, sorry for asking the same question again, but I really need this answer.)
Thank you very much for your help,