Hi,
Background:
I am analysing mutations in the proximity of transcription factor (TF) binding sites in an ICGC cancer cohort, and I would like to show that TF binding changes the mutational signature profile within the binding regions.
When I analysed the TF-proximal mutations with a mutational signature analysis package (such as SomaticSignatures), it gave the well-known signature plot of the proportions of the 96 trinucleotide mutation types. Within my TF binding regions, I see that T>A mutations are enriched. However, I also see that the binding region sequences themselves are rich in A and T bases. I think I need to show that the enrichment is independent of the sequence context.
Questions:
1) I have seen that mutational signature analysis is mostly done on whole tumour genomes rather than on specific regions. Could you tell me whether any kind of normalisation is needed when the analysis is applied to specific regions?
2) What statistical test should I use to show that the total cancer mutations and the TF-proximal mutations are different (e.g. Kullback-Leibler divergence)? So far I could not work out how to incorporate the sequence context.
3) Referring to the posts where KLD is suggested as one of the methods to show similarity between motifs: assuming I successfully find my motifs, which specific output of the KLD object should I look at for further interpretation?
edit: I understood the KLD interpretation after reading a couple of wiki pages. Please don't bother explaining that part.
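To make the normalisation in question 1 concrete, here is a minimal sketch of one common idea: rescale each observed mutation count by how over- or under-represented its trinucleotide context is in the regions relative to a reference (e.g. the whole genome). It is written in plain Python for illustration only and is not the SomaticSignatures implementation; the function names (canonical_context, context_counts, normalise) and the toy numbers are hypothetical.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def canonical_context(tri):
    """Return the trinucleotide oriented so that its centre base is C or T."""
    tri = tri.upper()
    if tri[1] in "CT":
        return tri
    # Centre is a purine: use the reverse complement instead.
    return tri.translate(COMPLEMENT)[::-1]

def context_counts(sequences):
    """Count canonical trinucleotide contexts over a collection of sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - 2):
            tri = seq[i:i + 3]
            if set(tri) <= set("ACGT"):  # skip windows containing N etc.
                counts[canonical_context(tri)] += 1
    return counts

def normalise(mutation_counts, region_ctx, reference_ctx):
    """Rescale counts keyed by (context, alt) to the reference composition.

    Each count is multiplied by reference_freq / region_freq of its context,
    then the result is rescaled to proportions that sum to 1.
    """
    region_total = sum(region_ctx.values())
    ref_total = sum(reference_ctx.values())
    scaled = {}
    for (ctx, alt), n in mutation_counts.items():
        if region_ctx[ctx] == 0:
            raise ValueError(f"context {ctx} never occurs in the regions")
        region_freq = region_ctx[ctx] / region_total
        ref_freq = reference_ctx[ctx] / ref_total
        scaled[(ctx, alt)] = n * ref_freq / region_freq
    total = sum(scaled.values())
    return {key: value / total for key, value in scaled.items()}

# Toy example (made-up numbers): the regions are AT-rich, so ATA contexts
# are three times as frequent there as ACG contexts.
toy_mutations = {("ATA", "A"): 30, ("ACG", "T"): 10}
toy_region_ctx = Counter({"ATA": 3, "ACG": 1})
toy_reference_ctx = Counter({"ATA": 1, "ACG": 1})
toy_profile = normalise(toy_mutations, toy_region_ctx, toy_reference_ctx)
```

In this toy case the raw counts suggest a threefold T>A excess, but after correcting for the AT-rich composition both channels end up with the same proportion. An enrichment that survives this correction would be the part not explained by sequence context alone.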
(Dear Julian, if you are there, sorry for asking the same question again, but I really need this answer.)
Thank you very much for your help,
Best,
Tunc.
I have trouble understanding the questions you are trying to answer. Could you please provide more context and, if possible, also some results that would help in understanding it better? Specifically, what do you mean by "tumor wise cases instead of certain areas" in question 1? Also, are you interested in differences in mutations or in mutational signatures in question 2?
First of all, thank you for answering and for taking the time. We are working on breast cancer WGS data and analysing mutations in the proximity of oestrogen receptor binding sites. (We find that the mutation frequency in these regions is enriched compared with the whole genome.)
1) Specifically, what do you mean by "tumor wise cases instead of certain areas" in question 1?
- Let me rephrase it: I see that most studies run mutational signature analysis on whole-genome mutation data, but in my case I only look at TF binding regions. So by "certain areas" I meant mutational signature analysis restricted to the TF binding regions.
2) Are you interested in differences in mutations or in mutational signatures in question 2?
- My only aim is to show that mutational_signature(WholeGenomeBreastCancer) != mutational_signature(ERbindingRegions)
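One possible way to phrase this aim as a test is sketched below: measure the KL divergence between the region profile and the genome-wide profile, then ask how often a divergence that large arises when the same number of mutations is drawn at random from the genome-wide profile. This is only an illustration under assumptions, not a method taken from SomaticSignatures or from this thread. The function names are hypothetical, and genome_profile is assumed to be already adjusted to the trinucleotide composition of the regions, so that sequence context alone cannot produce a difference.

```python
import math
import random

def to_profile(counts):
    """Convert a vector of channel counts into proportions."""
    total = sum(counts)
    return [c / total for c in counts]

def kl_divergence(p, q, pseudo=1e-9):
    """KL(p || q) in bits for two aligned probability vectors.

    A small pseudocount keeps the ratio finite when q has empty channels.
    """
    return sum(
        pi * math.log2((pi + pseudo) / (qi + pseudo))
        for pi, qi in zip(p, q)
        if pi > 0
    )

def monte_carlo_pvalue(region_counts, genome_profile, n_sim=2000, seed=0):
    """Monte Carlo goodness-of-fit test using KL divergence as the statistic.

    Null hypothesis: the region mutations are drawn from genome_profile.
    Returns the observed divergence and an empirical p-value.
    """
    rng = random.Random(seed)
    n_mut = sum(region_counts)
    n_channels = len(genome_profile)
    observed = kl_divergence(to_profile(region_counts), genome_profile)
    hits = 0
    for _ in range(n_sim):
        # Draw n_mut mutations from the null profile and tabulate them.
        draw = rng.choices(range(n_channels), weights=genome_profile, k=n_mut)
        sim = [0] * n_channels
        for channel in draw:
            sim[channel] += 1
        if kl_divergence(to_profile(sim), genome_profile) >= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return observed, (hits + 1) / (n_sim + 1)

# Toy example with 4 channels instead of 96 (made-up numbers).
toy_genome_profile = [0.25, 0.25, 0.25, 0.25]
toy_same = monte_carlo_pvalue([25, 25, 25, 25], toy_genome_profile, n_sim=200)
toy_skewed = monte_carlo_pvalue([97, 1, 1, 1], toy_genome_profile, n_sim=200)
```

With real data, region_counts would hold the 96 channel counts from the ER binding regions. A small p-value would suggest that the region profile differs from the context-adjusted genome-wide profile by more than sampling noise, though it does not say which signature is responsible.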
These are the results after I ran the normalisation that I asked about in your GitHub issue #4:
http://imgur.com/a/dViID -> Mutational signature contribution
http://imgur.com/a/U3MMW -> Fitted signature
http://imgur.com/a/yW8se -> Observed
Any updates?
The answers in this support forum are largely contributed by volunteers who do this in their free, personal time. Pushing for answers after five days, especially during the Easter weekend when many people have other commitments, is neither helpful nor encouraging.
(I only just saw your comment, after a long time.)
Well, I think answering questions about the tool that you have created is one of your greatest responsibilities, because I am putting the tool that you coded at the centre of my project. (I truly appreciate your work; without it I could not have done this analysis.) For that reason, I need your help with the points that I think are not very clear. On top of everything, I think every single question could give you new ideas and opportunities to expand your work.
In addition, as you can see from my comments and answers, I start all of my questions by expressing my gratitude for your effort and work, so I don't know what else I could say to make you happy. Therefore, what you have stated does not seem relevant to my questions.