Hi,
I'm currently using the SomaticSignatures package to extract signatures from my NGS data.
I want to compare them to the validated mutational signatures published by Alexandrov et al. (Alexandrov, L. B., Nik-Zainal, S., Wedge, D. C., et al. Signatures of mutational processes in human cancer. Nature, 2013) to evaluate the contribution of each validated profile to my data.
I downloaded their profiles from their server (ftp://ftp.sanger.ac.uk/pub/cancer/AlexandrovEtAl) and added them to the matrix returned by the mutationContextMatrix function (= sca_occurence in the tutorial).
Then I applied the 3 statistical methods (nmf, pca and kmeans). However, the tool pools everything and recomputes all the mutational profiles.
Is there a way to keep the validated profiles fixed and only estimate their contribution to my data?
Thanks
I've just updated to v2 along with BioC v3 and think this is a superb package. I would like to analyse my data using the new "21 signatures" data, but could you please confirm that the only way of doing so is to use them as described by Muller in his original question?
Best,
Dave
The approach described in the original question is not really a way to do it at all, and you should not use it for this purpose. Estimating the presence of already known signatures requires a different approach. As I wrote in the answer to Etienne's question, this will be available in the future.
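To illustrate what "a different approach" could look like, here is a minimal sketch of fitting one sample's mutation counts against signatures that are held fixed. It is not the package's method and uses no SomaticSignatures function. It is plain Python with toy numbers, and the names W, v and fit_fixed_signatures are hypothetical. The technique is the standard multiplicative update for non-negative least squares, applied only to the contributions while the signature matrix is never changed.

```python
# Hypothetical toy data: 4 mutation contexts (rows) x 2 known signatures (columns).
# Each column sums to 1, like the published signature profiles.
W = [
    [0.7, 0.1],
    [0.1, 0.2],
    [0.1, 0.3],
    [0.1, 0.4],
]

# Hypothetical mutation counts of one sample over the same 4 contexts.
v = [28.0, 17.0, 24.0, 31.0]


def fit_fixed_signatures(W, v, n_iter=2000, eps=1e-12):
    """Estimate non-negative contributions h with W h close to v; W stays fixed."""
    n_ctx, n_sig = len(W), len(W[0])
    h = [1.0] * n_sig  # strictly positive starting point
    for _ in range(n_iter):
        # Current reconstruction W h, computed once per iteration so that
        # all contributions are updated simultaneously.
        wh = [sum(W[i][k] * h[k] for k in range(n_sig)) for i in range(n_ctx)]
        for k in range(n_sig):
            num = sum(W[i][k] * v[i] for i in range(n_ctx))   # (W^T v)_k
            den = sum(W[i][k] * wh[i] for i in range(n_ctx))  # (W^T W h)_k
            h[k] *= num / (den + eps)  # multiplicative step keeps h[k] >= 0
    return h


contributions = fit_fixed_signatures(W, v)
total = sum(contributions)
# Share of the sample's mutations attributed to each fixed signature.
shares = [c / total for c in contributions]
```

Because only h is updated, the known profiles are never re-estimated, which is the difference from rerunning nmf on a matrix that has the published profiles appended as extra columns.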
Is the comparison available in the current release, 2.4.5?
A comparison to published signatures should be fairly easy once one has defined a suitable measure for comparing the identified signatures. If we go with your example of the KLD, we can compare the matrix we get from samples(sigs) to the 21 signatures that were published first. They are included in the package and you can access them with data(signatures21).
Do you mean comparing signatures(sigs) to data(signatures21), since both return mutation motifs (e.g. CA A.A) as rows and signatures as columns?
Yes, both matrices have the same structure and one can e.g. use a reasonable distance measure to compare the two.
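As a rough sketch of such a comparison, the snippet below computes the cosine similarity between every column of one matrix and every column of another, assuming both have motifs as rows in the same order. Cosine similarity is only one possible measure, chosen here because it ignores the overall scale of each column. A KLD-based comparison would first need the columns normalized to sum to 1. The matrices are made-up toy data, and found, reference and similarity_table are hypothetical names, not package objects.

```python
import math


def column(matrix, j):
    """Return column j of a row-major matrix as a list."""
    return [row[j] for row in matrix]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_table(found, reference):
    """table[i][j] = similarity of found signature i and reference signature j."""
    n_found, n_ref = len(found[0]), len(reference[0])
    return [
        [cosine_similarity(column(found, i), column(reference, j))
         for j in range(n_ref)]
        for i in range(n_found)
    ]


# Toy data: 3 motifs (rows), 2 signatures (columns) in each matrix.
found = [
    [2.0, 1.0],
    [4.0, 0.0],
    [4.0, 0.0],
]
reference = [
    [0.8, 0.2],
    [0.1, 0.4],
    [0.1, 0.4],
]

table = similarity_table(found, reference)
# Index of the most similar reference signature for each found signature.
best_match = [row.index(max(row)) for row in table]
```

In this toy example the first found signature is an exact multiple of the second reference signature, so their similarity is 1 even though the column sums differ.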
Any update on this? I'd like to compare my signatures with the 30 COSMIC signatures found here:
http://cancer.sanger.ac.uk/cosmic/signatures
When I use SomaticSignatures on my dataset, I see a difference between the signatures determined by the package's methods and the signatures from Alexandrov: each Alexandrov signature sums to 1, but each determined signature sums to between 18 and 22 (sum(as.numeric(sigs_nmf@signatures[,1]))).
It is probably not a good idea to compare such different vectors.
Is it normal to get sums above 1? Should I normalize the vectors by their sums?
Thank you for your help
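On the normalization question, one general point holds for any factorization of the form V ≈ W H: the scale of each signature is arbitrary, because dividing column k of W by a constant and multiplying row k of H by the same constant leaves the product unchanged. The sketch below shows that rescaling on toy numbers. It assumes W is contexts x signatures and H is signatures x samples, which may not match the orientation of the matrices the package returns, and normalize_signatures and matmul are hypothetical helper names.

```python
def matmul(A, B):
    """Plain matrix product of two row-major matrices."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


def normalize_signatures(W, H):
    """Scale each column of W to sum to 1 and scale the rows of H to compensate."""
    n_ctx, n_sig = len(W), len(W[0])
    sums = [sum(W[i][k] for i in range(n_ctx)) for k in range(n_sig)]
    W_norm = [[W[i][k] / sums[k] for k in range(n_sig)] for i in range(n_ctx)]
    H_scaled = [[value * sums[k] for value in H[k]] for k in range(n_sig)]
    return W_norm, H_scaled


# Toy data: 3 contexts x 2 signatures with column sums of 20,
# and 2 signatures x 2 samples.
W = [
    [10.0, 2.0],
    [6.0, 8.0],
    [4.0, 10.0],
]
H = [
    [1.0, 2.0],
    [3.0, 4.0],
]

W_norm, H_scaled = normalize_signatures(W, H)
before = matmul(W, H)
after = matmul(W_norm, H_scaled)
column_sums = [sum(row[k] for row in W_norm) for k in range(len(W_norm[0]))]
```

After this step every signature sums to 1, as the Alexandrov profiles do, while the reconstructed matrix is the same as before.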