I have only a basic understanding of the SomaticSignatures package. I've managed to get output from it, but I need a bit of help comparing my results.
Briefly, we have ~80 samples under 4 conditions (groups) and have inferred 7 signatures from them; we are now trying to find out which signature is specific to each group. Normalisation was done by creating a probability score in each sample using the function,
I'm trying to compare our inferred signatures from SomaticSignatures with its data(signature21), as well as with the latest 30 signatures, data(signature30), from http://cancer.sanger.ac.uk/cancergenome/assets/signatures_probabilities.txt. Again, I'm not sure whether these values can be used directly like data(signature21) or need further processing, but they seem to have the same range of values, i.e.:
# data(signature21) : 21 Somatic signatures range : 0.0000 0.4246
# data(signature30) : 30 Somatic signatures range : 0.0000000 0.4199414
Not much correlation between 21 Somatic signatures & 30 Somatic signatures
- The two sets, data(signature21) and data(signature30), seem to be quite different from each other, although our results concur more with data(signature21). S18 from data(signature21) appears to be most correlated with Signature8 or Signature3 from data(signature30). Could the names of the signatures have changed in the latest update?
- I'm not sure what S1A, S1B, SR1, SR2, SR3, SU1 and SU2 from data(signature21) mean, or what they correspond to in the updated version.
Comparing published signatures
I've tried 2 things here and have yet to figure out which is the better approach.
a) Using correlation
cor(x, y = NULL, use = "everything",method = c("pearson", "kendall", "spearman"))
b) Using cosine similarity
cosine(x, y = NULL)
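For reference, this is roughly how both comparisons can be set up. It is a sketch on toy data, not our real pipeline: compare_signatures, cosine_matrix and best_match are hypothetical helper names, the cosine similarity is written in base R rather than with the cosine() call above, and it assumes both matrices have motifs as rows with matching row names. Rows are matched by name before comparing, since both measures depend on the motifs being in the same order in both matrices.

```r
# Toy data only: random "published" signatures, 96 motifs x 5 signatures.
set.seed(1)
motifs <- paste0("motif", 1:96)
published <- matrix(runif(96 * 5), nrow = 96,
                    dimnames = list(motifs, paste0("P", 1:5)))
published <- sweep(published, 2, colSums(published), "/")

# Toy "inferred" signatures: noisy copies of known published columns,
# with the rows deliberately shuffled into a different order.
truth <- c(1, 2, 3, 4, 5, 1, 2)
inferred <- published[, truth] + matrix(runif(96 * 7, 0, 0.001), nrow = 96)
colnames(inferred) <- paste0("S", 1:7)
inferred <- inferred[sample(nrow(inferred)), ]

# Cosine similarity between every column of a and every column of b.
cosine_matrix <- function(a, b) {
  a_unit <- sweep(a, 2, sqrt(colSums(a^2)), "/")
  b_unit <- sweep(b, 2, sqrt(colSums(b^2)), "/")
  crossprod(a_unit, b_unit)
}

# For each of our signatures (rows of sim), report the best reference match.
best_match <- function(sim) {
  idx <- apply(sim, 1, which.max)
  data.frame(ours = rownames(sim),
             best = colnames(sim)[idx],
             score = sim[cbind(seq_len(nrow(sim)), idx)],
             stringsAsFactors = FALSE)
}

# Align rows by name, then compute both similarity measures.
compare_signatures <- function(ours, ref) {
  shared <- intersect(rownames(ours), rownames(ref))
  stopifnot(length(shared) > 0)
  ours <- ours[shared, , drop = FALSE]
  ref <- ref[shared, , drop = FALSE]
  list(pearson = best_match(cor(ours, ref, method = "pearson")),
       cosine = best_match(cosine_matrix(ours, ref)))
}

result <- compare_signatures(inferred, published)
print(result$pearson)
print(result$cosine)
```

With real data, the two arguments would be our inferred signature matrix and the published one, provided their row names use the same motif labels.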
We normalised the data by creating a probability score in each sample, i.e.,
We assessed the number of signatures to be n=7.
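To make the normalisation step concrete, here is a minimal sketch of a per-sample probability score. It is not the exact call we used: the count matrix counts and its dimensions are made up, and it only assumes that motifs are rows and samples are columns.

```r
# Hypothetical count matrix: 96 motifs x 4 samples (made-up numbers).
set.seed(42)
counts <- matrix(rpois(96 * 4, lambda = 5), nrow = 96,
                 dimnames = list(paste0("motif", 1:96),
                                 paste0("sample", 1:4)))

# Divide every column by its total, so that each sample sums to 1.
probs <- sweep(counts, 2, colSums(counts), "/")

# Check: every column total should now be 1.
print(colSums(probs))
```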
Note: Showing only the correlation output here [cosine similarity is similar for the data(signature21) matrix but not data(signature30)]
Example output of correlation with the 21 somatic signatures:
Highest correlation with our S1 : S18, 0.761870210035029
Highest correlation with our S2 : S18, 0.895449441547398
Highest correlation with our S3 : S18, 0.8176733925205
Highest correlation with our S4 : S18, 0.66299142221862
Highest correlation with our S5 : S1B, 0.613441745488655
Highest correlation with our S6 : S1B, 0.665963561771328
Highest correlation with our S7 : S5, 0.472316222871848
Example output of correlation with the 30 somatic signatures:
Highest correlation with our S1 : Signature.30, 0.250721518256121
Highest correlation with our S2 : Signature.8, 0.230023105084011
Highest correlation with our S3 : Signature.8, 0.256823544093413
Highest correlation with our S4 : Signature.25, 0.331866927022225
Highest correlation with our S5 : Signature.27, 0.353397110264527
Highest correlation with our S6 : Signature.28, 0.255187187591234
Highest correlation with our S7 : Signature.25, 0.312821309032734
Any idea why the output is so different between data(signature21) and data(signature30)?
- Also, would it make sense to use the somatic spectrum motifMatrix values (sca_mm, from the package example) to compare the published signatures with the spectrum of our individual samples?
- On another note, when using the functions plotObservedSpectrum and plotFittedSpectrum I get the error "n too large, allowed maximum for palette Set3 is 12".
- I think this is because the package is limited to 12 colours, so I don't get output for the remaining samples.
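A possible workaround sketch for the colour limit, in case it helps frame the question. It only builds a longer colour vector in base R; whether it can be passed to the plots depends on the plotting functions returning ggplot objects whose fill scale can be overridden, which I have not confirmed. The hex codes are typed in by hand to resemble Set3, and n_samples is a placeholder.

```r
# Placeholder: number of samples that need a distinct colour.
n_samples <- 80

# Twelve anchor colours, typed in by hand to resemble the Set3 palette.
set3_like <- c("#8DD3C7", "#FFFFB3", "#BEBADA", "#FB8072",
               "#80B1D3", "#FDB462", "#B3DE69", "#FCCDE5",
               "#D9D9D9", "#BC80BD", "#CCEBC5", "#FFED6F")

# Interpolate between the anchors to get one colour per sample.
sample_cols <- grDevices::colorRampPalette(set3_like)(n_samples)
print(length(sample_cols))

# Untested assumption: if plotObservedSpectrum() returns a ggplot object,
# the colours might be swapped in like this (requires ggplot2):
# p <- plotObservedSpectrum(sca_mm)
# p + ggplot2::scale_fill_manual(values = sample_cols)
```

Interpolated colours get hard to tell apart with ~80 samples, so this would only get the plot drawn rather than make every sample distinguishable.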
Do let me know if the above requires clarification.
Looking forward to your comments.