Hi, I'm using PureCN to distinguish somatic SNPs from germline ones without a matched normal.
For validation, I tested it on 42 gastric cancer BAM files, run without their matched normals.
They come from targeted sequencing, and the target BED covers 6.1 Mb in total.
As the gold standard, I used the SNPs tagged "SOMATIC" by muTect in paired (tumor/normal) mode and compared them with the PureCN results obtained with a pooled normal.
I prepared the NormalDB from 42 normal BAMs and VCFs, following the provided manual. (Each of the 42 normals is individually matched to one of the tumor samples.)
I then ran PureCN and obtained 42 results. Their variants.csv files contain the somatic prediction for each SNP, but I want to improve the sensitivity and fidelity of the prediction.
I counted a call as predicted somatic when ML.SOMATIC is "TRUE", FLAGGED is "FALSE", and prior.somatic > 0.1.
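To make the filter unambiguous, here is a minimal Python sketch of how I apply the three criteria to one variants.csv file. The column names ML.SOMATIC, FLAGGED and prior.somatic are the ones mentioned above; the chr and start columns used as the variant key, the function names, and the example rows are assumptions for illustration only and are not real PureCN output.

```python
import csv
import io


def is_predicted_somatic(row, min_prior=0.1):
    """Return True if one variants.csv row passes the three criteria."""
    try:
        prior = float(row["prior.somatic"])
    except ValueError:
        # Non-numeric values such as "NA" are treated as not passing.
        return False
    return (
        row["ML.SOMATIC"].strip().upper() == "TRUE"
        and row["FLAGGED"].strip().upper() == "FALSE"
        and prior > min_prior
    )


def predicted_somatic_keys(handle, min_prior=0.1):
    """Return the set of (chr, start) keys for rows passing the filter.

    Assumes the file has 'chr' and 'start' columns to identify a variant.
    """
    reader = csv.DictReader(handle)
    return {
        (row["chr"], int(row["start"]))
        for row in reader
        if is_predicted_somatic(row, min_prior)
    }


# Made-up rows for illustration only.
EXAMPLE = """chr,start,ML.SOMATIC,FLAGGED,prior.somatic
chr1,100,TRUE,FALSE,0.5
chr1,200,TRUE,TRUE,0.5
chr2,300,FALSE,FALSE,0.5
chr2,400,TRUE,FALSE,0.01
chr3,500,TRUE,FALSE,NA
"""

if __name__ == "__main__":
    print(predicted_somatic_keys(io.StringIO(EXAMPLE)))
```

With a real file, the same function can be called on `open("sample_variants.csv", newline="")`, where the file name is a placeholder.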
I attached an aggregate of the curation files, with some columns added to the default ones.
"Normal-matched Somatic Call" is the number of SNPs called somatic by muTect with the matched normal.
"Predicted Somatic Call" is the number of SNPs predicted somatic by PureCN.
"True Somatic Call" is the size of the intersection of the two sets above.
"True/Predicted" is "True Somatic Call" divided by "Predicted Somatic Call".
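As a rough sketch of how these four columns are derived for one sample, assuming each call is identified by a (chromosome, position) pair: the function name and the example calls are hypothetical. The extra "True/Normal-matched" ratio is not one of the attached columns; I include it only because it corresponds to the sensitivity I want to improve.

```python
def summarize_calls(matched_normal_calls, predicted_calls):
    """Compute the per-sample counts and ratios described above.

    Both arguments are iterables of hashable variant keys,
    for example (chromosome, position) tuples.
    """
    truth = set(matched_normal_calls)
    predicted = set(predicted_calls)
    true_calls = truth & predicted  # calls found by both muTect and PureCN

    return {
        "Normal-matched Somatic Call": len(truth),
        "Predicted Somatic Call": len(predicted),
        "True Somatic Call": len(true_calls),
        # Fraction of PureCN predictions confirmed by muTect.
        "True/Predicted": len(true_calls) / len(predicted) if predicted else None,
        # Extra ratio, not in the attached table: fraction of muTect calls recovered.
        "True/Normal-matched": len(true_calls) / len(truth) if truth else None,
    }


if __name__ == "__main__":
    # Made-up variant keys for illustration only.
    mutect = [("chr1", 100), ("chr1", 250), ("chr2", 300), ("chr3", 50)]
    purecn = [("chr1", 250), ("chr2", 300), ("chr5", 999)]
    for name, value in summarize_calls(mutect, purecn).items():
        print(name, value)
```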
How can I go forward?