Following up on your last email about my PureCN run:
What told you there were no somatic events?
My VCF file marks somatic mutations with INFO field SOMATIC. Also, the normal ID has genotype == 0/0 while tumor ID does not. Here is a snippet from my VCF:
chr1 16576328 . G C 227.77 PASS . GT:AD 0/1:169,11 0/1:317,51
chr1 16576389 . T C 1676.77 PASS . GT:AD 0/1:164,44 0/1:255,92
chr1 16576399 . A G 6939.77 PASS . GT:AD 0/1:44,160 0/1:65,271
chr1 16576420 . G A . PASS SOMATIC GT:AD 0/0:188,15 0/1:274,27
chr1 16576487 . G T 0 PASS . GT:AD 0/1:178,8 0/0:311,8
chr1 16576655 . T C 1501.77 PASS . GT:AD 0/1:0,22 1/1:0,41
chr1 16577156 . C A . PASS SOMATIC GT:AD 0/0:17,0 0/1:31,4
chr1 16577179 . C T 0 PASS . GT:AD 0/1:28,2 0/0:54,0
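For reference, the two somatic criteria described above (the SOMATIC INFO flag, and normal genotype 0/0 with a non-0/0 tumor genotype) can be checked with a few lines of Python. This is only a minimal sketch, not anything PureCN does internally: it assumes whitespace-separated fields, that the first sample column is the normal and the second is the tumor, and that AD is "ref,alt". The names `parse_record` and `is_somatic` are made up for illustration.

```python
# Sketch: pick out somatic calls from the VCF snippet in this email.
# Assumptions (not confirmed by any tool): sample column 1 = normal,
# sample column 2 = tumor, AD = "ref_depth,alt_depth".
VCF_SNIPPET = """\
chr1 16576328 . G C 227.77 PASS . GT:AD 0/1:169,11 0/1:317,51
chr1 16576389 . T C 1676.77 PASS . GT:AD 0/1:164,44 0/1:255,92
chr1 16576399 . A G 6939.77 PASS . GT:AD 0/1:44,160 0/1:65,271
chr1 16576420 . G A . PASS SOMATIC GT:AD 0/0:188,15 0/1:274,27
chr1 16576487 . G T 0 PASS . GT:AD 0/1:178,8 0/0:311,8
chr1 16576655 . T C 1501.77 PASS . GT:AD 0/1:0,22 1/1:0,41
chr1 16577156 . C A . PASS SOMATIC GT:AD 0/0:17,0 0/1:31,4
chr1 16577179 . C T 0 PASS . GT:AD 0/1:28,2 0/0:54,0
"""


def parse_record(line):
    """Split one VCF data line into a small dict (hypothetical helper)."""
    fields = line.split()
    keys = fields[8].split(":")  # FORMAT column, e.g. GT:AD
    samples = [dict(zip(keys, s.split(":"))) for s in fields[9:]]
    return {"pos": int(fields[1]), "info": fields[7], "samples": samples}


def is_somatic(rec):
    """True if the SOMATIC flag is set and the genotypes agree with it."""
    normal, tumor = rec["samples"]
    flagged = "SOMATIC" in rec["info"].split(";")
    by_genotype = normal["GT"] == "0/0" and tumor["GT"] != "0/0"
    return flagged and by_genotype


def tumor_vaf(rec):
    """Alt allele fraction in the tumor sample, from the AD field."""
    ref, alt = (int(x) for x in rec["samples"][1]["AD"].split(","))
    return alt / (ref + alt) if ref + alt else 0.0


records = [parse_record(l) for l in VCF_SNIPPET.splitlines() if l.strip()]
somatic = [r for r in records if is_somatic(r)]
for r in somatic:
    print(r["pos"], round(tumor_vaf(r), 3))
```

On the eight records above, only the two SOMATIC-flagged positions pass both checks.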
That sample should definitely have some CNAs; I saw some with my simple CNV code. In fact, I saw some CNVs in every one of our samples.
Yes, I used PureCN as in the Quick vignette. The log file is at https://www.dropbox.com/s/t0ck3yhj3q43rmp/IBG-2-CG-178T3.log?dl=0
I estimated purity by looking for the highest VAF of a heterozygous somatic SNV and doubling it. I actually used the top few SNVs by VAF and averaged their VAFs. Also, I used a loop that repeatedly estimated purity and copy number; when it was estimating purity, it used only SNVs in regions estimated to have copy number 2.
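The reasoning behind the doubling: a clonal heterozygous somatic SNV in a copy-number-2 region sits on one of two copies in the tumor cells and on none in the normal cells, so its expected VAF is purity / 2. A minimal sketch of that single purity step follows (not the full loop, and not PureCN's method). The function name `estimate_purity`, the `top_k` parameter, and the example values are all hypothetical.

```python
# Sketch of the purity step described above: purity ~= 2 * VAF for
# heterozygous somatic SNVs in copy-number-2 regions.
def estimate_purity(snvs, top_k=3):
    """snvs: list of (vaf, copy_number) for heterozygous somatic SNVs.

    Averages the top_k highest VAFs among SNVs in copy-number-2 regions
    and doubles the result, capped at 1.0.
    """
    diploid_vafs = sorted((vaf for vaf, cn in snvs if cn == 2), reverse=True)
    if not diploid_vafs:
        raise ValueError("no SNVs in copy-number-2 regions")
    top = diploid_vafs[:top_k]
    return min(1.0, 2.0 * sum(top) / len(top))


# Made-up values for illustration only; not taken from the sample discussed here.
example_snvs = [(0.21, 2), (0.19, 2), (0.20, 2), (0.35, 3), (0.05, 2)]
print(estimate_purity(example_snvs, top_k=3))
print(estimate_purity(example_snvs, top_k=1))
```

Averaging several top VAFs trades a little downward bias for less sensitivity to a single noisy or mis-called SNV, which is why the estimate stays iffy with few somatic calls.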
Ted Toal, Postdoctoral Researcher, Carvajal-Carmona Lab
On Apr 24, 2018, at 4:44 PM, Riester, Markus <markus.riester@novartis.com> wrote:
Reptiming is pretty minor. I mostly added it because it can find samples with high proliferation. But to be honest I haven’t really used this so far.
Yes, something is fishy, there are no somatic events at all, CNAs or SNVs.
The algorithm needs at least a few CNAs to work properly. See the ABSOLUTE paper. It can work in MSI high samples where SNVs provide sufficient signal.
How do you mark somatic mutations in your VCF?
How do you estimate 40% purity? Based on hotspots?
You used PureCN.R as in the Quick vignette? Feel free to send the log file.
Mutect 1 is super-fast for panels, so I would still recommend doing the initial tests with 1.1.7 as described in the vignettes. This gives you a well-tested pipeline to start from.
On Apr 24, 2018, at 3:01 PM, Toal, Ted <twtoal@ucdavis.edu> wrote:
I finally got my first run finished (one sample only so far), and am going through the data looking at it.
Question: I did not use a replication timing file. Do you advise that I do?
The local optima plot looks worrisome: the #1 solution is way down at very low purity, and solutions 2, 3, and 4 are near ploidy 4. See below. How do you pick the optima peaks? Looking at the plot, it seems to me that you could say the best peak is around purity 0.4, ploidy 2.
I went through your list of items to check, and I think everything is pretty solid here. But I’m suspecting there may be some parameter that needs tweaking.
The copy number log ratio histogram doesn’t look good. Looks shifted. See below.
My own estimate of purity for this sample was 0.42, but that is very iffy.
Any suggestions on what particularly to look at?
Thanks Ted, I cannot access the log file; feel free to send it by mail. To pick a good test sample with clear CNAs, I would plot germline allelic fractions and then choose samples with lots of whole-arm gains and losses, indicated by strong allelic imbalance (like in the vignette).
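As a rough numeric stand-in for eyeballing such a plot, one could summarize how far the tumor allelic fractions of germline heterozygous SNPs drift from 0.5 on each chromosome arm. This is only a sketch under stated assumptions: the arm labels, the example fractions, and the 0.1 cutoff are made up for illustration, and none of this is a PureCN function.

```python
# Sketch: per-arm deviation of germline het SNP allelic fractions from 0.5.
# Arms with a large mean deviation are candidates for whole-arm gains/losses.
from collections import defaultdict
from statistics import mean


def arm_imbalance(snps):
    """snps: iterable of (arm, tumor_allelic_fraction) for germline het SNPs."""
    by_arm = defaultdict(list)
    for arm, af in snps:
        by_arm[arm].append(abs(af - 0.5))
    return {arm: mean(devs) for arm, devs in by_arm.items()}


# Made-up example values, not from any real sample.
example = [
    ("1p", 0.50), ("1p", 0.48), ("1p", 0.52),
    ("1q", 0.70), ("1q", 0.31), ("1q", 0.68),
]
scores = arm_imbalance(example)
CUTOFF = 0.1  # hypothetical threshold; tune by looking at real plots
imbalanced = sorted(arm for arm, s in scores.items() if s > CUTOFF)
print(imbalanced)
```

Samples with several arms above the cutoff would be the ones worth trying first.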