Question

No CNAs or SNVs in results

twtoal ▴ 10

@twtoal-15473

Last seen 18 months ago

United States

Following up on your last email about my PureCN run:

What told you there were no somatic events?

My VCF file marks somatic mutations with the INFO flag SOMATIC. In addition, at those sites the normal sample has genotype 0/0 while the tumor sample does not. Here is a snippet from my VCF:

chr1    16576328        .       G       C       227.77  PASS    .       GT:AD   0/1:169,11      0/1:317,51
chr1    16576389        .       T       C       1676.77 PASS    .       GT:AD   0/1:164,44      0/1:255,92
chr1    16576399        .       A       G       6939.77 PASS    .       GT:AD   0/1:44,160      0/1:65,271
chr1    16576420        .       G       A       .       PASS    SOMATIC GT:AD   0/0:188,15      0/1:274,27
chr1    16576487        .       G       T       0       PASS    .       GT:AD   0/1:178,8       0/0:311,8
chr1    16576655        .       T       C       1501.77 PASS    .       GT:AD   0/1:0,22        1/1:0,41
chr1    16577156        .       C       A       .       PASS    SOMATIC GT:AD   0/0:17,0        0/1:31,4
chr1    16577179        .       C       T       0       PASS    .       GT:AD   0/1:28,2        0/0:54,0

That sample should definitely have some CNAs; I saw some with my simple CNV code. I saw CNVs in every one of our samples.

Yes, I used PureCN as in the Quick vignette. The log file is at https://www.dropbox.com/s/t0ck3yhj3q43rmp/IBG-2-CG-178T3.log?dl=0

I estimated purity by finding the highest VAF among heterozygous somatic SNVs and doubling it. I also tried taking the top few SNVs by VAF and averaging their VAFs. In addition, I used a loop that alternately estimated purity and copy number; when estimating purity, it used only SNVs in regions estimated to have copy number 2.
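
The arithmetic behind doubling the VAF can be sketched as follows. This assumes a clonal, heterozygous somatic SNV on one of two copies in a copy-number-2 region, so the expected VAF is purity / 2. The function names are hypothetical, and the read counts are the tumor AD values from the two SOMATIC lines in the snippet above. It only illustrates the back-of-the-envelope estimate, not how PureCN estimates purity.

```python
def vaf(ref_reads, alt_reads):
    """Variant allele fraction from reference and alternate read counts."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads at this site")
    return alt_reads / total


def purity_from_het_somatic(vafs, top_n=1):
    """Rough purity estimate: twice the mean of the top_n highest VAFs.

    Assumes every SNV is heterozygous, clonal, and lies in a region with
    total copy number 2 in the tumor, so that expected VAF = purity / 2.
    """
    if not vafs:
        raise ValueError("need at least one somatic SNV")
    top = sorted(vafs, reverse=True)[:top_n]
    return min(1.0, 2 * sum(top) / len(top))


# Tumor AD values (ref, alt) of the two SOMATIC variants in the snippet.
somatic_ad = [(274, 27), (31, 4)]
somatic_vafs = [vaf(ref, alt) for ref, alt in somatic_ad]
estimate = purity_from_het_somatic(somatic_vafs, top_n=2)
print(round(estimate, 3))
```

With only a handful of low-depth SNVs, an estimate like this is dominated by sampling noise, which is one reason it should be treated as very rough.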

Ted Toal, Postdoctoral Researcher, Carvajal-Carmona Lab

On Apr 24, 2018, at 4:44 PM, Riester, Markus <markus.riester@novartis.com> wrote:

Replication timing is pretty minor. I mostly added it because it can find samples with high proliferation. But to be honest I haven’t really used this so far.

Yes, something is fishy, there are no somatic events at all, CNAs or SNVs.

The algorithm needs at least a few CNAs to work properly. See the ABSOLUTE paper. It can work in MSI high samples where SNVs provide sufficient signal.

How do you mark somatic mutations in your VCF?

How do you estimate 40% purity? Based on hotspots?

You used PureCN.R as in the Quick vignette? Feel free to send the log file.

Mutect 1 is super-fast for panels; I would still recommend doing the initial tests with 1.1.7 as described in the vignettes. This gives you a well-tested pipeline to start from.

On Apr 24, 2018, at 3:01 PM, Toal, Ted <twtoal@ucdavis.edu> wrote:

I finally finished my first run (one sample only so far) and am now looking through the results.

Question: I did not use a replication timing file. Do you advise that I do?

The local optima plot looks worrisome. The #1 solution is way down at very low purity, and solutions 2, 3, and 4 are near ploidy 4. See below. How do you pick the optima peaks? Looking at the plot, it seems to me that the best peak could be around purity 0.4, ploidy 2.

I went through your list of items to check, and I think everything is pretty solid here, but I suspect there may be some parameter that needs tweaking.

The copy number log ratio histogram doesn’t look good. Looks shifted. See below.

My own estimate of purity for this sample was 0.42, but that is very iffy.

Any suggestions on what particularly to look at?

PureCN • 1.5k views

7.7 years ago twtoal ▴ 10

Thanks, Ted. I cannot access the log file; feel free to send it by mail. To pick a good test sample with clear CNAs, I would plot germline allelic fractions and then choose samples with lots of whole-arm gains and losses, indicated by strong allelic imbalance (as in the vignette).

7.7 years ago markus.riester ▴ 130

score 0 · Answer 1 · 2018-04-25

Markus, you wrote to me privately:

> You have this warning here: WARN [2018-04-24 12:26:24] Sampleid looks like a normal in VCF, not like a tumor.
> This means that PureCN sees more 0/0 or 0 genotypes in the sample that is supposed to be the tumor, not the normal. Are you sure there is no tumor/normal sample swap?
> See the function .getTumorIdInVcf that tries to figure out what is tumor and what is normal.
> A high-quality sample should have a log-ratio standard deviation below 0.25; above 0.5 is very noisy, mostly because of the low coverage…

In my coverage data, I see that this sample (178T3), which I randomly chose as the first one to run PureCN on, happens to be one of two samples with very low coverage in the tumor. I also checked the output of my own ad hoc CNV method, and by pure chance that sample was also one of the few with NO detected CNVs. So this was a terrible first sample to run; I'll switch to a better one.

But following up more on this sample's VCF file, it has the following counts of genotypes in its variants (left of "-" is normal, right of "-" is tumor):

    116 0/0-0/1
    370 0/1-./.
    608 0/1-0/0
   2130 0/1-0/1
    987 0/1-1/1
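
A tally like the one above can be produced with a short script. This is a minimal sketch, assuming a two-sample VCF in which the normal is the first sample column and the tumor is the second, with GT as the first FORMAT key (as in the snippet earlier in this thread). The function name is hypothetical, and the three example lines are copied from that snippet.

```python
from collections import Counter


def count_genotype_pairs(vcf_lines, normal_col=9, tumor_col=10):
    """Count 'normal-tumor' genotype pairs over VCF data lines.

    Column indices are 0-based; 9 and 10 are the first and second sample
    columns of a standard VCF. GT is taken as the first FORMAT subfield.
    Splitting on any whitespace handles both tab- and space-separated text.
    """
    counts = Counter()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.split()
        normal_gt = fields[normal_col].split(":")[0]
        tumor_gt = fields[tumor_col].split(":")[0]
        counts[f"{normal_gt}-{tumor_gt}"] += 1
    return counts


example = [
    "chr1 16576328 . G C 227.77 PASS . GT:AD 0/1:169,11 0/1:317,51",
    "chr1 16576420 . G A . PASS SOMATIC GT:AD 0/0:188,15 0/1:274,27",
    "chr1 16576487 . G T 0 PASS . GT:AD 0/1:178,8 0/0:311,8",
]
pair_counts = count_genotype_pairs(example)
for pair, n in sorted(pair_counts.items()):
    print(f"{n:7d} {pair}")
```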

Does PureCN filter out ./.? Or perhaps I should remove any variant with ./. genotype in either normal or tumor from the VCF?

The 608 variants that are 0/1-0/0 are a problem. Most appear to have a very low VAF in the normal (I called these with Mutect2 for the panel of normals, which is more sensitive than a typical germline caller). Is the VAF limit option for the normal applied when you test whether the tumor sample looks like a normal, or should I remove low-VAF germline variants from the VCF myself? Maybe I should also filter out variants without at least ~5 reads supporting the alternate allele.

In many cases, although the tumor has genotype 0/0, it does have a few alternate reads, just not enough for it to be called 0/1. Maybe I should just remove variants that are called in the normal but have 0/0 in the tumor, as probable artifacts?
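
To make the candidate filters concrete, here is a rough sketch of the three ideas raised above: dropping ./. genotypes, dropping weakly supported calls in the normal, and dropping variants called in the normal but 0/0 in the tumor. Whether any of them is appropriate before running PureCN is exactly the open question. The function names are hypothetical, the thresholds are placeholders (only the ~5 alternate reads comes from the text), and the code assumes biallelic sites with a GT:AD sample format.

```python
def parse_sample(sample_field):
    """Split a 'GT:AD' sample field into (genotype, ref_reads, alt_reads)."""
    parts = sample_field.split(":")
    genotype = parts[0]
    if len(parts) < 2 or "." in parts[1]:
        return genotype, 0, 0  # allelic depths missing
    ref_reads, alt_reads = (int(x) for x in parts[1].split(",")[:2])
    return genotype, ref_reads, alt_reads


def keep_variant(normal_field, tumor_field, min_alt_reads=5, min_normal_vaf=0.1):
    """Return True if the variant passes all three candidate filters.

    min_normal_vaf is a placeholder threshold, not a recommended value.
    """
    n_gt, n_ref, n_alt = parse_sample(normal_field)
    t_gt, _, _ = parse_sample(tumor_field)
    # 1. Drop variants with a missing genotype in either sample.
    if "." in n_gt or "." in t_gt:
        return False
    hom_ref = ("0/0", "0|0")
    if n_gt not in hom_ref:
        # 2. Drop calls in the normal with weak alternate-allele support.
        depth = n_ref + n_alt
        if n_alt < min_alt_reads or depth == 0 or n_alt / depth < min_normal_vaf:
            return False
        # 3. Drop variants called in the normal but 0/0 in the tumor.
        if t_gt in hom_ref:
            return False
    return True


# Sample fields (normal, tumor) taken from the VCF snippet in this thread.
print(keep_variant("0/1:164,44", "0/1:255,92"))  # well-supported germline het
print(keep_variant("0/1:28,2", "0/0:54,0"))      # 2 alt reads in the normal
```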

In the one case I checked, a problematic variant like this was present in the panel of normals. Do you remove panel-of-normals variants from the VCF file variants before checking normal ID vs tumor ID?

It is my understanding that I should leave the PON variants in the VCF file. Is that correct?

You sent the following comment on the above:

> You want to filter out artifacts, but keep real germline variants. So everything that is not a known SNP but present in the PON, say 2-3 times, should go.
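
The rule in that comment can be written down as a small predicate. This is only a sketch of the stated rule, not PureCN code. It assumes each variant has already been annotated with whether it is a known SNP (for example from dbSNP) and with the number of PON samples in which it was seen. The dict keys, function names, and example records are all made up for illustration.

```python
def is_probable_artifact(is_known_snp, pon_count, min_pon_count=2):
    """Flag a variant that recurs in the PON but is not a known SNP."""
    return (not is_known_snp) and pon_count >= min_pon_count


def drop_artifacts(variants, min_pon_count=2):
    """Keep variants that are not flagged as probable artifacts.

    Each variant is a dict with hypothetical keys 'is_known_snp' (bool)
    and 'pon_count' (number of PON samples carrying the variant).
    """
    return [
        v for v in variants
        if not is_probable_artifact(v["is_known_snp"], v["pon_count"], min_pon_count)
    ]


# Made-up records covering the three cases of interest.
variants = [
    {"id": "known_snp_in_pon", "is_known_snp": True, "pon_count": 12},
    {"id": "recurrent_unknown", "is_known_snp": False, "pon_count": 3},
    {"id": "private_unknown", "is_known_snp": False, "pon_count": 0},
]
kept = [v["id"] for v in drop_artifacts(variants)]
print(kept)
```

Real germline SNPs stay in even when they are common in the PON, while recurrent calls that are not known SNPs are removed.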