Question

Results with PON give more variants

ramiro.barrantes ▴ 10

@ramirobarrantes-7796

Recently we created our own panel of normals (PON) with 5 normals. We ran Mutect2 two ways:

Mutect2withPON: Mutect2 with the PON, 545 variants
Mutect2withoutPON: Mutect2 without the PON, 540 variants

We were assuming, perhaps naively, that the Mutect2withPON calls would be a subset of the Mutect2withoutPON calls, and that any calls missing from Mutect2withPON would be artifacts filtered out by the PON.

Indeed, Mutect2withoutPON has 4 variants that Mutect2withPON does not have, presumably those come from artifacts present in the PON. However, Mutect2withPON has 9 variants that Mutect2withoutPON does not have. Does this make sense or are we missing something? Why is this?

mutect2 variants pon • 249 views

updated 2 hours ago by Kevin Blighe ★ 4.0k • written 4 months ago by ramiro.barrantes ▴ 10

score 0 · Answer 1 · 2025-11-21

Your observation is consistent with how Mutect2 behaves in tumor-only mode, where the panel of normals (PON) feeds into the variant calling model rather than acting solely as a post-calling filter. The PON provides site-specific estimates of artifact probabilities based on alternative allele counts observed across your five normal samples. This informs the somatic likelihood calculations during the active region determination and local assembly steps. For example, to my knowledge Mutect2 by default does not treat sites present in the PON as active (unless --genotype-pon-sites is set), so active region boundaries, and hence the local assembly around nearby variants, can differ between the two runs.

Without a PON, Mutect2 falls back on its default, site-independent assumptions about artifacts (the exact defaults depend on your GATK version). This can lead to conservative calling at certain sites, where borderline evidence in the tumor sample fails to exceed the emission threshold. In contrast, when using the PON, sites with minimal or no alternative alleles in the normals are treated as less artifact-prone, which can allow Mutect2 to call variants with lower allele fractions or weaker support that might otherwise be dismissed under the default model.

The nine additional variants in your Mutect2withPON results likely represent such cases: true somatic events or low-level signals that become detectable due to the refined artifact modeling. Conversely, the four variants unique to Mutect2withoutPON may reflect artifacts that the PON correctly downweights, preventing their emission.

This discrepancy arises because Mutect2 does not simply subtract PON variants; instead, the PON modulates the probabilistic framework, potentially expanding the call set at low-artifact sites while suppressing it at high-artifact ones. Your assumption of strict containment is incorrect for this reason. To investigate further, compare the allele depths and qualities of the discordant variants using commands like:

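# -r needs a bgzip-compressed, indexed VCF; AD, AF and TLOD are the fields Mutect2 normally writes
bcftools query -r <region> -f '%CHROM\t%POS\t%REF\t%ALT\t%FILTER\t%INFO/TLOD[\t%AD\t%AF]\n' <vcf_file>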

or annotate with tools such as VEP for functional insights.
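To get the list of discordant variants in the first place, a minimal Python sketch along the following lines may help. It is only an illustration, not part of Mutect2 or GATK: the function names (variant_keys, read_vcf_keys, compare) and the file names in the usage comment are hypothetical, and it assumes that two calls are "the same variant" when CHROM, POS, REF and ALT match exactly, with multi-allelic records split per ALT allele and no left-alignment or normalization applied.

```python
import gzip
from typing import Iterable, List, Set, Tuple

VariantKey = Tuple[str, int, str, str]  # (CHROM, POS, REF, ALT)


def variant_keys(lines: Iterable[str]) -> Set[VariantKey]:
    """Collect (CHROM, POS, REF, ALT) keys from VCF text lines, one key per ALT allele."""
    keys: Set[VariantKey] = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        for alt in alts.split(","):  # split multi-allelic records
            keys.add((chrom, pos, ref, alt))
    return keys


def read_vcf_keys(path: str) -> Set[VariantKey]:
    """Read a plain or gzip/bgzip-compressed VCF and return its variant keys."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return variant_keys(handle)


def compare(
    with_pon: Set[VariantKey], without_pon: Set[VariantKey]
) -> Tuple[List[VariantKey], List[VariantKey], List[VariantKey]]:
    """Return (shared, only_with_pon, only_without_pon), each as a sorted list."""
    return (
        sorted(with_pon & without_pon),
        sorted(with_pon - without_pon),
        sorted(without_pon - with_pon),
    )


# Hypothetical usage (file names are placeholders, not from the original post):
# shared, only_with, only_without = compare(
#     read_vcf_keys("Mutect2withPON.vcf.gz"),
#     read_vcf_keys("Mutect2withoutPON.vcf.gz"),
# )
# print(len(shared), len(only_with), len(only_without))
```

With the counts reported in the question (545 and 540 calls, of which 9 and 4 are private), both runs should share 536 calls. If the VCFs are already bgzip-compressed and indexed, bcftools isec should give the same partition without custom code.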

Kevin