Question

Can PureCN classify indel variants as germline vs. somatic?

pathkim

South Korea

Dear administrator,

I found a comment stating that the software currently does not officially support VCF files containing indels, and that support for VCFs generated by MuTect 2 that include both single nucleotide variants (SNVs) and indels is planned for Bioconductor 3.5.

The current Bioconductor version is now 3.5.

So can I classify indel variants as germline vs. somatic with PureCN?

Thanks for your help.

PureCN indel somatic germline

Answer · 2017-06-08

markus.riester

United States

Hi,

thanks for your interest in PureCN. GATK4 beta will be released in the coming weeks, and you can expect a fairly well-tested PureCN version soon after. GATK4 alpha was only available under an academic license, so I couldn't add this frequently requested feature in time for 3.5. Shoot me an email if you want to be notified when the new PureCN version is available.

Markus

Dear Markus,

Thanks for your kind reply.

I also found a comment stating that samples with tumor purities below 20% usually cannot be analyzed with this algorithm.

Could this be improved in a new PureCN version?

JOHEON KIM

Good question. Germline vs. somatic classification at purities below 30-35% is easy, and you can expect 99+% accuracy below 20%; you wouldn't even need PureCN for that.
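
To illustrate why low purity makes this easy, here is a minimal Python sketch of a simple allele-fraction model. It is not PureCN's implementation; it assumes a copy-neutral diploid segment, diploid normal cells, and one mutated copy, and the function names are hypothetical. A heterozygous germline variant stays near 50% allele fraction regardless of purity, while a clonal somatic variant is diluted to roughly purity/2, so a binomial likelihood comparison separates the two easily when purity is low.

```python
import math


def expected_af(purity, tumor_cn, mutated_copies, germline):
    """Expected variant allele fraction under a simple purity/copy-number model.

    Assumes normal (non-tumor) cells are diploid. A germline heterozygous
    variant is on one of the two normal copies; a somatic variant is absent
    from normal cells.
    """
    normal_alt = 1 if germline else 0
    alt = purity * mutated_copies + (1 - purity) * normal_alt
    total = purity * tumor_cn + (1 - purity) * 2
    return alt / total


def binom_loglik(alt_reads, depth, af):
    """Binomial log-likelihood of alt_reads successes out of depth reads at rate af."""
    af = min(max(af, 1e-6), 1 - 1e-6)  # keep log() finite at the extremes
    return (math.lgamma(depth + 1) - math.lgamma(alt_reads + 1)
            - math.lgamma(depth - alt_reads + 1)
            + alt_reads * math.log(af) + (depth - alt_reads) * math.log(1 - af))


def classify(alt_reads, depth, purity, tumor_cn=2):
    """Return 'germline' or 'somatic', whichever explains the read counts better.

    Hypothetical simplification: one mutated copy in the tumor for both hypotheses.
    """
    ll_germ = binom_loglik(alt_reads, depth, expected_af(purity, tumor_cn, 1, True))
    ll_som = binom_loglik(alt_reads, depth, expected_af(purity, tumor_cn, 1, False))
    return "germline" if ll_germ >= ll_som else "somatic"
```

At 20% purity the two hypotheses predict allele fractions of 0.5 and 0.1, so `classify(48, 100, 0.2)` returns "germline" and `classify(11, 100, 0.2)` returns "somatic". At 100% purity both predictions collapse to 0.5, which is why allele fraction alone stops discriminating as purity rises.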

The 20% figure refers to purity/ploidy inference. The actual lower limit depends on the coverage and quality of the data. In high-coverage, high-quality data from highly optimized assays with a sufficiently large pool of normal samples (the setting the tool was designed for; see the vignette for details), it can be as low as 15%. Poor-quality FFPE data can be so noisy that even 35% purity is challenging. Dramatic amplifications are usually detectable in high-coverage samples at around 15% purity. In clean data, high-level amplifications spanning 3-4 exons are usually detectable; in noisy data, even amplifications spanning 6-10 exons can be challenging.
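
Why purity sets a lower limit can be seen from the standard purity/copy-number model of coverage log ratios. The sketch below illustrates that model under assumed conditions (diploid normal cells, ratios centered on the average tumor ploidy); it is not PureCN's code, and `expected_log_ratio` is a hypothetical helper.

```python
import math


def expected_log_ratio(purity, tumor_cn, ploidy=2.0):
    """Expected tumor/normal log2 coverage ratio for a segment.

    Assumes normal cells are diploid and that coverage is normalized so the
    genome-wide average (tumor ploidy diluted by normal cells) maps to log2 ratio 0.
    """
    segment = purity * tumor_cn + (1 - purity) * 2
    average = purity * ploidy + (1 - purity) * 2
    return math.log2(segment / average)


# Compare a single-copy gain with a high-level amplification at several purities.
for purity in (0.15, 0.35, 0.8):
    gain = expected_log_ratio(purity, tumor_cn=3)
    amp = expected_log_ratio(purity, tumor_cn=20)
    print(f"purity={purity:.2f}  single-copy gain={gain:.3f}  20-copy amp={amp:.3f}")
```

At 15% purity a single-copy gain shifts the log2 ratio by only about 0.10, which is easily lost in noisy data, while a 20-copy amplification still gives about 1.23. This is consistent with dramatic amplifications remaining detectable at around 15% purity.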

If the purity is very low, e.g. below 2-3%, the algorithm may start fitting mainly noise because there simply is not much signal. The returned purity can be fairly random in those cases, but such cases are usually obvious during manual curation.

PureCN is designed for hybrid capture data covering mostly exons and can therefore only use coverage (i.e., it cannot use split reads, etc.). So there is not much we can do algorithmically; most recent efforts focus on cleaning up the data optimally and using all available data as efficiently as possible, mostly by means of a pool of normal samples.
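
As a rough illustration of why a pool of normals helps, here is a minimal Python sketch that computes per-target log2 ratios of a tumor against the median coverage of several normal samples. It assumes raw per-target read counts are already available and all positive; the function names are hypothetical, and PureCN's actual normalization (e.g., GC-bias correction and more elaborate pool-of-normals handling) is considerably more involved.

```python
import math
from statistics import median


def normalize(counts):
    """Scale raw per-target read counts so each sample sums to 1 (library-size normalization)."""
    total = sum(counts)
    return [c / total for c in counts]


def log_ratios(tumor_counts, normal_pool):
    """Per-target log2 ratio of a tumor against the median of a pool of normals.

    tumor_counts: raw read counts per capture target for the tumor (assumed > 0).
    normal_pool: list of per-target raw count lists, one per normal sample.
    """
    tumor = normalize(tumor_counts)
    normals = [normalize(n) for n in normal_pool]
    ratios = []
    for i, t in enumerate(tumor):
        # The median across normals damps outlier normals and target-specific capture bias.
        reference = median(n[i] for n in normals)
        ratios.append(math.log2(t / reference))
    return ratios
```

Taking the median across many normals means a single unusual normal sample, or a target with idiosyncratic capture efficiency, has less influence on the reference. This kind of cleanup matters most when the tumor signal is weak.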

markus.riester