Can PureCN classify indel variants as germline vs. somatic?
pathkim • 0

Dear administrator,

I found a comment stating that the software currently does not officially support VCF files containing indels, and that support for VCFs generated by MuTect 2 that include both single nucleotide variants (SNVs) and indels is planned for Bioconductor 3.5.
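As an aside, a quick way to check whether a given VCF contains indels at all is to compare the REF and ALT allele lengths of each record. The sketch below is a plain-Python heuristic, not part of PureCN or GATK; the function name `count_variant_types` is hypothetical, and it assumes an uncompressed, tab-delimited VCF, setting aside missing (`.`), spanning-deletion (`*`) and symbolic (`<...>`) alleles.

```python
def count_variant_types(vcf_lines):
    """Count SNV, indel and other ALT alleles in uncompressed VCF text lines."""
    counts = {"snv": 0, "indel": 0, "other": 0}
    for line in vcf_lines:
        # Skip meta-information/header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4].split(",")  # REF and ALT columns
        for alt in alts:
            if alt in (".", "*") or alt.startswith("<"):
                counts["other"] += 1  # missing, spanning-deletion or symbolic allele
            elif len(ref) != len(alt):
                counts["indel"] += 1  # allele lengths differ: insertion or deletion
            elif len(ref) == 1:
                counts["snv"] += 1  # single-base substitution
            else:
                counts["other"] += 1  # multi-nucleotide substitution
    return counts
```

For example, `with open("calls.vcf") as handle: print(count_variant_types(handle))`, where `calls.vcf` is a hypothetical file name. Multi-allelic records are counted once per ALT allele.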

Bioconductor is now at version 3.5.

So can I classify indel variants as germline vs. somatic with PureCN?

Thanks for your help.



Thanks for your interest in PureCN. The GATK4 beta will be released in the coming weeks, and you can expect a fairly well-tested PureCN version soon after. GATK4 alpha was available only under an academic license, so I couldn't add this frequently requested feature in time for Bioconductor 3.5. Shoot me an email if you want to be notified when the new PureCN version is available.




Dear Markus,

Thanks for your kind reply.

I also found a comment stating that samples with tumor purities below 20% usually cannot be analyzed with this algorithm.

Could this be improved in a new PureCN version?



Good question. Germline vs. somatic classification below 30-35% purity is easy, and you can expect 99+% accuracy below 20%; you wouldn't even need PureCN for that.
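Why low purity makes this easy can be sketched with a toy model, assuming a heterozygous variant in a diploid, copy-neutral region: germline variants are expected at an allelic fraction of 0.5, somatic variants at purity/2, so at 20% purity the two hypotheses (0.5 vs. 0.1) are far apart. The binomial comparison below is an illustration only, not PureCN's actual model (which also accounts for copy number, ploidy and other factors); the function names are hypothetical.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of observing k alt reads out of n when the true allelic fraction is p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def expected_af(purity: float, somatic: bool) -> float:
    """Expected allelic fraction of a heterozygous variant in a diploid, copy-neutral region."""
    # Germline variants are in tumor and normal cells alike, so AF = 0.5.
    # Somatic variants are only in tumor cells, so AF = purity / 2.
    return purity / 2 if somatic else 0.5


def classify(alt_reads: int, depth: int, purity: float) -> str:
    """Pick the hypothesis (germline vs. somatic) with the higher binomial likelihood."""
    l_germline = binom_pmf(alt_reads, depth, expected_af(purity, somatic=False))
    l_somatic = binom_pmf(alt_reads, depth, expected_af(purity, somatic=True))
    return "somatic" if l_somatic > l_germline else "germline"
```

At 20% purity, 10 alt reads out of 100 clearly favor the somatic hypothesis and 48 out of 100 the germline one. At 80% purity the expected fractions are 0.4 vs. 0.5, so the same read depth discriminates far less clearly.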

The 20% figure refers to purity/ploidy inference. The actual lower limit depends on the coverage and quality of the data. In high-coverage, high-quality data from highly optimized assays with a sufficiently large pool of normal samples (the setting the tool was designed for; see the vignette for details), it can be as low as 15%. Poor-quality FFPE data can be so noisy that even 35% purity is challenging. Dramatic amplifications are usually detectable in high-coverage samples at around 15% purity. In clean data, high-level amplifications spanning 3-4 exons are usually detectable; in noisy data, even 6-10 exons can be challenging.
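The purity dependence of the copy-number signal can be illustrated with the commonly used expected log2 coverage ratio for a locus with tumor copy number C, tumor ploidy P and purity p, assuming a diploid normal: log2((pC + 2(1 - p)) / (pP + 2(1 - p))). The sketch computes it, together with a hypothetical rule of thumb for how many exons must be averaged before a shift stands out from independent per-exon noise; the noise model, the z threshold and all function names are assumptions for illustration, not PureCN internals.

```python
from math import ceil, log2


def expected_log_ratio(copy_number: float, purity: float, ploidy: float = 2.0) -> float:
    """Expected log2 tumor/normal coverage ratio, assuming a diploid normal genome."""
    # Mean copy number of the sampled cell mixture at this locus.
    locus = purity * copy_number + 2 * (1 - purity)
    # Mean copy number of the mixture genome-wide; coverage is normalized to it.
    genome = purity * ploidy + 2 * (1 - purity)
    return log2(locus / genome)


def exons_needed(delta: float, noise_sd: float, z: float = 3.0) -> int:
    """Hypothetical rule of thumb: number of exons to average so that a mean shift of
    `delta` lies `z` standard errors away from zero, assuming independent noise."""
    return ceil((z * noise_sd / delta) ** 2)
```

For example, a single-copy gain (C = 3) in a diploid tumor at 15% purity shifts the expected log2 ratio by only about 0.1, whereas a high-level amplification (C = 10) still shifts it by about 0.68. Because the required number of exons grows with the square of the per-exon noise, noisier data need more exons for the same shift.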

If the purity is very low, e.g., below 2-3%, the algorithm might start fitting mainly noise because there simply is not much signal. The returned purity can be essentially random in those cases, but such cases are usually obvious during manual curation.

PureCN is designed for hybrid capture data covering mostly exons and can therefore only use coverage (i.e., it cannot use split reads, etc.). So there is not much we can do algorithmically; most of the recent efforts have focused on cleaning up the data optimally and using all data as efficiently as possible, mostly by using a pool of normal samples.
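The role of the pool of normals can be sketched as follows: each exon's tumor coverage is compared with a robust reference built from normal samples sequenced with the same assay, which cancels exon-specific capture efficiency. This is a simplified illustration, assuming library-size normalization and a per-exon median; PureCN's actual normalization (including bias correction) is more involved, and the function names are hypothetical.

```python
from math import log2
from statistics import median


def normalize(coverage: list[float]) -> list[float]:
    """Scale a sample's per-exon coverage to sum to 1, removing total-depth differences."""
    total = sum(coverage)
    return [c / total for c in coverage]


def log_ratios_vs_pool(tumor: list[float], pool: list[list[float]]) -> list[float]:
    """Per-exon log2 ratio of tumor coverage against the median of a pool of normals."""
    tumor_n = normalize(tumor)
    pool_n = [normalize(sample) for sample in pool]
    ratios = []
    for i, t in enumerate(tumor_n):
        # The median across normals is a robust estimate of this exon's expected
        # coverage, absorbing capture-efficiency differences between exons.
        reference = median(sample[i] for sample in pool_n)
        ratios.append(log2(t / reference))
    return ratios
```

A larger pool makes each per-exon median more stable, which is one reason a sufficiently large pool of normal samples helps with noisy or low-purity data.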

