Question: Can PureCN classify indel variants as germline vs. somatic?
pathkim wrote (2.4 years ago):

Dear administrator,

I found a comment stating that the software currently does not officially support VCF files containing indels, and that support for VCFs generated by MuTect 2 that include both single nucleotide variants (SNVs) and indels is planned for Bioconductor 3.5.

The current Bioconductor release is now 3.5.

So can I classify indel variants as germline vs. somatic with PureCN?

Thanks for your help.

Answer:

markus.riester wrote (2.4 years ago):

Hi,

Thanks for your interest in PureCN. The GATK4 beta will be released in the coming weeks, and you can expect a fairly well-tested PureCN version soon after. The GATK4 alpha was only available under an academic license, so I couldn't add this frequently requested feature in time for Bioconductor 3.5. Send me an email if you want to be notified when the new PureCN version is available.

Markus

Dear Markus,

Thanks for your kind reply.

I also found a comment stating that samples with tumor purities below 20% usually cannot be analyzed with this algorithm.

Could this be improved in a new PureCN version?

JOHEON KIM


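Good question. Germline vs. somatic classification at purities below 30-35% is easy, because somatic variants then typically have allelic fractions well below the roughly 50% (heterozygous) or 100% (homozygous) expected for germline variants. Below 20% purity you can expect 99+% accuracy; you wouldn't even need PureCN for that.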
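To see why low purity makes this easy, here is a minimal Python sketch of the expected allelic fractions under a simple clonal model, plus a toy binomial likelihood-ratio call. It assumes diploid normal cells, clonal variants, and known purity and tumor copy number; `expected_af` and `classify` are hypothetical helpers, not PureCN functions, and PureCN's actual likelihood model is considerably more elaborate.

```python
import math


def expected_af(purity: float, tumor_cn: int = 2, multiplicity: int = 1,
                germline: bool = False) -> float:
    """Expected allelic fraction (AF) of a clonal variant in a tumor sample.

    Assumes the contaminating normal cells are diploid. A germline
    heterozygous variant has one alt copy in every normal cell, a somatic
    variant has none there. `multiplicity` is the number of alt copies in
    each tumor cell, `tumor_cn` the total copy number in tumor cells.
    Purity should lie strictly between 0 and 1.
    """
    normal_alt = 1 if germline else 0
    alt_copies = purity * multiplicity + (1 - purity) * normal_alt
    total_copies = purity * tumor_cn + (1 - purity) * 2
    return alt_copies / total_copies


def classify(alt: int, depth: int, purity: float, tumor_cn: int = 2) -> str:
    """Toy germline-vs-somatic call from a binomial log-likelihood ratio."""
    af_germline = expected_af(purity, tumor_cn, germline=True)
    af_somatic = expected_af(purity, tumor_cn, germline=False)
    ref = depth - alt
    # The binomial coefficient is the same under both hypotheses and cancels.
    llr = (alt * math.log(af_germline / af_somatic)
           + ref * math.log((1 - af_germline) / (1 - af_somatic)))
    return "germline" if llr > 0 else "somatic"


if __name__ == "__main__":
    # The gap between somatic and germline AF shrinks as purity increases.
    for purity in (0.2, 0.35, 0.6):
        print(purity, round(expected_af(purity), 3),
              round(expected_af(purity, germline=True), 3))
    print(classify(alt=10, depth=100, purity=0.2))
    print(classify(alt=48, depth=100, purity=0.2))
```

At 20% purity a clonal somatic variant in a diploid region is expected near 10% AF versus 50% for a germline heterozygous SNP, so even this crude comparison separates them cleanly; as purity rises toward 100%, the two expectations converge and the call gets harder.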

The 20% figure refers to purity/ploidy inference. The actual lower limit depends on the coverage and quality of the data. In high-coverage, high-quality data from highly optimized assays with a sufficiently large pool of normal samples (the setting the tool was designed for; see the vignette for details), it can be as low as 15%. Poor-quality FFPE data can be so noisy that even 35% purity is challenging.

Dramatic amplifications are usually detectable in high-coverage samples at around 15% purity. In clean data, high-level amplifications spanning 3-4 exons are usually detectable; in noisy data, even 6-10 exons can be challenging.
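The interplay between purity, noise, and the number of exons can be made concrete with a back-of-the-envelope calculation. The Python sketch below is an illustration only, not PureCN's model: it assumes diploid normal cells, a clonal copy-number change, coverage normalized to the sample's average copy number, and independent exon-level log2 ratios with a single noise SD. `expected_log2_ratio` and `detection_z` are hypothetical helpers, and the noise levels and tumor copy number of 8 are made-up values.

```python
import math


def expected_log2_ratio(purity: float, tumor_cn: float, ploidy: float = 2.0) -> float:
    """Expected log2 copy ratio of a segment relative to the sample average.

    Assumes diploid normal cells, a clonal copy-number change, and coverage
    normalized to the average copy number of the (impure) tumor sample.
    """
    segment_copies = purity * tumor_cn + (1 - purity) * 2
    average_copies = purity * ploidy + (1 - purity) * 2
    return math.log2(segment_copies / average_copies)


def detection_z(purity: float, tumor_cn: float, n_exons: int,
                noise_sd: float, ploidy: float = 2.0) -> float:
    """Crude signal-to-noise score for a segment of `n_exons` exons.

    Treats exon-level log2 ratios as independent with standard deviation
    `noise_sd`, so averaging them shrinks the standard error by sqrt(n_exons).
    """
    signal = abs(expected_log2_ratio(purity, tumor_cn, ploidy))
    standard_error = noise_sd / math.sqrt(n_exons)
    return signal / standard_error


if __name__ == "__main__":
    # Hypothetical noise levels for "clean" and "noisy" (e.g. FFPE) data.
    for label, noise_sd in (("clean", 0.15), ("noisy", 0.4)):
        for n_exons in (3, 8):
            z = detection_z(purity=0.15, tumor_cn=8, n_exons=n_exons, noise_sd=noise_sd)
            print(label, n_exons, round(z, 1))
```

A larger score means the shift stands out more from the noise. With these made-up noise levels, the noisy case needs more exons to reach the score the clean case gets from 3 exons, which mirrors the qualitative point above; the numbers are not calibrated against PureCN or any real assay.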

If the purity is very low (e.g., below 2-3%), the algorithm might start fitting mainly noise, because there simply is not much signal. The returned purity can then be fairly random, but such cases are usually obvious during manual curation.

PureCN is designed for hybrid-capture data, which mostly targets exons, so it can only use coverage (i.e., it cannot use split reads, etc.). There is therefore not a lot we can do algorithmically; most recent efforts focus on cleaning up the data optimally and using all of it as efficiently as possible, mostly by means of a pool of normal samples.
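As a rough illustration of why a pool of normals helps with coverage-only data, here is a minimal Python sketch, assuming per-exon read counts from the same capture assay for the tumor and each normal. It uses only library-size scaling and the per-exon median of the normals; PureCN's actual normalization is more sophisticated (see the vignette), and `library_normalize` and `log2_ratios_vs_pon` are hypothetical names.

```python
import math
import statistics


def library_normalize(counts):
    """Divide per-exon read counts by the sample total so that samples
    sequenced to different depths become comparable."""
    total = sum(counts)
    return [c / total for c in counts]


def log2_ratios_vs_pon(tumor_counts, normal_counts_list, floor=1e-9):
    """Per-exon log2(tumor / median of the pool of normals).

    Exon-specific capture efficiency (GC content, probe performance, ...)
    affects tumor and normals alike, so it largely cancels in the ratio.
    `floor` avoids log(0) for exons without reads.
    """
    tumor = library_normalize(tumor_counts)
    normals = [library_normalize(n) for n in normal_counts_list]
    ratios = []
    for i, t in enumerate(tumor):
        reference = statistics.median(n[i] for n in normals)
        ratios.append(math.log2(max(t, floor) / max(reference, floor)))
    return ratios


if __name__ == "__main__":
    # Toy counts: exon 3 has twice the relative coverage in the tumor.
    tumor = [100, 200, 400, 100]
    pool = [[50, 100, 100, 50], [100, 200, 200, 100], [75, 150, 150, 75]]
    print([round(r, 2) for r in log2_ratios_vs_pon(tumor, pool)])
```

In the toy example the normals share the tumor's exon-to-exon coverage pattern, so that pattern cancels and exon 3 ends up exactly one log2 unit above the others. Because ratios are relative to the sample total, the gain also pulls the other exons slightly below zero, which is one reason real tools re-center the ratios afterwards.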

(reply by markus.riester, 2.4 years ago)