Dear administrator,
I found a comment stating that the software currently does not officially support VCF files containing indels, and that support for VCFs generated by MuTect 2 that include both single nucleotide variants (SNVs) and indels is planned for Bioconductor 3.5.
Bioconductor is now at version 3.5.
So can I classify indel variants as germline vs. somatic with PureCN?
Thanks for your help.
Dear Markus,
Thanks for your kind reply.
I also found a comment stating that samples with tumor purities below 20% usually cannot be analyzed with this algorithm.
Can this be improved in a new PureCN version?
JOHEON KIM
Good question. Germline vs. somatic classification is easy below 30-35% purity, and you can expect 99+% accuracy below 20%; you wouldn't even need PureCN for that.
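To illustrate why the classification is easy at low purity, here is a minimal sketch of the expected allelic fractions under a simple mixture model. It assumes diploid normal cells, a heterozygous germline SNP, and a somatic mutation on a single copy; the function name and parameters are made up for this example and are not part of the PureCN API.

```python
def expected_allelic_fraction(purity, tumor_copies, variant_copies, germline):
    """Expected fraction of reads carrying the variant allele.

    purity         -- fraction of tumor cells in the sample (0..1)
    tumor_copies   -- total copy number of the locus in tumor cells
    variant_copies -- number of those copies carrying the variant
    germline       -- True for a heterozygous germline SNP, i.e. the
                      (assumed diploid) normal cells carry one variant copy
    """
    normal_variant_copies = 1 if germline else 0
    variant = purity * variant_copies + (1 - purity) * normal_variant_copies
    total = purity * tumor_copies + (1 - purity) * 2
    return variant / total


# Copy-neutral diploid locus, one variant copy in the tumor.
for purity in (0.2, 0.35, 0.8):
    somatic = expected_allelic_fraction(purity, 2, 1, germline=False)
    het_snp = expected_allelic_fraction(purity, 2, 1, germline=True)
    print(f"purity={purity:.2f} somatic={somatic:.2f} germline={het_snp:.2f}")
```

In a copy-neutral region at 20% purity, the somatic variant is expected at about 0.10 and the heterozygous germline SNP at 0.50, so the two are well separated. The gap narrows as purity increases, which is where a purity- and copy-number-aware model starts to matter.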
The 20% number refers to purity/ploidy inference. The actual lower limit depends on the coverage and quality of the data. In high coverage, high quality data from highly optimized assays with a sufficiently large pool of normal samples (for which the tool was designed, see the vignette for details), this can be as low as 15%. Poor quality FFPE data can be so noisy that even 35% purity is challenging. Dramatic amplifications are usually detectable in high coverage samples at around 15% purity. In clean data, high-level amplifications spanning 3-4 exons are usually detectable; in noisy data, even 6-10 exons can be challenging.
If the purity is very low, like below 2-3%, then the algorithm might start fitting mainly noise because there simply is not a lot of signal. The returned purity can be pretty random in those cases. These cases are usually obvious during manual curation.
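As a rough illustration of how little coverage signal is left at low purity, the following sketch computes the expected log2 tumor vs. normal coverage ratio under a simple mixture model. It assumes diploid normal cells and a noise-free assay; the function and its parameters are hypothetical and only meant to show the trend, not how PureCN fits the data.

```python
import math


def expected_log2_ratio(purity, tumor_copies, tumor_ploidy=2.0):
    """Expected log2 tumor/normal coverage ratio for one segment.

    purity       -- fraction of tumor cells in the sample (0..1)
    tumor_copies -- copy number of the segment in tumor cells
    tumor_ploidy -- average copy number of the tumor genome
    Normal cells are assumed to be diploid.
    """
    segment_signal = purity * tumor_copies + (1 - purity) * 2
    average_signal = purity * tumor_ploidy + (1 - purity) * 2
    return math.log2(segment_signal / average_signal)


# Single-copy gain (3 copies) vs. a high-level amplification (10 copies).
for purity in (0.03, 0.15, 0.35):
    gain = expected_log2_ratio(purity, 3)
    amp = expected_log2_ratio(purity, 10)
    print(f"purity={purity:.2f} gain={gain:.3f} amplification={amp:.3f}")
```

The expected log-ratio of a single-copy gain shrinks towards zero as purity drops, so it quickly disappears in the noise of the assay, while a high-level amplification still produces a visible shift at around 15% purity.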
PureCN is designed for hybrid capture data of mostly exons and can therefore only use coverage (i.e., it cannot use split reads etc.). So there is not a lot we can do algorithmically. Most of the recent efforts are related to cleaning up the data optimally and using all data as efficiently as possible, mostly by using the pool of normal samples.