Question: Understanding PureCN and its germline SNPs
17 months ago by
twtoal0 wrote:

Here are several questions about PureCN germline SNPs, all interrelated.

Should a germline caller be used to obtain germline SNP information for use with PureCN?  If so, wouldn't it make sense to provide this VCF file as one of the inputs to PureCN?

How is a germline SNP defined for PureCN purposes?  It is unclear if all germline SNVs are considered germline SNPs, and if they should all be marked with the INFO/DB flag.  If not, which ones should be?

Can gnomAD be used instead of dbSNP, with all gnomAD variants found in the normal flagged with DB?  Or alternatively, can ALL germline variants be flagged with DB?

When Mutect is run in matched normal mode, where does PureCN get the germline SNPs, since they have been filtered out?  When not run in matched normal mode, where does it get them, since no normal was used to identify SNPs?

It is unclear if the SNPs in the VCF file need to have tumor depth info with them, or only normal depth info.  Or, if only a tumor with no normal is used, do SNPs still appear in the VCF file, and how do they get there, and do they have tumor depth info?


snp purecn germline • 479 views
modified 16 months ago by kakbrus0 • written 17 months ago by twtoal0
Answer: Understanding PureCN and its germline SNPs
17 months ago by
markus.riester110 wrote:

Somatic copy number events most often result in an unbalanced number of maternal and paternal chromosomes. That in turn results in allelic imbalance of germline SNPs, which we can easily detect in the tumor. SNPs thus help us to get the ploidy right.
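To make the allelic imbalance concrete, here is a minimal sketch of the standard mixture arithmetic for a heterozygous germline SNP. It is not PureCN's code; the function name and the example purity and copy numbers are made up for illustration, and normal cells are assumed to be diploid with one copy of the alt allele.

```python
def expected_allelic_fraction(purity, tumor_copies, alt_copies):
    """Expected alt-allele fraction of a heterozygous germline SNP.

    purity       -- fraction of tumor cells in the sample (0..1)
    tumor_copies -- total copy number of the locus in tumor cells
    alt_copies   -- how many of those tumor copies carry the alt allele
    Assumption: normal cells are diploid and carry exactly one alt copy.
    """
    alt = purity * alt_copies + (1 - purity) * 1
    total = purity * tumor_copies + (1 - purity) * 2
    return alt / total


# Illustrative values only: a sample with 60% purity.
balanced = expected_allelic_fraction(0.6, tumor_copies=2, alt_copies=1)
gain = expected_allelic_fraction(0.6, tumor_copies=3, alt_copies=2)
loss = expected_allelic_fraction(0.6, tumor_copies=1, alt_copies=0)

print(round(balanced, 3))  # about 0.5, no imbalance
print(round(gain, 3))      # about 0.615, alt allele gained
print(round(loss, 3))      # about 0.286, alt allele lost
```

A balanced diploid segment stays at 0.5, while a gain or loss shifts the fraction away from 0.5 by an amount that depends on purity and copy number, which is why these SNPs constrain the ploidy solution.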

PureCN was mainly written for tumor-only VCFs that include all variant sites, germline and somatic.

Support for matched normals is available, but the only difference is that PureCN uses the SOMATIC flag instead of the DB info flag (so don't worry about germline databases when you have matched normals). It will also do some additional artifact filtering, for example removing SNPs of low quality in the normal. Matched normals are also helpful in high purity samples to distinguish heterozygous from homozygous SNPs. But they are not crucial.
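The difference between the two modes can be pictured roughly like this. This is only a sketch of the idea, not PureCN's implementation: the helper names and the returned labels are invented, and the real model works with probabilities rather than hard labels.

```python
def parse_info(info_field):
    """Split a VCF INFO string into a dict; bare flags map to True."""
    info = {}
    for item in info_field.split(";"):
        if item in ("", "."):
            continue
        key, sep, value = item.partition("=")
        info[key] = value if sep else True
    return info


def prior_label(info_field, matched_normal, germline_flag="DB"):
    """Rough starting guess for a variant (hypothetical helper).

    With a matched normal, only the SOMATIC flag is consulted.
    Tumor-only, the germline database flag is consulted instead.
    """
    info = parse_info(info_field)
    if matched_normal:
        return "somatic" if "SOMATIC" in info else "germline"
    return "likely germline" if germline_flag in info else "possibly somatic"


print(prior_label("DB;DP=80", matched_normal=False))
print(prior_label("SOMATIC;DP=80", matched_normal=True))
print(prior_label("DB;DP=80", matched_normal=True))
```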

Mutect 1 keeps germline SNPs by default. Mutect 2 in GATK4 can emit germline sites when you provide the --genotype-germline-sites flag (requires the very latest version). PureCN will do basic filtering by Mutect failure flags and will keep high-quality germline calls. We haven't switched to Mutect 2, so M2 isn't as well tested there yet.

For tumor-only, you can now (developer version) specify the name of the germline flag in the VCF. You can for example create a flag that contains what you consider likely germline (for example population allele frequency > 0.1%). You can also provide a POP_AF info field with population allele frequencies, for example from gnomAD.
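As an illustration of creating such a flag, here is a small plain-Python sketch that adds an INFO flag to every record whose POP_AF exceeds 0.1%. The flag name `LIKELY_GERMLINE` and the function name are hypothetical, and the input is assumed to be an uncompressed VCF that already carries a POP_AF INFO field; for real data a tool such as bcftools is probably the better choice.

```python
def flag_likely_germline(vcf_lines, flag="LIKELY_GERMLINE", min_pop_af=0.001):
    """Return VCF lines with `flag` added where POP_AF > min_pop_af.

    `flag` is a made-up name; tell PureCN which flag to use separately.
    """
    out = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if line.startswith("#CHROM"):
            # Declare the new flag just before the column header line.
            out.append(
                '##INFO=<ID=%s,Number=0,Type=Flag,'
                'Description="POP_AF above %g">' % (flag, min_pop_af)
            )
        if line.startswith("#"):
            out.append(line)
            continue
        fields = line.split("\t")
        pop_af = None
        for item in fields[7].split(";"):
            if item.startswith("POP_AF="):
                try:
                    # Multi-allelic sites: use the largest listed frequency.
                    pop_af = max(float(v) for v in item[7:].split(",") if v != ".")
                except ValueError:
                    pop_af = None  # missing or unparsable value
        if pop_af is not None and pop_af > min_pop_af:
            fields[7] = flag if fields[7] == "." else fields[7] + ";" + flag
        out.append("\t".join(fields))
    return out


example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tTUMOR",
    "chr1\t100\t.\tA\tG\t.\tPASS\tPOP_AF=0.12\tGT:AD\t0/1:30,25",
    "chr1\t200\t.\tC\tT\t.\tPASS\tPOP_AF=0.00001\tGT:AD\t0/1:50,10",
]
annotated = flag_likely_germline(example)
print("\n".join(annotated))
```

In this toy input only the first record (POP_AF 0.12) gets the flag; the second stays unflagged and is left as a candidate somatic call.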

Yes, the VCF needs to have an AD FORMAT field with both reference and alt read counts. If matched, then for both tumor and normal.
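If you want to check your VCF before running PureCN, a quick sanity check along these lines may help. This is a hypothetical helper, not part of PureCN; it assumes tab-separated records and reads only the reference count and the first alt count from AD.

```python
def allele_depths(vcf_line):
    """Return [(ref_reads, alt_reads), ...], one tuple per sample column.

    Raises ValueError when a record lacks usable AD values. For
    multi-allelic sites only the first alt count is returned.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 10:
        raise ValueError("record has no sample columns")
    keys = fields[8].split(":")
    if "AD" not in keys:
        raise ValueError("FORMAT has no AD field")
    ad_index = keys.index("AD")
    depths = []
    for sample in fields[9:]:
        values = sample.split(":")
        if ad_index >= len(values):
            raise ValueError("sample column is missing AD")
        counts = values[ad_index].split(",")
        if len(counts) < 2:
            raise ValueError("AD needs both ref and alt counts")
        depths.append((int(counts[0]), int(counts[1])))
    return depths


# Made-up matched record: two sample columns, both with AD.
record = "chr1\t100\t.\tA\tG\t.\tPASS\tDB\tGT:AD:DP\t0/1:30,25:55\t0/1:20,22:42"
print(allele_depths(record))  # [(30, 25), (20, 22)]
```

For a tumor-only VCF the list has one entry; for a matched VCF both the tumor and the normal column should yield a pair of counts.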

Hope that helps. 



written 17 months ago by markus.riester110

When using a matched tumor/normal pair, why isn't there an option in PureCN to tell it the ID of the normal sample in the VCF, as there is the --sampleid option to give it the tumor ID?


written 17 months ago by twtoal0

PureCN currently supports only tumor-only or tumor/normal VCFs. Feel free to add a feature request on GitHub. It does not currently support multiple time points (or regions) per sample, so a multi-sample VCF doesn't provide anything that would be used. But this might change in the future.

written 17 months ago by markus.riester110