Understanding PureCN and its germline SNPs
twtoal • 0

Here are several questions about PureCN germline SNPs, all interrelated.

Should a germline caller be used to obtain germline SNP information for use with PureCN?  If so, wouldn't it make sense to provide this VCF file as one of the inputs to PureCN?

How is a germline SNP defined for PureCN purposes?  It is unclear if all germline SNVs are considered germline SNPs, and if they should all be marked with the INFO/DB flag.  If not, which ones should be?

Can gnomAD be used instead of dbSNP, with all gnomAD variants found in the normal flagged with DB?  Or alternatively, can ALL germline variants be flagged with DB?

When Mutect is run in matched normal mode, where does PureCN get the germline SNPs, since they have been filtered out?  When not run in matched normal mode, where does it get them, since no normal was used to identify SNPs?

It is unclear if the SNPs in the VCF file need to have tumor depth info with them, or only normal depth info.  Or, if only a tumor with no normal is used, do SNPs still appear in the VCF file, and how do they get there, and do they have tumor depth info?



Somatic copy number events most often result in an unbalanced number of maternal and paternal chromosomes. That in turn results in allelic imbalance of germline SNPs, which we can easily detect in the tumor. SNPs thus help us get the ploidy right.
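To make the allelic imbalance point concrete, here is a minimal sketch of the textbook expectation for a heterozygous germline SNP in a tumor sample mixed with diploid normal cells. This is an illustration only, not PureCN's actual likelihood model; the function name and arguments are made up for this example.

```python
def expected_germline_af(purity, tumor_copies, alt_copies_in_tumor):
    """Expected allelic fraction of a heterozygous germline SNP.

    Assumes normal cells are diploid and carry exactly one copy of the
    alt allele. Hypothetical helper, not part of PureCN.
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    if not 0 <= alt_copies_in_tumor <= tumor_copies:
        raise ValueError("alt copies must be between 0 and total tumor copies")
    # Alt allele copies contributed by tumor cells plus normal cells.
    alt = purity * alt_copies_in_tumor + (1.0 - purity) * 1
    # Total copies at the locus contributed by tumor cells plus normal cells.
    total = purity * tumor_copies + (1.0 - purity) * 2
    return alt / total


# Balanced diploid segment: the SNP stays at 0.5 regardless of purity.
balanced = expected_germline_af(0.6, 2, 1)
# Single-copy gain of the alt-carrying chromosome: shifted above 0.5.
gain = expected_germline_af(0.6, 3, 2)
# Loss of the reference-carrying chromosome in the tumor.
loss = expected_germline_af(0.6, 1, 1)
```

The further the observed fraction moves from 0.5, the more it constrains which purity and copy number combinations are plausible, which is why heterozygous SNPs carry information about ploidy.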

PureCN was mainly written for tumor-only VCFs that include all variant sites, germline and somatic.

Matched normals are supported, but the only difference is that PureCN then uses the SOMATIC flag instead of the DB INFO flag (so don't worry about germline databases when you have matched normals). It will also do some additional artifact filtering, for example removing SNPs of low quality in the normal. Matched normals are also helpful in high purity samples to distinguish heterozygous from homozygous SNPs. But they are not crucial.

Mutect 1 keeps germline SNPs by default. Mutect 2 in GATK4 can emit germline sites when you provide the --genotype-germline-sites flag (requires the very latest version). PureCN will do basic filtering by Mutect failure flags and will keep high quality germline calls. We haven't switched to Mutect 2, so M2 isn't as well tested there yet.

For tumor-only, you can now (developer version) specify the name of the germline flag in the VCF. You can for example create a flag that contains what you consider likely germline (for example population allele frequency > 0.1%). You can also provide a POP_AF info field with population allele frequencies, for example from gnomAD.
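As a rough sketch of how such a flag could be created, the snippet below adds an INFO flag to every record whose POP_AF exceeds 0.1%. It works on plain VCF text lines and uses no VCF library. The flag name LIKELY_GERMLINE and the function name are made up for this example, and for multi-allelic records it simply looks at the first POP_AF value, which is a simplification.

```python
def flag_likely_germline(vcf_lines, min_pop_af=0.001, flag="LIKELY_GERMLINE"):
    """Return VCF lines with `flag` added to INFO where POP_AF > min_pop_af.

    Hypothetical helper: the flag name is an assumption, pick your own.
    """
    header = (f'##INFO=<ID={flag},Number=0,Type=Flag,'
              f'Description="POP_AF above {min_pop_af}">')
    out = []
    for raw in vcf_lines:
        line = raw.rstrip("\n")
        if line.startswith("#CHROM"):
            # Declare the new flag just before the column header line.
            out.append(header)
            out.append(line)
            continue
        if line.startswith("#") or not line:
            out.append(line)
            continue
        fields = line.split("\t")
        info_entries = [] if fields[7] == "." else fields[7].split(";")
        pop_af = None
        for entry in info_entries:
            if entry.startswith("POP_AF="):
                try:
                    # Simplification: only the first value is considered.
                    pop_af = float(entry[len("POP_AF="):].split(",")[0])
                except ValueError:
                    pop_af = None  # missing value such as "."
        if pop_af is not None and pop_af > min_pop_af and flag not in info_entries:
            info_entries.append(flag)
            fields[7] = ";".join(info_entries)
        out.append("\t".join(fields))
    return out


example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tTUMOR",
    "chr1\t100\t.\tA\tG\t.\tPASS\tPOP_AF=0.25\tGT:AD\t0/1:30,28",
    "chr1\t200\t.\tC\tT\t.\tPASS\tPOP_AF=0.00001\tGT:AD\t0/1:50,10",
    "chr1\t300\t.\tG\tA\t.\tPASS\t.\tGT:AD\t0/1:40,12",
]
flagged = flag_likely_germline(example)
```

Here only the first record gets the flag. The second is too rare in the population and the third has no POP_AF at all, so both are left unchanged.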

Yes, the VCF needs to have an AD FORMAT field with both reference and alt read counts. If matched, then for both tumor and normal.
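If you want to check your own VCF for this, something along these lines pulls the reference and alt read counts out of the AD field of one record. It is a plain-text sketch with a made-up function name, assuming a tab-separated record with the sample columns starting at column 10, as in the VCF specification.

```python
def allelic_depths(record_line, sample_index=0):
    """Return (ref_count, alt_count) from the AD FORMAT field of one sample.

    Hypothetical helper; sample_index 0 is the first sample column.
    Raises ValueError if AD is absent or lacks both counts.
    """
    fields = record_line.rstrip("\n").split("\t")
    format_keys = fields[8].split(":")
    if "AD" not in format_keys:
        raise ValueError("record has no AD FORMAT field")
    sample_values = fields[9 + sample_index].split(":")
    ad_position = format_keys.index("AD")
    if ad_position >= len(sample_values):
        # Trailing FORMAT values may be dropped for a sample.
        raise ValueError("AD value missing for this sample")
    counts = [int(x) for x in sample_values[ad_position].split(",")]
    if len(counts) < 2:
        raise ValueError("AD must contain reference and alt counts")
    return counts[0], counts[1]


# Matched example: first sample column is the tumor, second the normal.
record = "chr1\t100\t.\tA\tG\t.\tPASS\tDB\tGT:AD:DP\t0/1:30,28:58\t0/1:21,19:40"
tumor_ref, tumor_alt = allelic_depths(record, sample_index=0)
normal_ref, normal_alt = allelic_depths(record, sample_index=1)
```

A record where this raises for the tumor column (or, in a matched VCF, for either column) does not carry the read counts described above.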

Hope that helps. 




When using a matched tumor/normal pair, why isn't there a PureCN option to give it the ID of the normal sample in the VCF, as there is the --sampleid option to give it the tumor ID?



PureCN currently supports only tumor-only or tumor/normal VCFs. Feel free to add a feature request on GitHub. It does not currently support multiple time points (or regions) per sample, so a multi-sample VCF doesn't provide anything that would be used. But this might change in the future.

