Question

Understanding PureCN and its germline SNPs


twtoal ▴ 10


Here are several questions about PureCN germline SNPs, all interrelated.

Should a germline caller be used to obtain germline SNP information for use with PureCN? If so, wouldn't it make sense to provide this VCF file as one of the inputs to PureCN?

How is a germline SNP defined for PureCN purposes? It is unclear if all germline SNVs are considered germline SNPs, and if they should all be marked with the INFO/DB flag. If not, which ones should be?

Can gnomAD be used instead of dbSNP, with all gnomAD variants found in the normal flagged with DB? Or alternatively, can ALL germline variants be flagged with DB?

When Mutect is run in matched normal mode, where does PureCN get the germline SNPs, since they have been filtered out? When not run in matched normal mode, where does it get them, since no normal was used to identify SNPs?

It is unclear if the SNPs in the VCF file need to have tumor depth info with them, or only normal depth info. Or, if only a tumor with no normal is used, do SNPs still appear in the VCF file, and how do they get there, and do they have tumor depth info?

PureCN germline SNP • 2.4k views

updated 7.5 years ago by kakbrus • 0 • written 7.7 years ago by twtoal ▴ 10

score 3 · Accepted Answer · 2018-04-07

Somatic copy number events most often result in an unbalanced number of maternal and paternal chromosomes. That in turn results in allelic imbalance of germline SNPs, which we can easily detect in the tumor. SNPs thus help us get the ploidy right.
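To make the allelic imbalance concrete, here is a minimal sketch of the standard mixture calculation for a heterozygous germline SNP. It is not PureCN's actual likelihood model; the function name and parameters are made up for illustration, and it assumes diploid normal cells carrying exactly one copy of the allele.

```python
def expected_allelic_fraction(purity, tumor_copies, allele_copies):
    """Expected allelic fraction of a heterozygous germline SNP in a tumor sample.

    purity        -- fraction of tumor cells in the sample (0..1)
    tumor_copies  -- total copy number of the segment in tumor cells
    allele_copies -- tumor copies that carry the SNP allele
    Assumption: normal cells are diploid with one copy of the allele.
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    if not 0 <= allele_copies <= tumor_copies:
        raise ValueError("allele_copies must be between 0 and tumor_copies")
    # Allele copies contributed by tumor cells plus the one copy in normal cells.
    allele = purity * allele_copies + (1.0 - purity) * 1
    # All chromosome copies contributed by tumor cells plus two in normal cells.
    total = purity * tumor_copies + (1.0 - purity) * 2
    if total == 0:
        raise ValueError("no DNA at this locus (pure tumor, homozygous deletion)")
    return allele / total


# Balanced diploid segment: no imbalance regardless of purity.
balanced = expected_allelic_fraction(0.6, 2, 1)
# Single-copy gain of the SNP allele at 60% purity: shifted away from 0.5.
gained = expected_allelic_fraction(0.6, 3, 2)
```

A balanced segment stays at 0.5, while a gain or loss moves the SNP away from 0.5 by an amount that depends on both purity and copy number, which is why these sites carry information about ploidy.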

PureCN was mainly written for tumor-only VCFs that include all variant sites, germline and somatic.

Matched normals are supported, but essentially all that changes is that PureCN uses the SOMATIC flag instead of the DB INFO flag (so don't worry about germline databases when you have matched normals). It will also do some additional artifact filtering, for example removing SNPs of low quality in the normal. Matched normals are also helpful in high-purity samples to distinguish heterozygous from homozygous SNPs. But they are not crucial.
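The flag logic described above can be sketched as follows. This only illustrates the idea and is not PureCN's code; the function name and the returned labels are hypothetical.

```python
def prior_status(info_flags, matched_normal):
    """Rough prior label for a variant based on its INFO flags.

    info_flags     -- iterable of INFO keys present on the record
    matched_normal -- True if the VCF came from a tumor/normal pair
    Labels are illustrative only; they are not values PureCN reports.
    """
    flags = set(info_flags)
    if matched_normal:
        # With a matched normal, the caller's SOMATIC flag decides.
        return "somatic" if "SOMATIC" in flags else "germline"
    # Tumor-only: fall back to database membership (the DB flag).
    return "likely germline" if "DB" in flags else "possibly somatic"


# The same record is treated differently depending on the mode.
tumor_only = prior_status(["DB"], matched_normal=False)
matched = prior_status(["DB"], matched_normal=True)
```

Note that in matched mode the DB flag plays no role in this sketch, which matches the point that germline databases don't matter when a matched normal is available.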

Mutect 1 keeps germline SNPs by default. Mutect 2 in GATK4 can emit germline sites when you provide the --genotype-germline-sites flag (this requires the very latest version, 4.0.3.0). PureCN will do basic filtering by Mutect failure flags and will keep high-quality germline calls. We haven't switched to Mutect 2, so M2 isn't as well tested there yet.

For tumor-only, you can now (in the developer version) specify the name of the germline flag in the VCF. You can, for example, create a flag that marks what you consider likely germline (for example, population allele frequency > 0.1%). You can also provide a POP_AF INFO field with population allele frequencies, for example from gnomAD.
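As a rough sketch of creating such a flag, the function below adds an INFO flag to a VCF record whenever its POP_AF exceeds 0.1%. The flag name LIKELY_GERMLINE and the function name are hypothetical, and the code assumes a plain tab-separated record that already carries a POP_AF INFO field. A real pipeline would more likely use a VCF library and would also add the matching header line.

```python
def flag_likely_germline(vcf_line, flag="LIKELY_GERMLINE", min_pop_af=0.001):
    """Append an INFO flag to one VCF record if POP_AF exceeds min_pop_af.

    The flag name is a made-up example. The VCF header would also need:
    ##INFO=<ID=LIKELY_GERMLINE,Number=0,Type=Flag,Description="...">
    """
    fields = vcf_line.rstrip("\n").split("\t")
    info = fields[7]  # INFO is the 8th VCF column
    entries = [] if info in (".", "") else info.split(";")

    pop_af = None
    for entry in entries:
        if entry.startswith("POP_AF="):
            # Multi-allelic sites may list several values; take the largest.
            values = [float(v) for v in entry[len("POP_AF="):].split(",")
                      if v not in (".", "")]
            pop_af = max(values) if values else None

    if pop_af is not None and pop_af > min_pop_af and flag not in entries:
        entries.append(flag)

    fields[7] = ";".join(entries) if entries else "."
    return "\t".join(fields)


common = "chr1\t100\t.\tA\tG\t.\tPASS\tPOP_AF=0.25\tGT:AD\t0/1:20,18"
flagged = flag_likely_germline(common)
```

Records without a POP_AF value, or with a value at or below the threshold, are returned unchanged.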

Yes, the VCF needs to have an AD FORMAT field with both reference and alt read counts. If matched, this is needed for both tumor and normal.
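A quick way to check a record for this requirement could look like the sketch below. It is a plain-text parser with a hypothetical function name, not something PureCN provides, and it assumes standard tab-separated VCF columns with sample columns starting at column 10.

```python
def allelic_depths(vcf_line):
    """Return the AD counts of every sample in one VCF record.

    Raises ValueError if AD is missing or lacks reference and alt counts.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 10:
        raise ValueError("record has no sample columns")

    keys = fields[8].split(":")  # FORMAT column, e.g. GT:AD:DP
    if "AD" not in keys:
        raise ValueError("FORMAT column has no AD field")
    ad_index = keys.index("AD")

    depths = []
    for sample in fields[9:]:  # one column per sample (tumor, normal, ...)
        values = sample.split(":")
        if ad_index >= len(values):
            raise ValueError("sample column is missing its AD value")
        counts = values[ad_index].split(",")
        if len(counts) < 2 or "." in counts:
            raise ValueError("AD must hold both reference and alt read counts")
        depths.append(tuple(int(c) for c in counts))
    return depths


# A matched record: tumor first, normal second (column order is an assumption).
record = "chr1\t100\t.\tA\tG\t.\tPASS\tDB\tGT:AD:DP\t0/1:30,12:42\t0/0:25,0:25"
tumor_ad, normal_ad = allelic_depths(record)
```

For a tumor-only VCF the same check applies to the single sample column.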

Hope that helps.