Here are several interrelated questions about germline SNPs in PureCN.
Should a germline caller be used to obtain germline SNP information for use with PureCN? If so, wouldn't it make sense to provide this VCF file as one of the inputs to PureCN?
How is a germline SNP defined for PureCN purposes? It is unclear if all germline SNVs are considered germline SNPs, and if they should all be marked with the INFO/DB flag. If not, which ones should be?
Can gnomAD be used instead of dbSNP, with all gnomAD variants found in the normal flagged with DB? Alternatively, can ALL germline variants be flagged with DB?
When Mutect is run in matched normal mode, where does PureCN get the germline SNPs, since they have been filtered out? When not run in matched normal mode, where does it get them, since no normal was used to identify SNPs?
It is unclear whether the SNPs in the VCF file need tumor depth information, or only normal depth information. If only a tumor is used, with no normal, do SNPs still appear in the VCF file? If so, how do they get there, and do they carry tumor depth information?
When using a matched tumor/normal pair, why doesn't PureCN have an option to specify the ID of the normal sample in the VCF, the way the --sampleid option specifies the tumor ID?
PureCN currently supports only tumor-only or tumor/normal VCFs. Feel free to add a feature request on GitHub. PureCN does not currently support multiple time points (or regions) per sample, so a multi-sample VCF would not provide anything it could use. This might change in the future.
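If it helps, here is a rough sketch for checking ahead of time whether a VCF fits that constraint. It is not part of PureCN: the function names are hypothetical, and it assumes that one sample column means tumor-only and two mean tumor/normal. It only reads the #CHROM header line, using plain Python.

```python
def vcf_sample_names(vcf_lines):
    """Return the sample column names from the #CHROM header line of a VCF."""
    for line in vcf_lines:
        if line.startswith("#CHROM"):
            fields = line.rstrip("\n").split("\t")
            # Columns 1-8 are fixed, column 9 is FORMAT, samples come after.
            return fields[9:]
    raise ValueError("no #CHROM header line found")


def classify_vcf(vcf_lines):
    """Label a VCF by its number of sample columns (assumed mapping)."""
    n = len(vcf_sample_names(vcf_lines))
    if n == 1:
        return "tumor-only"
    if n == 2:
        return "tumor/normal"
    # Zero samples, or three or more (e.g. multiple time points or regions).
    return "unsupported"


fixed = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT"
header_pair = ["##fileformat=VCFv4.2\n", fixed + "\tTUMOR\tNORMAL\n"]
header_multi = ["##fileformat=VCFv4.2\n", fixed + "\tT1\tT2\tNORMAL\n"]

print(classify_vcf(header_pair))
print(classify_vcf(header_multi))
```

For a real file, pass an open file handle instead of the list, e.g. `classify_vcf(open("my.vcf"))` for an uncompressed VCF.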