Mutect version, matched normal and manual curation in PureCN
Ken C • 0
@ba661187
Singapore

I have a few questions on the use of PureCN and I reckon it is OK to include them in a single post. Thanks in advance for your help and insights.

  1. Mutect 1 is officially recommended by PureCN; it seems that the support for Mutect 2 is still in beta in the latest release of PureCN to date (2.2.0). Is a comparison between Mutect 1 and 2 with PureCN available somewhere? If using Mutect 2, should I provide PureCN with the Mutect 2 output before or after FilterMutectCalls? (Or does it matter? Not sure if I missed this but I could not find the related instructions.)

  2. If I understand correctly, it is recommended to use a process-matched pool of normals for copy number (coverage) normalization even if (sample-specific) matched normals are available. However, including matched normals (when available) in the Mutect calls is recommended, and this will help with the purity-ploidy fitting to SNVs. (Of course, when matched normals are available one will possibly care less about the somatic-vs-germline classification.) I would like to confirm that my understanding is correct.

  3. Being relatively new to this field, I am at a loss as to how I should go about the manual curation. I understand that prior biological knowledge of the samples helps to decide whether the PureCN-picked solution is "real", but such prior knowledge is not always available, or we are not confident in it. So, generally, where should I start during manual curation? What should one be looking at? If one decides to reject the default solution, what should the choice among the alternatives be based on? I understand that this is a very general question and there is no fixed algorithm, but any empirical tip is much appreciated.

PureCN • 140 views

Hi Ken,

If you follow the GATK best practices (https://gatk.broadinstitute.org/hc/en-us/articles/360035531132) closely, you should be all good. We haven't switched to GATK4 internally yet, but I occasionally test PureCN with the latest GATK4 and it works well. So yes, apply all standard commands, including the filtering step (FilterMutectCalls). Use matched normals when available, but provide the --genotype-germline-sites flag to get the SNP allelic fractions we need.
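As a rough illustration of the advice above, the following Python sketch assembles the two GATK4 command lines (Mutect2 with germline sites genotyped, then FilterMutectCalls) without running them. The helper name `mutect2_commands`, the file names, and the sample name are hypothetical; the GATK options used are `-R`, `-I`, `-normal`, `--germline-resource`, `--panel-of-normals`, `--genotype-germline-sites`, `-V`, and `-O`. It is a sketch under those assumptions, not a validated pipeline, so check the options against the documentation of your GATK version.

```python
import shlex


def mutect2_commands(reference, tumor_bam, out_prefix,
                     normal_bam=None, normal_sample=None,
                     germline_resource=None, panel_of_normals=None):
    """Build (but do not run) Mutect2 and FilterMutectCalls command lines.

    All paths and names are placeholders supplied by the caller.
    """
    unfiltered = f"{out_prefix}.unfiltered.vcf.gz"
    filtered = f"{out_prefix}.filtered.vcf.gz"

    call = ["gatk", "Mutect2", "-R", reference, "-I", tumor_bam]
    if normal_bam is not None:
        # A matched normal needs its read-group sample name for -normal.
        if normal_sample is None:
            raise ValueError("normal_sample is required with normal_bam")
        call += ["-I", normal_bam, "-normal", normal_sample]
    if germline_resource is not None:
        call += ["--germline-resource", germline_resource]
    if panel_of_normals is not None:
        call += ["--panel-of-normals", panel_of_normals]
    # Keep germline SNPs in the VCF so their allelic fractions are available.
    call += ["--genotype-germline-sites", "true", "-O", unfiltered]

    # The filtered VCF, not the raw Mutect2 output, is what goes downstream.
    filt = ["gatk", "FilterMutectCalls", "-R", reference,
            "-V", unfiltered, "-O", filtered]
    return call, filt


if __name__ == "__main__":
    call_cmd, filter_cmd = mutect2_commands(
        "ref.fa", "tumor.bam", "sample1",
        normal_bam="normal.bam", normal_sample="sample1_normal")
    print(shlex.join(call_cmd))
    print(shlex.join(filter_cmd))
```

Returning argument lists rather than shell strings keeps quoting safe if the lists are later passed to `subprocess.run`.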

If you have matched normals, simply take all of them and build a pool of normals with NormalDB.R. PureCN is pretty good at extracting all kinds of information to reduce biases. If you use our Docker image, you can conveniently import the GenomicsDB from Mutect2 for the mapping bias part (which checks whether a SNP is biased toward the reference or the alt allele). Don't provide the matched normal with --normal when you have a NormalDB.
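To make the last point concrete, here is a small Python sketch that assembles a PureCN.R command line using a normal database and deliberately offers no way to pass `--normal`. The helper name `purecn_command` and all paths are hypothetical. The flag spellings (`--out`, `--sampleid`, `--tumor`, `--vcf`, `--normaldb`, `--intervals`, `--genome`, `--mapping-bias-file`) are assumed to match the PureCN 2.x command-line scripts; older releases spelled some options differently, so verify them with `Rscript PureCN.R --help` before use.

```python
import shlex


def purecn_command(script, sample_id, tumor_coverage, vcf, normaldb,
                   intervals, out_dir, genome="hg38",
                   mapping_bias_file=None):
    """Build (but do not run) a PureCN.R call that relies on a NormalDB.

    There is intentionally no parameter for a matched-normal coverage
    file: with a NormalDB, --normal is left out.
    """
    cmd = ["Rscript", script,
           "--out", out_dir,
           "--sampleid", sample_id,
           "--tumor", tumor_coverage,
           "--vcf", vcf,
           "--normaldb", normaldb,
           "--intervals", intervals,
           "--genome", genome]
    if mapping_bias_file is not None:
        # Optional mapping bias database produced alongside the NormalDB.
        cmd += ["--mapping-bias-file", mapping_bias_file]
    return cmd


if __name__ == "__main__":
    cmd = purecn_command("PureCN.R", "sample1",
                         "sample1_coverage_loess.txt.gz",
                         "sample1.filtered.vcf.gz",
                         "normalDB_hg38.rds",
                         "baits_hg38_intervals.txt",
                         "results/sample1",
                         mapping_bias_file="mapping_bias_hg38.rds")
    print(shlex.join(cmd))
```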

On manual curation: that's a tricky FAQ (see for example https://github.com/lima1/PureCN/issues/238). Feel free to post examples you are unsure about. It's usually pretty obvious if something went wrong, but you need some experience.

Feel free to post the log file of an example run and I can check if everything looks good.

Markus

Thanks a lot Markus, this is very helpful. For #3 let me see if I can pick a "typical" run that may require manual curation among my samples. For #2, perhaps independent from what's been said about NormalDB, may I further check my understanding -- using the paired-normal mode for Mutect when possible should lead to better purity-ploidy estimation (compared to tumor-only Mutect call), since the germline SNP priors are better assigned, right?

PureCN is made for tumor-only, so it's pretty accurate even without using the normal in Mutect2. Usually you get identical purity and ploidy except for a small number of difficult samples where PureCN is unsure. The benefit is more in the variant classification step (germline vs somatic vs sub-clonal). It might also remove a few more artifacts that are missed by the pool of normals.
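For readers wondering why allelic fractions carry information about both purity/ploidy and the germline-vs-somatic call, the sketch below computes the textbook expected allelic fraction of a variant in a tumor/normal mixture. This is only the simple expectation, assuming a diploid normal genome, a heterozygous germline SNP, and no mapping bias or subclonality; it is not PureCN's actual likelihood model, and the function name is hypothetical.

```python
def expected_allelic_fraction(purity, tumor_cn, variant_copies, germline):
    """Expected variant allelic fraction in a mixed tumor/normal sample.

    purity         -- fraction of tumor cells (0..1)
    tumor_cn       -- total copy number of the locus in tumor cells
    variant_copies -- tumor copies carrying the variant allele
    germline       -- True for a heterozygous germline SNP (normal cells
                      carry one variant copy), False for a somatic variant
    """
    normal_variant_copies = 1 if germline else 0
    variant_dna = (purity * variant_copies
                   + (1 - purity) * normal_variant_copies)
    # Normal cells are assumed diploid, hence the factor 2.
    total_dna = purity * tumor_cn + (1 - purity) * 2
    return variant_dna / total_dna


if __name__ == "__main__":
    # Same locus (2 copies, 1 carrying the variant) at 50% purity:
    print(expected_allelic_fraction(0.5, 2, 1, germline=True))
    print(expected_allelic_fraction(0.5, 2, 1, germline=False))
    # Germline SNP in a one-copy region where the variant allele is kept:
    print(expected_allelic_fraction(0.5, 1, 1, germline=True))
```

At the same locus a germline SNP and a somatic variant are expected at different allelic fractions, and that gap depends on purity and copy number, which is why the two are fitted and classified together.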

OK that makes sense. Thanks again!
