Question

PureCN C (integer segment copy number) in curated samples

twtoal ▴ 10

@twtoal-15473

Last seen 15 months ago

United States

I am seeing a significant difference between my own non-integer purity/ploidy-adjusted segment copy numbers and the value that PureCN reports in the "C" column of the LOH segments (usually an integer), and I'm trying to understand why. The absolute difference, averaged over all segments of all samples, is about 0.51 copies. Since C is an integer and my adjusted copy numbers are not, I would expect the mean absolute difference to be about 0.25 if C were simply the nearest integer (a rounding error uniformly distributed between 0 and 0.5), so the observed difference is about twice what I expected.
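The 0.25 figure can be checked with a minimal Python sketch. It assumes, purely for illustration, that the non-integer copy numbers are spread evenly between integers and that C is obtained by plain rounding to the nearest integer; neither assumption is confirmed for PureCN.

```python
def mean_rounding_error(n_points=100_000, low=0.0, high=8.0):
    """Mean |x - round(x)| over an evenly spaced grid of copy numbers."""
    step = (high - low) / n_points
    # Use the midpoint of each grid cell so no value lands exactly on a .5 tie.
    values = [low + (i + 0.5) * step for i in range(n_points)]
    return sum(abs(v - round(v)) for v in values) / n_points


expected_error = mean_rounding_error()
print(f"mean absolute rounding error: {expected_error:.3f}")
```

Under those assumptions the mean comes out at about 0.25, so an observed mean of 0.51 suggests C is not just the rounded adjusted value.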

Is there anything about the value of "C" that might cause it to be more different than expected from the value obtained by taking 2^seg.mean and adjusting it for purity/ploidy?

As an example, in one of my samples, which has a PureCN ploidy of 4.0, most of the segments are long and cover most of the chromosome arm, and my adjusted copy number is about 5 for those but PureCN has 4.

PureCN wouldn't be rounding down on copy number somewhere, would it?

I even see that if I take ploidy * 2^seg.mean I get a value of 4.68 in the case I am looking at, whereas PureCN has a C value of 4. The purity is 0.35, so after adjusting for purity/ploidy I get a copy number of 5.36. So I don't know where the C=4 is coming from.
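For reference, the adjustment described here can be sketched in Python as follows. The formula is the commonly used mixture model (a tumor fraction with C copies mixed with diploid normal cells), which is an assumption on my part and not necessarily the exact calculation used in this post or inside PureCN. The function name `adjusted_copy_number` is hypothetical.

```python
def adjusted_copy_number(ratio, purity, ploidy, normal_ploidy=2.0):
    """Invert ratio = (p*C + (1-p)*n) / (p*ploidy + (1-p)*n) for C.

    ratio  : linear copy ratio of the segment, i.e. 2^seg.mean
    purity : tumor fraction p
    ploidy : average tumor ploidy
    n      : ploidy of the normal contamination (assumed 2)
    """
    normal_part = (1.0 - purity) * normal_ploidy
    average_copies = purity * ploidy + normal_part
    return (ratio * average_copies - normal_part) / purity


# ploidy * 2^seg.mean = 4.68 with ploidy 4.0 implies 2^seg.mean = 1.17
ratio = 4.68 / 4.0
tumor_copies = adjusted_copy_number(ratio, purity=0.35, ploidy=4.0)
print(f"ratio={ratio:.3f} adjusted C={tumor_copies:.2f} rounded={round(tumor_copies)}")
```

With the numbers quoted above this gives an adjusted copy number of roughly 5.3, in the same range as the 5.36 reported; the small gap may come from rounded inputs or a slightly different formula. Either way, plain rounding would give 5, not 4.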

PureC PureCN • 2.2k views

ADD COMMENT • link updated 3.9 years ago by markus.riester ▴ 130 • written 3.9 years ago by twtoal ▴ 10

Can you add a screenshot of the B-allele plot?

ADD REPLY • link 3.9 years ago markus.riester ▴ 130

enter image description here

ADD REPLY • link 3.9 years ago twtoal ▴ 10

Hm, can you say a bit more about the assay? 8000 SNPs is an uncommon number: too large for panels (unless it has a copy number backbone), way too low for WES. 30X is also quite low. Feel free to post the complete log file; it might point to some issues. This looks very noisy.

ADD REPLY • link 3.9 years ago markus.riester ▴ 130

It is WES. This sample is one of our poorer ones; it probably had poor DNA quality and/or a very small amount of DNA. We were shooting for a coverage of at least 100X, but on this sample the mean coverage was only about 20X. Mutect2 called 50K variants, but only about 16K were given to PureCN in the VCF file. Variants with depth < 5 or alt allele depth < 2, those estimated by Mutect2 to be artifacts, and those in repeat regions were removed, which probably removed a LOT of them and resulted in the higher overall coverage of 32X seen by PureCN. We were hoping that would still be sufficient for PureCN to estimate purity/ploidy well. It DOES look pretty noisy.
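To illustrate why that kind of filtering raises the coverage PureCN sees, here is a small Python sketch of the filter as described. The records and field names (`depth`, `alt_depth`, `artifact`, `in_repeat`) are made up for illustration and are not taken from the actual VCF or from Mutect2 output.

```python
# Hypothetical toy variant records; the values are illustrative only.
variants = [
    {"depth": 3, "alt_depth": 2, "artifact": False, "in_repeat": False},
    {"depth": 4, "alt_depth": 1, "artifact": False, "in_repeat": False},
    {"depth": 12, "alt_depth": 1, "artifact": False, "in_repeat": False},
    {"depth": 20, "alt_depth": 8, "artifact": True, "in_repeat": False},
    {"depth": 25, "alt_depth": 10, "artifact": False, "in_repeat": True},
    {"depth": 30, "alt_depth": 12, "artifact": False, "in_repeat": False},
    {"depth": 40, "alt_depth": 15, "artifact": False, "in_repeat": False},
    {"depth": 50, "alt_depth": 20, "artifact": False, "in_repeat": False},
]


def keep(variant, min_depth=5, min_alt_depth=2):
    """Drop variants with depth < 5, alt depth < 2, artifact flags, or repeats."""
    return (
        variant["depth"] >= min_depth
        and variant["alt_depth"] >= min_alt_depth
        and not variant["artifact"]
        and not variant["in_repeat"]
    )


def mean_depth(records):
    return sum(r["depth"] for r in records) / len(records)


kept = [v for v in variants if keep(v)]
print(f"kept {len(kept)} of {len(variants)} variants")
print(f"mean depth before: {mean_depth(variants):.1f}, after: {mean_depth(kept):.1f}")
```

Because the depth thresholds preferentially drop low-coverage sites, the mean depth of the surviving variants is higher than the mean over all called variants, even though nothing about the underlying sequencing changed.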

Still, I'm wondering about the answer to the question, because I would have thought that the C value would just be the rounded version of the adjusted copy ratio. But maybe that is not true; maybe a lot of other factors are influencing it, such as GC content, mapping bias, etc.?

We haven't yet taken a deep look at various results to decide whether some samples are poor enough to be excluded from analysis, but it looks like this might be one of them.

ADD REPLY • link 3.9 years ago twtoal ▴ 10

Looks like a failed sample to me. You might still get a good idea of the purity, but as you can see, there is not a lot of information in the SNPs. Look at the allelic fractions of the somatic variants and see whether there is a purity and ploidy solution that makes the most sense. If there are some clear amplifications, they might be real.

It's rounded in the sense that PureCN tries to assign integer values to most (1-max.non.clonal) of the genome. It adjusts purity/ploidy and also does some slight up and down shifting of the log ratios to find a solution that fits well. In your case it is very likely overfitting noise.
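To see why small log-ratio shifts matter at this purity, here is a rough Python sketch of the expected log2 ratio for each integer copy state. It assumes the usual tumor/normal mixture model with a diploid normal and uses the purity 0.35, ploidy 4.0 and ploidy * 2^seg.mean = 4.68 quoted in the question. It is not PureCN's actual fitting procedure, and `expected_log2_ratio` is a hypothetical helper.

```python
import math


def expected_log2_ratio(copies, purity, ploidy, normal_ploidy=2.0):
    """Expected log2 copy ratio of a segment with integer tumor copy number."""
    normal_part = (1.0 - purity) * normal_ploidy
    return math.log2((purity * copies + normal_part) / (purity * ploidy + normal_part))


purity, ploidy = 0.35, 4.0
observed = math.log2(4.68 / 4.0)  # seg.mean implied by the example in the question

table = {c: expected_log2_ratio(c, purity, ploidy) for c in range(3, 7)}
for copies, log_ratio in table.items():
    print(
        f"C={copies}: expected log2 ratio {log_ratio:+.3f}, "
        f"observed minus expected {observed - log_ratio:+.3f}"
    )
```

Under these assumptions, neighboring integer states are only about 0.15 to 0.2 log2 units apart, so in a noisy sample a modest shift of the log ratios is enough to move a long segment from one integer to its neighbor.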

ADD REPLY • link 3.9 years ago markus.riester ▴ 130