Question: Is mean adjusted for purity?
twtoal0 wrote:

You previously indicated that the C values in the PureCN output have been adjusted for purity.  Does this also hold true for the "mean" values for mean log ratios?  It appears to me that they are adjusted for purity (but are still log mean ratios, not actual mean ratios and not twice the ratio to give a copy number value).

purecn mean ratio • 363 views
modified 12 months ago by markus.riester110 • written 12 months ago by twtoal0
Answer: Is mean adjusted for purity?
markus.riester110 wrote:

All the log-ratios are standard log2 tumor vs normal coverage (of course after normalization for total sequencing coverage). Exactly what you would get from any other copy number tool that does not do any purity/ploidy adjustment like CNVkit, GATK4 etc. So no purity adjustment.

If you need purity adjustment of log-ratios for some reason, for example when downstream tools like GISTIC expect log2 ratios, you can follow https://www.nature.com/articles/ng.2760 (section Impurity-corrected GISTIC).

Feel free to add a GitHub issue if you think that some of the output is not clearly documented in the main vignette (mainly Tables 1-5).

written 12 months ago by markus.riester110

Thanks, that was a helpful reference.  However, I see that it has a mistake in its equation for R'(x).  I believe the correct equation should be:

R'(x) = q(x)/T = [aTR(x) + 2(1-a)R(x) - 2(1-a)] / aT

T = tau, a = alpha, q(x) = integer CN in cancer cells, R(x) = observed CN ratio, R'(x) = CN ratio in tumor cells

His derivation:

  R(x) = (aq(x) + 2(1-a))/D
  D = aT + 2(1-a)
  q(x) = DR(x)/a - 2(1-a)/a
  R'(x) = q(x)/T = R(x)/a - 2(1-a)/aT

where:

  R'(x) = adjusted coverage ratio
  R(x) = raw coverage ratio
  q(x) = integer copy number in cancer cells
  D = average ploidy across all cells of tumor (of sample)
  a = purity
  T = tumor ploidy

However, in the last step where he substituted q(x) in q(x)/T, he did the algebra wrong.  The correct algebra is:

R'(x) = q(x)/T = DR(x)/aT - 2(1-a)/aT = (aT + 2(1-a))R(x)/aT - 2(1-a)/aT
= R(x) + 2(1-a)R(x)/aT - 2(1-a)/aT
= [aTR(x) + 2(1-a)R(x) - 2(1-a)]/aT

As a test, say that purity = a = 0.5, tumor ploidy = T = 2, and the raw coverage ratio is 1.5. Then we expect the adjusted coverage ratio to be 2 (the tumor segment is amplified 2X (4 copies), and this becomes a raw ratio of 1.5 when purity is 1/2: [0.5*4 + 0.5*2] / 2 = 1.5).

His: R'(x) = 1.5/0.5 - 2(0.5)/(0.5 * 2) = 3 - 2(0.5) = 2 (correct)
Mine: R'(x) = [0.5*2*1.5 + 2(0.5)1.5 - 2(0.5)] / (0.5*2) = 1.5 + 1.5 - 1 = 2 (correct)

But now suppose that tumor ploidy = T = 4, and we still have purity = a = 0.5. Say the raw coverage ratio = 1.0, which means there is no tumor amplification: the number of copies at any locus is the same as the mean number of copies, in both the 2X normal and 4X tumor tissue. Then we expect the adjusted coverage ratio to also be 1.

His: R'(x) = 1/0.5 - 2(0.5)/(0.5 * 4) = 2 - 2(0.5)/2 = 2 - 1/2 = 1.5 (wrong)
Mine: R'(x) = [0.5*4*1 + 2(0.5)1 - 2(0.5)] / (0.5 * 4) = [2 + 1 - 1] / 2 = 2 / 2 = 1 (correct)
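
For anyone who wants to re-run the two test cases, here is a minimal R sketch. It assumes the forward model R(x) = (aq(x) + 2(1-a))/D from the derivation above. The function names (expected_ratio, adjust_paper, adjust_proposed) are made up for illustration and are not part of PureCN.

```r
# Forward model from the thread: observed ratio for a segment with
# integer copy number q in the tumor, given purity and tumor ploidy.
expected_ratio <- function(q, purity, ploidy) {
  D <- purity * ploidy + 2 * (1 - purity)  # average ploidy of the mixed sample
  (purity * q + 2 * (1 - purity)) / D
}

# Adjustment as quoted from the paper in this thread.
adjust_paper <- function(R, purity, ploidy) {
  R / purity - 2 * (1 - purity) / (purity * ploidy)
}

# Adjustment proposed in this post (keeps the factor D / ploidy).
adjust_proposed <- function(R, purity, ploidy) {
  D <- purity * ploidy + 2 * (1 - purity)
  (D * R - 2 * (1 - purity)) / (purity * ploidy)
}

# The two test cases: 4 copies in the tumor, purity 0.5, ploidy 2 and 4.
cases <- data.frame(q = c(4, 4), purity = c(0.5, 0.5), ploidy = c(2, 4))
cases$R        <- expected_ratio(cases$q, cases$purity, cases$ploidy)
cases$target   <- cases$q / cases$ploidy
cases$paper    <- adjust_paper(cases$R, cases$purity, cases$ploidy)
cases$proposed <- adjust_proposed(cases$R, cases$purity, cases$ploidy)
print(cases)
```

Under this forward model, the two adjustments agree when ploidy is 2 (D equals the tumor ploidy) and differ when ploidy is 4.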
modified 10 months ago • written 12 months ago by twtoal0

Not sure, I looked into this more than 2 years ago. I used the following and believe it's correct:

rds <- readRDS("Sampleid.rds")
r <- rds$results[[1]]
r$seg$seg.mean.adjusted <- r$seg$seg.mean/r$purity - 2*(1-r$purity)/(r$purity*r$ploidy)

I haven't used it much though because I found little benefit in GISTIC and for everything else you usually want the absolute copy numbers.

written 12 months ago by markus.riester110

Your equation above matches the one in the paper you cited, which is incorrect. Your seg.mean is his R(x), your purity is his a, your ploidy is his T. I found that PureCN:::.calcExpectedRatio() is doing it correctly (it is doing the inverse operation, computing R(x) from R'(x)). However, in runAbsoluteCN(), I find this line:

opt.C <- (2^(seg$seg.mean + log.ratio.offset) * total.ploidy)/p - ((2 * (1 - p))/p)

and since C = ratio * ploidy, the above equation is the paper's (incorrect) R'(x) * ploidy.  It seems to be wrong.  Please check it.  Maybe I'm missing something, but to me it looks like a definite algebra mistake.
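
In case it helps others reading this, below is a rough R sketch of how the adjustment I proposed could be applied to log2 segment means. It assumes seg.mean holds log2 ratios, as stated in the answer, so the values are converted to the linear scale first and back to log2 afterwards. The segment values, purity, and ploidy are made up, and adjust_log_ratio is a hypothetical helper, not a PureCN function.

```r
# Hypothetical segment table; with real data these values would come from a
# PureCN result, e.g. via the readRDS() snippet earlier in the thread.
seg <- data.frame(seg.mean = c(0, log2(1.5), log2(0.75)))
purity <- 0.5  # assumed purity
ploidy <- 2    # assumed tumor ploidy

adjust_log_ratio <- function(seg.mean, purity, ploidy) {
  ratio <- 2^seg.mean                      # log2 ratio back to linear scale
  D <- purity * ploidy + 2 * (1 - purity)  # average ploidy of the sample
  adjusted <- (D * ratio - 2 * (1 - purity)) / (purity * ploidy)
  # Noise can push the adjusted ratio to zero or below, where log2 is
  # undefined, so those segments are set to NA here.
  adjusted[adjusted <= 0] <- NA
  log2(adjusted)
}

seg$seg.mean.adjusted <- adjust_log_ratio(seg$seg.mean, purity, ploidy)
print(seg)
```

Setting non-positive ratios to NA is just one choice; a tool that needs a value for every segment might prefer clamping to a small floor instead.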

modified 10 months ago • written 12 months ago by twtoal0

I think I'll go ahead and open an issue on the PureCN github project for this.

written 10 months ago by twtoal0