Question

Is mean adjusted for purity?

twtoal ▴ 10

@twtoal-15473

You previously indicated that the C values in the PureCN output have been adjusted for purity. Does this also hold true for the "mean" values for mean log ratios? It appears to me that they are adjusted for purity (but are still log mean ratios, not actual mean ratios and not twice the ratio to give a copy number value).

PureCN mean ratio • 1.9k views

updated 6.2 years ago by markus.riester ▴ 130 • written 6.2 years ago by twtoal ▴ 10

score 0 · Answer 1 · 2018-05-29

markus.riester ▴ 130

@markusriester-9875

All the log-ratios are standard log2 tumor vs. normal coverage ratios (after normalization for total sequencing coverage). This is exactly what you would get from any other copy number tool that does not do any purity/ploidy adjustment, such as CNVkit or GATK4. So no, there is no purity adjustment.
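As a rough illustration of what an unadjusted log-ratio is, the toy sketch below divides each bin's coverage by the sample's total coverage and takes log2 of the tumor/normal ratio. The coverage vectors are made up, and real tools apply further normalization (such as GC correction) that is left out here.

```r
# Toy per-bin coverage (made-up numbers, not from any real sample).
tumor  <- c(100, 150, 100, 50)
normal <- c(100, 100, 100, 100)

# Normalize each sample by its total coverage, then take the log2 ratio.
log_ratio <- log2((tumor / sum(tumor)) / (normal / sum(normal)))

# Converting back to a linear ratio is just 2^log_ratio.
linear_ratio <- 2^log_ratio
print(round(linear_ratio, 2))
```

Nothing in this calculation depends on purity or ploidy, which is the point of the answer above.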

If you need purity adjustment of log-ratios for some reason, for example when downstream tools like GISTIC expect log2 ratios, you can follow https://www.nature.com/articles/ng.2760 (section Impurity-corrected GISTIC).

Feel free to add a GitHub issue if you think that some of the output is not clearly documented in the main vignette (mainly Tables 1-5).

6.2 years ago • markus.riester ▴ 130

Thanks, that was a helpful reference. However, I see that it has a mistake in its equation for R'(x). I believe the correct equation should be:

R'(x) = q(x)/T = [aTR(x) + 2(1-a)R(x) - 2(1-a)] / (aT)

T = tau, a = alpha, q(x) = integer CN in cancer cells, R(x) = observed CN ratio, R'(x) = CN ratio in tumor cells

His derivation:

  R(x) = (aq(x) + 2(1-a))/D
  D = aT + 2(1-a)
  q(x) = DR(x)/a - 2(1-a)/a
  R'(x) = q(x)/T = R(x)/a - 2(1-a)/(aT)

where:

  R'(x) = adjusted coverage ratio
  R(x) = raw coverage ratio
  q(x) = integer copy number in cancer cells
  D = average ploidy across all cells of tumor (of sample)
  a = purity
  T = tumor ploidy
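The forward model in the first two lines of the derivation can be written as a small R function. This is only a sketch: the name `expected_ratio` and its argument names are made up for illustration and are not PureCN code.

```r
# Expected raw coverage ratio R(x) for a segment with tumor copy number q,
# given purity a and tumor ploidy `ploidy` (T in the text).
# `T` is avoided as a variable name because it aliases TRUE in R.
expected_ratio <- function(q, a, ploidy) {
  D <- a * ploidy + 2 * (1 - a)   # average ploidy across all cells of the sample
  (a * q + 2 * (1 - a)) / D
}

# 4 tumor copies at 50% purity, in a diploid and in a tetraploid tumor.
r_diploid    <- expected_ratio(q = 4, a = 0.5, ploidy = 2)
r_tetraploid <- expected_ratio(q = 4, a = 0.5, ploidy = 4)
print(c(r_diploid, r_tetraploid))
```

Any proposed adjustment R'(x) should map these raw ratios back to q(x)/T, which makes the function handy for round-trip checks.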

However, in the last step where he substituted q(x) in q(x)/T, he did the algebra wrong. The correct algebra is:

R'(x) = q(x)/T = DR(x)/(aT) - 2(1-a)/(aT) = (aT + 2(1-a))R(x)/(aT) - 2(1-a)/(aT)
      = R(x) + 2(1-a)R(x)/(aT) - 2(1-a)/(aT)
      = [aTR(x) + 2(1-a)R(x) - 2(1-a)]/(aT)

As a test, say that purity = a = 0.5, tumor ploidy = T = 2, and the raw coverage ratio is 1.5. Then we expect the adjusted coverage ratio to be 2: the tumor segment is amplified 2X (4 copies), which gives a raw ratio of 1.5 when purity is 1/2, since [0.5*4 + 0.5*2] / 2 = 1.5.

His: R'(x) = 1.5/0.5 - 2(0.5)/(0.5 * 2) = 3 - 2(0.5) = 2 (correct)
Mine: R'(x) = [0.5*2*1.5 + 2(0.5)1.5 - 2(0.5)] / (0.5*2) = 1.5 + 1.5 - 1 = 2 (correct)

But now suppose that tumor ploidy = T = 4, and we still have purity = a = 0.5. Say the raw coverage ratio is 1.0, which means there is no tumor amplification: the number of copies at this locus equals the mean number of copies in both the 2X normal and the 4X tumor tissue. Then we expect the adjusted coverage ratio to also be 1.

His: R'(x) = 1/0.5 - 2(0.5)/(0.5 * 4) = 2 - 2(0.5)/2 = 2 - 1/2 = 1.5 (wrong)
Mine: R'(x) = [0.5*4*1 + 2(0.5)1 - 2(0.5)] / (0.5 * 4) = [2 + 1 - 1] / 2 = 2 / 2 = 1 (correct)
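The two checks above can be reproduced in R. This is a minimal sketch; the function names `adjust_paper` and `adjust_proposed` are made up for this post, and `ploidy` stands for T.

```r
# Adjustment as printed in the paper: R'(x) = R(x)/a - 2(1-a)/(aT)
adjust_paper <- function(R, a, ploidy) {
  R / a - 2 * (1 - a) / (a * ploidy)
}

# Adjustment proposed here: R'(x) = [aTR(x) + 2(1-a)R(x) - 2(1-a)] / (aT)
adjust_proposed <- function(R, a, ploidy) {
  (a * ploidy * R + 2 * (1 - a) * R - 2 * (1 - a)) / (a * ploidy)
}

# The two test cases: (R = 1.5, T = 2) and (R = 1.0, T = 4), both at purity 0.5.
cases <- data.frame(R = c(1.5, 1.0), a = c(0.5, 0.5), ploidy = c(2, 4))
cases$paper    <- adjust_paper(cases$R, cases$a, cases$ploidy)
cases$proposed <- adjust_proposed(cases$R, cases$a, cases$ploidy)
print(cases)
```

The two versions agree whenever D = T (as in the first case, where both are 2) and diverge otherwise.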

6.2 years ago • updated 6.0 years ago • twtoal ▴ 10

Not sure, I looked into this more than 2 years ago. I used the following and believe it's correct:

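# Load a saved PureCN result object
rds <- readRDS("Sampleid.rds")

# Use the first purity/ploidy solution
r <- rds$results[[1]]

# Apply the paper's formula to the segment means
r$seg$seg.mean.adjusted <- r$seg$seg.mean/r$purity - 2*(1-r$purity)/(r$purity*r$ploidy)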

I haven't used it much, though, because I found little benefit in GISTIC, and for everything else you usually want the absolute copy numbers.
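One detail worth checking when applying any of these formulas: the derivation in this thread is written for linear coverage ratios, while seg.mean is a log2 ratio. The sketch below assumes the adjustment is meant to be applied on the linear scale and converted back to log2 afterwards. The segment table, purity, and ploidy are made-up values, and the adjustment uses q(x)/T with q(x) taken from the derivation quoted in this thread.

```r
# Hypothetical segment table with log2 ratios (values are made up).
seg <- data.frame(seg.mean = c(log2(1.5), 0, log2(0.75)))
purity <- 0.5   # assumed purity (a)
ploidy <- 2     # assumed tumor ploidy (T)

# Convert log2 ratios to linear ratios before adjusting.
ratio <- 2^seg$seg.mean

# Invert R = (a*q + 2*(1-a)) / D to get the tumor-only ratio q/T.
D <- purity * ploidy + 2 * (1 - purity)
ratio_adjusted <- (D * ratio - 2 * (1 - purity)) / (purity * ploidy)

# Back to log2; clamp at a small positive value so noisy segments
# that end up at or below zero do not produce NaN or -Inf.
seg$seg.mean.adjusted <- log2(pmax(ratio_adjusted, 2^-8))
print(seg)
```

The clamp value 2^-8 is an arbitrary choice for this example; a real pipeline would pick a floor that suits the downstream tool.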

6.2 years ago • markus.riester ▴ 130

Your equation above matches the one in the paper you cited, which is incorrect. Your seg.mean is his R(x), your purity is his a, your ploidy is his T.

I found that PureCN:::.calcExpectedRatio() is doing it correctly (it is doing the inverse operation, computing R(x) from R'(x)).

However, in runAbsoluteCN(), I find this line:

opt.C <- (2^(seg$seg.mean + log.ratio.offset) * total.ploidy)/p - ((2 * (1 - p))/p)

and since C = ratio * ploidy, the above equation is the paper's (incorrect) R'(x) * ploidy. It seems to be wrong. Please check it. Maybe I'm missing something, but to me it looks like a definite algebra mistake.
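One way to narrow this down is to evaluate the quoted expression on the tetraploid toy case from earlier (purity 0.5, tumor ploidy 4, 4 tumor copies, raw ratio 1) under two readings of total.ploidy. This is only a sketch: the variable names are made up, log.ratio.offset is assumed to be zero, and it does not establish which quantity PureCN actually stores in total.ploidy.

```r
# Toy case: purity 0.5, tumor ploidy 4, segment with 4 tumor copies,
# so the raw ratio is 1 and the log2 ratio is 0.
p <- 0.5
tumor_ploidy  <- 4
sample_ploidy <- p * tumor_ploidy + 2 * (1 - p)   # D in the derivation, here 3
seg_mean <- 0
log_ratio_offset <- 0   # assumed to be zero for this illustration

# Same expression as the quoted line, with `ploidy` standing in for total.ploidy.
opt_C <- function(ploidy) {
  (2^(seg_mean + log_ratio_offset) * ploidy) / p - (2 * (1 - p)) / p
}

c_if_sample_ploidy <- opt_C(sample_ploidy)  # total.ploidy read as D
c_if_tumor_ploidy  <- opt_C(tumor_ploidy)   # total.ploidy read as T
print(c(c_if_sample_ploidy, c_if_tumor_ploidy))
```

Under the first reading the expression is q(x) = DR(x)/a - 2(1-a)/a from the derivation and recovers the 4 tumor copies. Under the second it is the paper's R'(x) times T and does not. So the answer hinges on what total.ploidy holds at that point in runAbsoluteCN().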