Question

Assigning DUP/DEL p-value to CNV segment

twtoal ▴ 10

@twtoal-15473

Last seen 22 days ago

United States

Does the following seem like a good method to assign a p-value for segment DUPLICATION or DELETION, to each segment in the dnacopy.seg file?

1. Use PureCN's readCurationFile to read the .rds file into an object RDS, then retrieve the log ratios of the marks from RDS$input$log.ratio.

2. Compute the adjusted copy ratio of each mark by taking 2^log.ratio, then applying the purity/ploidy adjustment equation to the result, using purity and ploidy from the curation file.

3. For each segment, use a one-sample Wilcoxon test to test whether the resulting copy ratios of its marks are significantly greater, or less, than 1.
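Steps 2 and 3 can be sketched roughly as below, for the marks of a single segment. This is only an illustration under stated assumptions: the purity/ploidy adjustment shown is one common mixture model and may not be the exact equation meant in the question, the signed-rank test uses a normal approximation without tie or continuity correction, and the log ratios, purity, and ploidy are made-up values rather than anything read from a PureCN .rds file.

```python
import math

def adjusted_copy_ratio(log_ratio, purity, ploidy):
    """Convert a log2 ratio to a purity/ploidy-adjusted copy ratio.

    Assumed model (an assumption, not taken from PureCN): observed ratio
    r = (purity*C + 2*(1-purity)) / (purity*ploidy + 2*(1-purity)),
    solved for tumor copy number C and divided by ploidy, so that a
    value of 1 means "at ploidy".
    """
    r = 2.0 ** log_ratio
    normal_part = 2.0 * (1.0 - purity)
    tumor_copies = (r * (purity * ploidy + normal_part) - normal_part) / purity
    return tumor_copies / ploidy

def wilcoxon_signed_rank(values, mu=1.0):
    """One-sample Wilcoxon signed-rank test against mu.

    Returns (p_greater, p_less) from a normal approximation. Zero
    differences are dropped; tied absolute differences get average ranks.
    """
    diffs = [v - mu for v in values if v != mu]
    n = len(diffs)
    if n == 0:
        return 1.0, 1.0
    # Rank the absolute differences, averaging ranks within ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    # Sum of ranks of the positive differences, compared to its null moments.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4.0
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (w_plus - mean) / sd
    p_greater = 0.5 * math.erfc(z / math.sqrt(2))
    p_less = 0.5 * math.erfc(-z / math.sqrt(2))
    return p_greater, p_less

# Hypothetical log2 ratios for the marks of one segment.
log_ratios = [0.45, 0.52, 0.38, 0.61, 0.49, 0.55, 0.42, 0.58, 0.47, 0.50]
ratios = [adjusted_copy_ratio(lr, purity=0.6, ploidy=2.0) for lr in log_ratios]
p_gain, p_loss = wilcoxon_signed_rank(ratios)
print("p(DUP) =", p_gain, "p(DEL) =", p_loss)
```

For small numbers of marks an exact test would be preferable to the normal approximation used here.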

PureCN • 1.2k views

ADD COMMENT • link 5.8 years ago twtoal ▴ 10

score 0 · Answer 1 · 2018-10-10

markus.riester ▴ 130

@markusriester-9875

Last seen 2.1 years ago

United States

Yes, you want the coverage log-ratios, not the variant log-ratios. Usually fewer than 15% of exons (num.marks in the DNAcopy output) have variants, so you would ignore most of the information.

You can probably use something like voom to compare the tumor coverage against all normals in the database. This will incorporate the variance in the pool of normals when calculating p-values. But I don't think this will be that useful: it is probably not sensitive enough for very low purities, and for higher purities pretty much everything PureCN calls should be significant. Still, it is worth a try if you really need p-values. Let me know if you are happy with the results.

You can use the readCoverageFile function to load the coverage files of the tumor and the normals, build a matrix of counts, and then apply something like voom.
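As a rough illustration of what "a matrix of counts" could look like, here is a sketch with intervals as rows and samples as columns, followed by the log2 counts-per-million transform that voom applies before modelling. It covers only that transform; the real voom (in the R package limma) additionally fits a mean-variance trend and derives precision weights, which is not reproduced here. The counts are invented, and in practice they would come from the coverage files.

```python
import math

def log_cpm(counts):
    """Per-sample log2 counts-per-million with voom's offsets:
    log2((count + 0.5) / (library_size + 1) * 1e6).

    counts: list of rows (one per interval), each a list of per-sample counts.
    """
    n_samples = len(counts[0])
    # Library size = total count per sample (column sums).
    lib_sizes = [sum(row[j] for row in counts) for j in range(n_samples)]
    return [
        [math.log2((row[j] + 0.5) / (lib_sizes[j] + 1.0) * 1e6)
         for j in range(n_samples)]
        for row in counts
    ]

# Hypothetical counts: 3 intervals (rows) x [tumor, normal1, normal2] (columns).
counts = [
    [300, 200, 210],
    [100, 205, 195],
    [200, 195, 195],
]
lcpm = log_cpm(counts)

# Tumor log-CPM minus the mean log-CPM of the normals, per interval.
tumor_vs_normals = [row[0] - sum(row[1:]) / len(row[1:]) for row in lcpm]
print(tumor_vs_normals)
```

A linear model fitted on such a matrix, with tumor versus normal as the contrast, is where the variance in the pool of normals would enter the p-values.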

ADD COMMENT • link 5.8 years ago markus.riester ▴ 130

Sorry, my edit to my question crossed with your response. Could you re-read my question and adjust your response?

ADD REPLY • link 5.8 years ago twtoal ▴ 10

This probably works well for larger segments, but you assume a constant variance and ignore the fold-change. Using something like voom will incorporate the variance observed in the pool of normals and the variance due to coverage. It might be OK, since the log-ratios are cleaned of most of the noise we can get rid of, but I'm not sure this rank-based approach will work in short segments, where the p-value would be most useful. A pure tumor vs. normal coverage p-value also does not account for purity and ploidy.
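To make the short-segment concern concrete: with an exact one-sample signed-rank test on n marks (assuming no ties and no values exactly at the null), the most extreme outcome is all n values falling on the same side, which has one-sided probability 1/2^n under the null. A small sketch of that floor, with the function name chosen here purely for illustration:

```python
def min_one_sided_p(num_marks):
    """Smallest one-sided p-value an exact signed-rank test can return
    when all num_marks values fall on the same side of the null value."""
    return 0.5 ** num_marks

for n in (3, 5, 10, 20):
    print(n, min_one_sided_p(n))
```

So a segment with only a handful of marks cannot reach a small p-value no matter how large the shift is, which is where ignoring the fold-change hurts most.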

Since this is not really a PureCN question, look around: you might also find something in the germline literature, or you could ask in a broader forum like Biostars.

ADD REPLY • link 5.8 years ago markus.riester ▴ 130