My samples are diffuse gastric cancers and consequently have quite low purity (about 0.2). PureCN has flagged most samples as having very low purity and a noisy log ratio. I presume the low purity causes the noisy log ratio? We are planning to sequence additional DNA from these same samples to improve coverage. Will additional sequencing depth reduce the log-ratio noise in PureCN?
If you send an example, preferably one with lots of CNAs, I'll have a look. I would need the log file and the main PDF (Sampleid.pdf). If the PDF is too large, a screenshot of the first three pages would work too.
A noisy log ratio should only occur in poor-quality samples, such as old FFPE samples with high duplication rates and poor coverage (<70X). If it happens for most of your samples, there is likely a problem with the coverage normalization.
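To separate the two effects being discussed (low purity vs. noise), here is a minimal Python sketch of the expected log2 ratio of a segment under a simple two-component mixture: tumor cells at a given copy number, normal cells diploid. This model is an assumption for illustration, not PureCN's actual likelihood model, and `expected_log2_ratio` is a hypothetical helper. Under this model, purity shrinks the size of a copy-number shift rather than adding noise.

```python
import math

def expected_log2_ratio(purity: float, copy_number: int, ploidy: float = 2.0) -> float:
    """Expected log2 ratio of a segment under a simple mixture model (assumption):
    a fraction `purity` of cells carry `copy_number` copies, the rest are diploid.
    The denominator is the sample-wide average copy number, so a segment at
    the tumor ploidy has a log2 ratio of 0."""
    segment = purity * copy_number + (1 - purity) * 2
    average = purity * ploidy + (1 - purity) * 2
    return math.log2(segment / average)

# Single-copy loss (CN 1) and gain (CN 3) at low vs. high purity, diploid tumor.
for cn in (1, 3):
    low = expected_log2_ratio(0.2, cn)
    high = expected_log2_ratio(0.8, cn)
    print(f"CN {cn}: purity 0.2 -> {low:.3f}, purity 0.8 -> {high:.3f}")
```

At purity 0.2 a single-copy loss only shifts the log ratio to about log2(0.9), roughly -0.15, versus about -0.74 at purity 0.8. Even moderate log-ratio noise can therefore mask CNAs in low-purity samples. Extra depth can lower counting noise, but it does not enlarge these shifts.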
Here's a Dropbox link to a zip with those two files for one of my samples.
https://www.dropbox.com/s/suqkfrcgfhuwwch/VO-56T3.zip?dl=0
Our samples are all FFPE, not particularly old. Duplication rate seems to be around 25%.
You provide the normal coverage in PureCN.R with --normal, right? Try again without it. If you specify the matched normal coverage, PureCN will use that one for normalization. If you don't, it will use the pool of normals, which is usually much better. You should see a significant improvement (i.e., a decrease in the log-ratio standard deviation).
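A simplified Python sketch of why dropping --normal can help: a single matched normal adds its own sampling noise to every bin's log ratio, while a per-bin median over several normals averages much of that noise out. The helper names and toy counts are hypothetical, and the normalization here is plain library-size scaling; PureCN's actual pool-of-normals normalization is more involved.

```python
import math
import statistics

def log_ratios(tumor, reference):
    """Per-bin log2(tumor/reference) after scaling each to its total count
    (library-size normalization only; all bins must have non-zero counts)."""
    t_total, r_total = sum(tumor), sum(reference)
    return [math.log2((t / t_total) / (r / r_total)) for t, r in zip(tumor, reference)]

def pool_of_normals(normals):
    """Hypothetical helper: per-bin median of library-size-scaled normals."""
    scaled = []
    for sample in normals:
        total = sum(sample)
        scaled.append([c / total for c in sample])
    return [statistics.median(values) for values in zip(*scaled)]

def log_ratio_sd(tumor, reference):
    """Standard deviation of the per-bin log ratios, used as the noise measure."""
    return statistics.stdev(log_ratios(tumor, reference))

# Toy read counts for six bins (made up for illustration).
tumor = [102, 118, 81, 99, 111, 89]
matched_normal = [90, 130, 75, 108, 100, 95]
normals = [
    [100, 121, 79, 101, 109, 91],
    [98, 119, 82, 100, 111, 90],
    [101, 120, 80, 99, 110, 88],
]

sd_matched = log_ratio_sd(tumor, matched_normal)
sd_pool = log_ratio_sd(tumor, pool_of_normals(normals))
print(f"matched normal SD: {sd_matched:.3f}, pool of normals SD: {sd_pool:.3f}")
```

With these toy counts the standard deviation against the pool is several times smaller than against the noisy matched normal. Bins with zero counts would need to be filtered first, since log2(0) is undefined.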
If it is still high, the issue is likely the low-ish coverage. Check how many reads the average off-target bin has. If it is much lower than on-target, you could increase the off-target bin width. You can do that in IntervalFile.R with --offtargetwidth. This requires starting from scratch, though.
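As a rough way to do this check, here is a minimal Python sketch that compares the average read count of on- and off-target bins and scales the off-target width accordingly. It assumes you have already extracted per-bin counts and an on/off-target flag from the coverage output (column names depend on the PureCN version). The width rule is a hypothetical heuristic, not PureCN behavior, and the 200 kb starting width in the example is an assumption.

```python
import statistics

def mean_counts_by_type(bins):
    """bins: iterable of (read_count, is_on_target) pairs, e.g. parsed from a
    coverage file. Returns (mean on-target count, mean off-target count).
    Raises statistics.StatisticsError if either group is empty."""
    bins = list(bins)
    on = [count for count, on_target in bins if on_target]
    off = [count for count, on_target in bins if not on_target]
    return statistics.mean(on), statistics.mean(off)

def suggested_offtarget_width(current_width, on_mean, off_mean):
    """Hypothetical rule of thumb: widen off-target bins so their expected count
    approaches the on-target mean, assuming counts scale linearly with width."""
    return round(current_width * on_mean / off_mean)

# Toy counts (made up); the 200 kb starting width is an assumption.
bins = [(300, True), (280, True), (320, True), (40, False), (60, False), (50, False)]
on_mean, off_mean = mean_counts_by_type(bins)
print(on_mean, off_mean, suggested_offtarget_width(200_000, on_mean, off_mean))
```

In practice you would not push off-target counts all the way up to the on-target level, since wider bins cost resolution; the ratio mainly shows how far the off-target bins lag behind.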
OK, I'll try removing --normal and see how that goes. Meanwhile...
I made bar plots of many PureCN output values for each of my samples (one bar per sample). That PDF of plots is on Dropbox here: https://www.dropbox.com/s/9c1p8octwaul2py/PureCN_plots.pdf?dl=0
Also, I zipped all the log files and put them on Dropbox here:
https://www.dropbox.com/s/ucxywacd4cbjn1g/PURECN_LOG.zip?dl=0
The plots include several where I pair PureCN output with similar output produced by my own CNV estimator program. You can ignore the plots of my data; they are not necessarily accurate.