Question

PureCN: Copy Number of 7 issue

jacobgross04 • 0

@dd92f3df

Last seen 2.7 years ago

United States

I inherited what I believe to be a sound PureCN implementation for analyzing WXS data in a cohort. Although there are only a few gene-level or regional CN/LOH data points, some values occur very frequently: a CN of 7.00 accounts for 90% of the ~150 alterations observed across the cohort. This occurs in multiple cohorts that share the same sequencing and downstream analyses, which suggests it is caused either by our data source or by a misstep in our analysis.

Has this been observed before? Some samples/regions look less peculiar (CN is above or below 7 in some cases). I am trying to work out whether this is an artifact/bias or a misstep in my implementation.

Pure PureCN • 1.7k views

ADD COMMENT • link updated 2.7 years ago by markus.riester ▴ 130 • written 2.7 years ago by jacobgross04 • 0

Also, feel free to post an example log file and I can check if setup is fine.

ADD REPLY • link 2.7 years ago markus.riester ▴ 130

Log file is here: https://drive.google.com/file/d/11VCot5RzNijxzpgZ63qvkQ-AZK19GqGu/view?usp=sharing

ADD REPLY • link 2.7 years ago jacobgross04 • 0

Looks great. The only thing you can improve is running Mutect2 with --interval-padding 50 if you don't already. Ideally then also do that on the normal samples and recreate the mapping bias file. This typically increases the number of SNPs quite a lot, thus improving power to call LOH.
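As a rough illustration of the padding suggestion, the sketch below assembles a tumor-only Mutect2 command line with `--interval-padding 50`. The file names are hypothetical placeholders, and other arguments a real run would likely need (panel of normals, germline resource, and so on) are deliberately left out, so treat it as a starting point rather than a complete invocation.

```python
import shlex


def mutect2_command(reference, bam, intervals, output_vcf, padding=50):
    """Build a minimal Mutect2 argument list with padded target intervals."""
    return [
        "gatk", "Mutect2",
        "-R", reference,                     # reference FASTA
        "-I", bam,                           # input BAM (tumor, or a normal for the mapping bias step)
        "-L", intervals,                     # capture intervals
        "--interval-padding", str(padding),  # extend each interval by this many bases on both sides
        "-O", output_vcf,
    ]


# Hypothetical file names, for illustration only.
cmd = mutect2_command("reference.fa", "tumor.bam", "baits.interval_list", "tumor.vcf.gz")
print(shlex.join(cmd))
```

The same padding would be applied when calling the normal samples, so that the mapping bias file covers the same flanking regions.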

Most of the fixes that might affect artifacts should be already in, but you can try updating to 2.0.1.

Would appreciate lists of artifacts.

ADD REPLY • link 2.7 years ago markus.riester ▴ 130

You might also want to try PureCN.R --fun-segmentation GATK4 (just make sure the gatk binary is in your PATH). Some users have reported cleaner profiles in WES with it than with PSCBS. I optimized the PSCBS-based function for our cfDNA panels, and I think GATK is tuned more towards WES/WGS.

ADD REPLY • link 2.7 years ago markus.riester ▴ 130

score 1 · Answer 1 · 2021-11-30

markus.riester ▴ 130

@markusriester-9875

Last seen 2.1 years ago

United States

Hi Jacob,

I am currently working on identifying artifacts better. One thing I hope to have ready by the next release is learning systematic differences between tumors and the pool of normals and cleaning them up in the normalization step. Essentially, this adds a tumor database alongside the current normal database.

There were also a few minor fixes in the last couple of releases that should be a bit better at avoiding those false calls. So if you are a few versions behind, try upgrading; it should be smooth.

If you see a pattern, feel free to post here: https://github.com/lima1/PureCN/issues/179

We are doing mostly cfDNA these days where I simply check for recurring calls in samples of 0 purity, but that’s probably not helpful for you. Most of my artifacts are short or very long genes, GC outliers, or poor mappability.
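A minimal sketch of the "check for recurring calls" idea, adapted to a general cohort rather than 0-purity samples: it counts how many samples share the same (gene, copy number) call and keeps those above a chosen fraction. The input layout, the gene names, and the threshold are all assumptions made for illustration; this is not how PureCN itself flags artifacts.

```python
from collections import Counter


def recurring_calls(calls_by_sample, min_fraction=0.5):
    """Return calls seen in at least min_fraction of samples, with their sample counts."""
    n_samples = len(calls_by_sample)
    if n_samples == 0:
        return {}
    counts = Counter()
    for calls in calls_by_sample.values():
        counts.update(set(calls))  # count each distinct call once per sample
    return {call: n for call, n in counts.items() if n / n_samples >= min_fraction}


# Hypothetical toy data: sample name -> list of (gene, copy number) calls.
calls = {
    "sample_A": [("GENE1", 7), ("GENE2", 0)],
    "sample_B": [("GENE1", 7)],
    "sample_C": [("GENE1", 7), ("GENE3", 1)],
}
print(recurring_calls(calls, min_fraction=0.6))
```

Calls that recur this often across unrelated samples are candidates for the artifact list; checking them against gene length, GC content, and mappability follows the categories mentioned above.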

Markus

ADD COMMENT • link 2.7 years ago markus.riester ▴ 130

Thanks Markus,

I will check which version we are using, try out the newer release, and link the outcome here later on.

We see this across many different samples throughout the multiple cohorts. In general, would you expect a mix of whole and fractional numbers to appear frequently in the regional CN evaluation?

ADD REPLY • link 2.7 years ago jacobgross04 • 0

Not sure I fully understand the question, but you get non-integer values for copy number only for high level amplifications (reported is simply the log2 ratio converted to purity/ploidy adjusted copy number - for lower copy numbers we try to find the correct integer value) or sub-clonal alterations. PureCN isn’t made for reliably detecting sub-clonality, so ymmv here.
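To make the log2 ratio conversion concrete, here is a small sketch of the standard mixture relation between an observed tumor/normal ratio and tumor copy number, assuming diploid normal cells. The function name and example values are hypothetical, and this is the textbook formula rather than a copy of PureCN's internal code.

```python
def adjusted_copy_number(log2_ratio, purity, ploidy):
    """Convert a tumor/normal log2 ratio to a purity/ploidy-adjusted copy number."""
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    ratio = 2.0 ** log2_ratio
    normal_part = 2.0 * (1.0 - purity)              # copies contributed by diploid normal cells
    average_copies = purity * ploidy + normal_part  # mean copies per cell across the whole sample
    # Solve ratio = (purity * C + normal_part) / average_copies for the tumor copy number C.
    return (ratio * average_copies - normal_part) / purity


# Hypothetical segment: log2 ratio of 1.0 at 50% purity and tumor ploidy 2.
print(adjusted_copy_number(1.0, purity=0.5, ploidy=2.0))  # prints 6.0
```

The converted value is generally fractional, which is why it is only reported as-is for high-level amplifications; for lower copy numbers an integer state is fitted instead.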

But yes, artifacts are to some extent unavoidable, unfortunately. Like I said, feel free to post examples of suspected artifacts in the GitHub issue and I’ll have a look at how we can avoid them in the future. Also, if you are unsure whether the setup is optimal, post a log file.

ADD REPLY • link 2.7 years ago markus.riester ▴ 130

Thanks so much Markus,

I believe this is exactly what I was asking. Most likely this looks odd because of how we selected samples on a previous sequencing run. I will post a log file above for a sanity check and we will update the GitHub issues page if we have any problems.

Jake

ADD REPLY • link 2.7 years ago jacobgross04 • 0