I'm concerned that there may be something wrong with my PureCN output from callAlterations(), which I ran via PureCN.R. I have 117 gastric cancer tumor samples and about 740 targeted genes. This gives 87,664 total lines of callAlterations() output across all samples. Of those, only 689 are amplifications and 78 are deletions. Looking at the distribution of copy ratios in the segments, about 5% of all segments are deletions and 10% are amplifications, so I would have expected roughly the same proportions at the gene level. Further, I thought I previously had more gene deletions/amplifications; I'm wondering whether upgrading to a newer PureCN introduced a change that drastically lowered the count. Unfortunately, I can't easily go back and rerun the old PureCN to check this, mainly due to time constraints.
At the same time, 24,493 gene-level calls are flagged as LOH. That is about 28% of all gene-level calls. Can that be right? It also implies that most of the LOH occurs in the absence of a called deletion. I would have expected most LOH to occur because one of the two alleles was deleted.
Do you agree something is wrong here? Any suggestions on where I might start looking?
My PureCN version is 1.16.0.
We only have 5 of 33 samples that are MSI. About 55% are CIN.
I see that one problem is that the "type" column of the gene CNV file is NA most of the time. Does that imply PureCN was unable to determine whether there were amplifications/deletions? What criteria does it use to decide NA vs non-NA?
Perhaps I have misunderstood the definition of amplification and deletion here. I am assuming an amplification is when the copy number is significantly greater than the ploidy, and a deletion is when the copy number is significantly less than the ploidy?
PureCN.R uses the callAlterations function in the background: https://rdrr.io/bioc/PureCN/man/callAlterations.html
By default, amplifications are called when a gene is focal and has copy number >= 6, or is non-focal and has copy number >= 7. NA means no call, i.e. the gene did not pass the homozygous deletion or amplification cutoffs.
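To make the rule concrete, here is a minimal sketch of that decision in Python. It is not PureCN code: the function name `call_type` is made up for illustration, and the deletion cutoff of 0.5 is my reading of the documented default `cutoffs = c(0.5, 6, 7)`, so please check it against the help page for your version.

```python
from typing import Optional

# Assumed defaults, mirroring cutoffs = c(0.5, 6, 7) from the callAlterations
# help page: deletion, focal amplification, broad (non-focal) amplification.
DELETION_CUTOFF = 0.5
FOCAL_AMP_CUTOFF = 6
BROAD_AMP_CUTOFF = 7


def call_type(copy_number: float, focal: bool) -> Optional[str]:
    """Return 'DELETION', 'AMPLIFICATION', or None (NA, i.e. no call)."""
    if copy_number < DELETION_CUTOFF:
        # Only (near) zero copies pass, i.e. a homozygous deletion.
        return "DELETION"
    amp_cutoff = FOCAL_AMP_CUTOFF if focal else BROAD_AMP_CUTOFF
    if copy_number >= amp_cutoff:
        return "AMPLIFICATION"
    # Everything in between, including one-copy losses and low-level gains,
    # is left uncalled.
    return None


# A one-copy loss and a three-copy gain both stay NA under these cutoffs.
examples = [(1, False), (3, False), (6, True), (6, False), (0, False)]
calls = [call_type(cn, focal) for cn, focal in examples]
```

Under cutoffs this strict, a gene with one copy lost (copy number 1) or a single-copy gain gets no call, which would be consistent with seeing far fewer gene-level calls than segments with shifted copy ratios.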
For some analyses, yes, it might make sense to add gain/loss categories similar to cBioPortal (hom. del, loss, normal, gain, amplification). The functional impact of gains and losses is much less clear, so they are not called by default.
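If you want such categories for your own analysis, one possible post-processing sketch is below. Everything in it is an assumption for illustration: the function name `five_level_call`, the choice of the rounded sample ploidy as the "normal" baseline, and the reuse of the 0.5 / 6 / 7 cutoffs. PureCN does not provide this function, and other baselines (e.g. always 2) may suit your analysis better.

```python
def five_level_call(copy_number: float, focal: bool, ploidy: float) -> str:
    """Hypothetical five-level category relative to rounded sample ploidy."""
    baseline = round(ploidy)  # assumption: "normal" means baseline copies
    if copy_number < 0.5:
        return "hom. del"
    # Same focal / non-focal amplification cutoffs as the default calls.
    amp_cutoff = 6 if focal else 7
    if copy_number >= amp_cutoff:
        return "amplification"
    if copy_number < baseline:
        return "loss"
    if copy_number > baseline:
        return "gain"
    return "normal"


# Example: a near-diploid sample (ploidy 2.1) with non-focal genes.
categories = [five_level_call(cn, False, 2.1) for cn in (0, 1, 2, 3, 8)]
```

With a split like this, one-copy losses land in "loss" rather than NA, which makes it easier to compare against the LOH calls.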