Question

Manual import of CNV data (created with ichorCNA) into a QSEAset with addCNV()



a.riediger ▴ 20




Hello, I use the QSEA package to analyse my MeDIP sequencing data (enriched samples and input samples are available). Since I run ichorCNA beforehand for CNA analysis anyway, I have a text file with the following columns (example values in brackets) for each 1 Mb segment/bin:

chr [chr1], start [4000001], end [5000000], copy.number [4], event [AMP], logR [0.236], subclone.status [0], Corrected_Copy_Number [4], Corrected_Call [AMP], logR_Copy_Number [4.229].

As I understand it, the strategy would be to extract one copy-number value per sample for each bin and build a single GRanges object from this data for all samples. This object can then be passed to addCNV(cnv = <GRanges object>).

However, when comparing the CNV data from ichorCNA with the CNV data that QSEA computed for my QSEAset, I was not sure which value (copy.number, Corrected_Copy_Number, logR_Copy_Number) corresponds to the copy-number value reported in the QSEAset. I would expect Corrected_Copy_Number or logR_Copy_Number, but these values are on a completely different scale from the ones in the QSEAset (values < 1).

Which value does QSEA use/report for the copy-number alterations calculated with addCNV()? (The log2 of GC- and mappability-corrected copy numbers?)

Additionally, with the same bin size (1 Mb), the ichorCNA bins start at chr1:1000001-2000000, whereas the bins in the QSEAset start at chr1:1-1000000.

Will this cause problems, or does the algorithm accept any bin layout as long as the data is imported as a GRanges object?

I would appreciate your help :) Thanks a lot!


Written 14 months ago by a.riediger; last updated 14 months ago by Matthias Lienhard.

Answer · 2023-02-21

Hi Anja,

QSEA uses the HMMcopy library for CNV analysis and reports the values as log2 ratios relative to the "expected normal level", e.g. the median of all non-cancer samples, or the median of all samples if no normal samples are available. This means a value of 1 should correspond to a duplication (twice the normal level). The analysis is quite basic and does not correct for GC content or mappability; however, by using relative values, the hope is that these biases cancel out. I am not familiar with the output of ichorCNA or how it corresponds to these values. But there is a nice visualization function, plotCNV, which can also be adapted to compare different computational methods, as in Supplementary Figure 8 of the paper (https://academic.oup.com/nar/article-lookup/doi/10.1093/nar/gkw1193#supplementary-data), where we compared CNVs computed from input-seq with CNVs computed from MeDIP-seq.
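To bring ichorCNA copy numbers onto a comparable scale, one option is to mimic the normalisation described above: for each bin, divide each sample's copy number by the median over the normal samples (or over all samples if there are none) and take log2. This is a sketch under that assumption, not QSEA's implementation, which derives its values from read counts via HMMcopy. The function name is hypothetical, the per-bin median is an assumption about what "expected normal level" means here, and the input is a bin × sample matrix of copy numbers (for example Corrected_Copy_Number values).

```python
import math
import statistics
from typing import List, Optional, Sequence


def log2_ratio_to_normal(matrix: Sequence[Sequence[float]],
                         normal_idx: Optional[Sequence[int]] = None) -> List[List[float]]:
    """Per bin: log2(copy number / median of the reference samples).

    Reference samples are the columns in normal_idx, or all columns if none are given.
    NaN inputs (or a missing/non-positive reference) give NaN; a copy number of 0 gives -inf.
    """
    out: List[List[float]] = []
    for row in matrix:
        ref_cols = normal_idx if normal_idx else range(len(row))
        ref_vals = [row[j] for j in ref_cols if not math.isnan(row[j])]
        ref = statistics.median(ref_vals) if ref_vals else math.nan
        new_row: List[float] = []
        for v in row:
            if math.isnan(v) or math.isnan(ref) or ref <= 0:
                new_row.append(math.nan)
            elif v <= 0:
                new_row.append(-math.inf)
            else:
                new_row.append(math.log2(v / ref))
        out.append(new_row)
    return out


# Hypothetical example: three bins x three samples (tumor, normal1, normal2).
cn = [[4.0, 2.0, 2.0],
      [2.0, 2.0, 2.0],
      [1.0, 2.0, math.nan]]
ratios = log2_ratio_to_normal(cn, normal_idx=[1, 2])
```

On this scale a single-copy gain in a diploid region (3 vs 2) gives log2(1.5) ≈ 0.58, while a doubling (4 vs 2) gives 1, which is consistent with most reported values lying below 1.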

The missing first window is not a problem; just insert a window with the normal level for all samples (or NA).
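Padding the missing window can be sketched as follows, assuming (as described in the question) that the QSEAset windows are adjacent, 1-based and start at position 1, so the ichorCNA bins match them exactly apart from the missing first one. The chromosome length is made up, the function names are hypothetical, and fill can be NaN (written out as NA) or whatever value represents the normal level on the scale being imported. off_grid lists bins whose coordinates do not match any window; those would need re-binning rather than padding.

```python
import math
from typing import Dict, List, Sequence, Tuple

Bin = Tuple[str, int, int]  # (chromosome, start, end), 1-based and inclusive


def make_grid(chrom_lengths: Dict[str, int], bin_size: int) -> List[Bin]:
    """Adjacent bins from position 1: chrN:1-bin_size, bin_size+1-2*bin_size, ..."""
    grid: List[Bin] = []
    for chrom, length in chrom_lengths.items():
        for start in range(1, length + 1, bin_size):
            grid.append((chrom, start, min(start + bin_size - 1, length)))
    return grid


def pad_to_grid(values: Dict[Bin, Sequence[float]], grid: List[Bin],
                n_samples: int, fill: float = math.nan) -> List[Tuple[Bin, List[float]]]:
    """One row per grid bin; bins missing from `values` get `fill` for every sample."""
    return [(b, list(values.get(b, [fill] * n_samples))) for b in grid]


def off_grid(values: Dict[Bin, Sequence[float]], grid: List[Bin]) -> List[Bin]:
    """Imported bins that match no grid bin exactly (e.g. shifted coordinates)."""
    grid_set = set(grid)
    return [b for b in values if b not in grid_set]


# Hypothetical example: 3.5 Mb chromosome, 1 Mb bins, two samples, first bin missing.
grid = make_grid({"chr1": 3_500_000}, 1_000_000)
imported = {("chr1", 1_000_001, 2_000_000): [4.0, 2.0],
            ("chr1", 2_000_001, 3_000_000): [2.0, 2.0]}
padded = pad_to_grid(imported, grid, n_samples=2)
```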

Best, Matthias