Question

Calling CNVs in a population with large variance in sequencing depth

Asked by Xiao Zhang (United States)

Hi all,

I am using cn.mops to call CNVs in a plant population of 271 samples. The problem is that sequencing depth varies from 0.00496X to 40.11X; the average depth is about 7X, and most samples are between 3X and 10X. Should I cluster the samples by depth into several groups and call CNVs group by group, or pool all the files together for calling? Thank you.

Tag: cn.mops

Answer by Günter Klambauer (Austria), 2017-01-12

Hello Xiao Zhang,

Yes, clustering the samples by sequencing depth is certainly advisable. You can also include the higher-coverage samples when you analyze the low-coverage ones. Let me explain: say the low-coverage samples form group A, the medium-coverage samples group B, and the high-coverage samples group C. Then you should do three cn.mops runs:

1.) cn.mops on A, B, and C with a large window length (low resolution) --> CNV calls for the low-coverage group A.

2.) cn.mops on B and C with a medium window length --> CNV calls for the medium-coverage group B.

3.) cn.mops on C alone with a small window length (high resolution) --> CNV calls for the high-coverage group C.

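The reason is that adding samples with higher coverage improves the estimates for each DNA region. However, the CNV calls that a low-resolution run produces for the higher-coverage samples are themselves only at low resolution, so each group's calls should be taken from the run whose window length suits that group's coverage.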
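A minimal Python sketch of the three-run plan, assuming per-sample depths are already known. The cutoffs LOW_MAX and MEDIUM_MAX are hypothetical, picked only to mirror the 3X to 10X bulk described in the question. The sketch just decides which samples enter each run and whose calls are kept; the cn.mops runs themselves would still be done in R.

```python
# Hypothetical depth cutoffs (in X) separating groups A, B and C;
# choose your own, e.g. from a histogram of per-sample depth.
LOW_MAX = 3.0
MEDIUM_MAX = 10.0


def assign_group(depth):
    """Return 'A' (low), 'B' (medium) or 'C' (high coverage)."""
    if depth < LOW_MAX:
        return "A"
    if depth < MEDIUM_MAX:
        return "B"
    return "C"


def plan_runs(depths):
    """Build the three cn.mops runs: {group: (samples in run, calls kept)}.

    depths maps sample name -> sequencing depth in X.
    """
    groups = {"A": [], "B": [], "C": []}
    for sample, depth in sorted(depths.items()):
        groups[assign_group(depth)].append(sample)
    return {
        # Run 1: all samples, large windows; keep calls for A only.
        "A": (groups["A"] + groups["B"] + groups["C"], groups["A"]),
        # Run 2: B and C, medium windows; keep calls for B only.
        "B": (groups["B"] + groups["C"], groups["B"]),
        # Run 3: C alone, small windows; keep calls for C.
        "C": (groups["C"], groups["C"]),
    }


if __name__ == "__main__":
    example = {"s1": 0.005, "s2": 2.1, "s3": 5.0, "s4": 7.2, "s5": 40.1}
    for group, (run_samples, kept) in plan_runs(example).items():
        print(group, run_samples, kept)
```

Each higher-coverage sample appears in several runs, but its calls are kept only from the run with the smallest window it qualifies for.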

I hope this helps you with the analysis.

Regards,

Günter

Thank you, Günter. Should I set the window length myself or let the software set it automatically? Which is better? Do you have any references on how to choose the window length?

Regards,

Xiao

Reply by Xiao Zhang

Hello Xiao,

The program determines the window length automatically based on the sample with the lowest number of reads (lowest coverage). However, I advise doing some calculations and setting this parameter by hand so that, on average, about 50-100 reads map to each window (segment).

The average number of reads per window (segment) is averageReadCount = coverage * windowLength / readLength. If you want about 50 reads per window on average, this gives windowLength = readLength * 50 / coverage. For your lowest-coverage samples, with a coverage of 0.005X, you should use a window length of 50 * 100 / 0.005 = 1e6 bp (assuming a read length of 100 bp). The smallest CNVs you will be able to detect are three times this length (as determined by cn.mops's parameter "minWidth=3"), i.e. 3e6 bp, so you will only be able to detect very large CNVs.

For a medium coverage of 5X, this formula suggests a window length of 1000 bp, and the smallest detectable CNVs will be 3000 bp (with "minWidth=3").
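The arithmetic above can be checked with a few lines of Python. This is a sketch: the helper names are hypothetical (they are not part of cn.mops), the 100 bp read length is the same assumption as in the reply, and min_width mirrors cn.mops's minWidth=3.

```python
def window_length(coverage, read_length=100, reads_per_window=50):
    """windowLength = readLength * reads_per_window / coverage (in bp)."""
    return read_length * reads_per_window / coverage


def expected_reads(coverage, window, read_length=100):
    """averageReadCount = coverage * windowLength / readLength."""
    return coverage * window / read_length


def smallest_cnv(window, min_width=3):
    """Smallest detectable CNV: min_width consecutive windows (in bp)."""
    return min_width * window


if __name__ == "__main__":
    for cov in (0.005, 5.0):
        wl = window_length(cov)
        print(f"{cov}X: window {wl:.0f} bp, "
              f"~{expected_reads(cov, wl):.0f} reads/window, "
              f"smallest CNV {smallest_cnv(wl):.0f} bp")
```

Aiming for 100 reads per window instead of 50 doubles both the window length and the smallest detectable CNV. The chosen window length would then be passed to the read-counting step in R; as far as I know this is the WL argument of cn.mops's getReadCountsFromBAM, but check the package documentation.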

Regards,

Günter

Reply by Günter Klambauer

Thank you Günter, this helps me a lot!

My other question is about the data frame returned by the "segmentation" function. It contains the columns "seqname", "start", "end", "width", "strand", "sample", "median", "mean", and "CN". Do "median" and "mean" both refer to the I/NI calls? How can I filter this data frame to get more confident CNVs? In another Q&A thread you said "The farther the value is away from 0, the more likely there is a CNV"; do you have a standard cutoff for this?

Regards,

Xiao

Reply by Xiao Zhang