Question: Call CNV in population with large depth variance
3 months ago, Xiao Zhang (United States) wrote:

Hi all,


I am using cn.mops to call CNVs in a plant population of 271 samples. The problem is that the sequencing depth varies from 0.00496X to 40.11X; the average depth is about 7X, and most samples are between 3X and 10X. Should I cluster the samples by depth into several groups and call CNVs per group, or pool all the files together for a single call? Thank you.

modified 3 months ago by Günter Klambauer • written 3 months ago by Xiao Zhang
3 months ago, Günter Klambauer wrote:

Hello Xiao Zhang,

Yes, clustering the samples with respect to sequencing depth is certainly advisable. You can include the higher coverage samples when you analyze the low coverage ones. Let me explain this further: Let's say the low coverage samples are A, the medium coverage samples are B, and the high coverage samples C. Then you should make three cn.MOPS runs:

1.) cn.mops on A,B,C with large window length (low resolution) --> CNV calls for low coverage group A.

2.) cn.mops on B,C with a medium window length --> CNV calls for medium coverage group B.

3.) cn.mops on C with a small window length (high resolution) --> CNV calls for high coverage group C.

The reason is that adding more samples with higher coverage improves the read-count estimates for each DNA region. However, in such a combined run the CNV calls for the higher-coverage samples are only available at the low resolution of the large windows, which is why groups B and C get their own additional runs with smaller window lengths.
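The tiered scheme above can be sketched in a few lines of Python. This is not cn.mops itself, just the bookkeeping of splitting samples into coverage groups and assembling the three run sets; the thresholds (1X and 10X) and the sample names are illustrative assumptions, not cn.mops defaults.

```python
# Split samples into low/medium/high coverage tiers (A/B/C) before
# running cn.mops once per tier, as described above.
# Thresholds and sample names are hypothetical, for illustration only.

def tier_by_coverage(coverages, low_max=1.0, medium_max=10.0):
    """Split {sample: mean coverage} into low/medium/high groups."""
    groups = {"A_low": [], "B_medium": [], "C_high": []}
    for sample, cov in coverages.items():
        if cov < low_max:
            groups["A_low"].append(sample)
        elif cov < medium_max:
            groups["B_medium"].append(sample)
        else:
            groups["C_high"].append(sample)
    return groups

coverages = {"s1": 0.005, "s2": 3.2, "s3": 7.0, "s4": 40.1}
groups = tier_by_coverage(coverages)

# Run 1 uses A+B+C (large windows, calls kept for A),
# run 2 uses B+C (medium windows, calls kept for B),
# run 3 uses C only (small windows, calls kept for C).
run1 = groups["A_low"] + groups["B_medium"] + groups["C_high"]
run2 = groups["B_medium"] + groups["C_high"]
run3 = groups["C_high"]
```

Each run would then call cn.mops on the corresponding BAM files with the window length chosen for that tier.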

I hope this helps you with the analysis.




written 3 months ago by Günter Klambauer

Thank you Günter. Should I set the window length myself, or let the software set it automatically — which is better? Do you have any references for choosing the window length?



written 3 months ago by Xiao Zhang

Hello Xiao,

The program determines the window length automatically based on the sample with the lowest number of reads (lowest coverage). However, I advise doing some calculations and setting this parameter by hand, such that on average about 50-100 reads map to each window (segment).

The average number of reads per window/segment is: averageReadCount = coverage * windowLength / readLength. Assuming you want on average 50 reads per segment/window, you get windowLength = readLength * 50 / coverage. For your low-coverage samples with a coverage of 0.005X, you should use a window length of 50 * 100 / 0.005 = 1e6 bp (assuming a read length of 100). The smallest CNVs you will be able to detect are three times this length (determined by cn.mops's parameter "minWidth=3"), i.e. 3e6 bp, so you will only be able to detect very large CNVs.

For a medium coverage of 5X, this formula suggests a window length of 1000 bp, and the smallest detectable CNVs will be 3000 bp (with "minWidth=3").
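The arithmetic above is easy to check in a few lines of Python. The read length of 100 and the target of 50 reads per window are taken directly from the reply; the function name is just for illustration.

```python
# Window length from the formula: windowLength = readLength * targetReads / coverage

def window_length(coverage, read_length=100, target_reads=50):
    """Window length (bp) giving ~target_reads reads per window at a given coverage."""
    return read_length * target_reads / coverage

wl_low = window_length(0.005)   # low-coverage sample: 1,000,000 bp
wl_med = window_length(5)       # medium-coverage sample: 1,000 bp

# With cn.mops's minWidth=3, the smallest detectable CNV spans 3 windows:
min_cnv_low = 3 * wl_low        # 3,000,000 bp
min_cnv_med = 3 * wl_med        # 3,000 bp
```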



written 3 months ago by Günter Klambauer

Thank you Günter, this helps me a lot!

My other question concerns the data frame returned by the "segmentation" function. It contains columns named "seqname", "start", "end", "width", "strand", "sample", "median", "mean" and "CN". Do the "median" and "mean" here both refer to the I/NI call values? How should I filter this data frame to get more confident CNVs? In another Q&A you said "The farther the value is away from 0, the more likely there is a CNV" — do you have a standard threshold for this?



written 3 months ago by Xiao Zhang
Powered by Biostar version 2.2.0