Hi, I had the test run with WGS daat of four individuals (from one family) and got pretty interesting result. But also reckoned that some cnv calls looked bit biased due to the samples that are from the same family. I am tempted to run multiple samples - increasing the number of families. Do you have any idea or minimum number required for multiple sample CNV calling using cn.mops?
Thanks for your interest in cn.mops. I would recommend at least 6 samples for cn.mops. The more the better, of course. Note that the samples that you are adding should have exactly the same processing (sequencing technique, QC, readmapper, etc) as the ones you already have!
Let me know if you need more help with your analysis!
1. How can I know the WL after I ran getReadCountsFromBAM without specifying WL? I found that the value after this function has an interval and this interval might be a WL, in the case below was 1000. Please correct me if I am wrong, or is there any function to retrieve the WL value?
2. Do you have any suggestion for optimal WL for 30-50X WGS human data?
3. As suggest previously, I'd like to run cn.mops with WL from the lowest coverage sample(s) and wondering cn.mops package provides a function for get coverage.
4. Is there any function for parallel processing in cn.mops? I'd like to speed up the process and if no parallel processing provided, I am thinking running cn.mops by chromosomes.
2. There is a simple relation between coverage (C), window length (WL), read length (RL) and average number of reads per bin (m): C=m*RL/WL. cn.MOPS has a good performance if the average number of reads in a window is 50-100. Assuming your reads have read length of 100 and a coverage of 50X, this suggests a WL of 100bp-200bp.
3. cn.mops does not provide a function for calculating the coverage but it is easy to determine if you have already calculated the read counts:
You could - for example - do an initial run with a large WL, then calculate the read counts and coverage. Based on these coverage values you can determine a more suitable WL and recalculate the read counts. Probably there is a function to calculate the coverage in Rsamtools.
4. Yes, of course. Almost all functions have an argument "parallel" that you can set to the desired number of processing cores: cn.mops(..., parallel=24) would run cn.mops on 24 cores.