I am using your no.mops for a cancer project and I annotated the cnv regions to get gene symbol info.
What the results showed is that one gene in one of the samples that is supposed to have ~2.5 CNV did not show any cnv which means no call at all. In other samples the gene got called so this region is fine in exomeseq procedure.
I wonder if the fdr is too stringent however I can’t see any p value or fdr values in the output.
Can you kindly point out where I can find these relevant info? Or is there any other reasons that this gene is not called?
Thanks for using cn.MOPS! The parameters that influence the FDR of cn.MOPS the most are the window length ("WL" of getReadCountsFromBAM) and "minWidth" of the segmentation algorithm. Since you have exome sequencing data, you probably counted the reads per exons -- so no need to set "WL". However, "minWidth" is set to 4 by default, which means that the CNV has to span at least 4 exons in order to be called. I suppose the CNV you cannot detect now is shorter than 4 exons, so I suggest to lower this parameter "minWith" to 2 or 3. Setting this value to 1 is equivalent to using no segmentation algorithm. These raw copy number calls without segmentation algorithm are always present in the "CNVDetectionResult" objects - check
One more thought: You have a cancer project and probably you have matched normals/controls. Do you use "referencecn.mops" or "exomecn.mops"?
There are no p values or FDR values in the output because cn.MOPS is an unsupervised technique. However, you can filter CNVs based on their mean or median I/NI call. Check the columns "mean" and "median" of
If the problem persists, I can offer to have a look at your data and suggest a good parameter setting - you could send me your read count matrix via email.
Thanks a lot for your informative answer. I did use the "referencecn.mops" instead of the "exomecn.mops".
I checked the result of segmentation(resRef) and found the CN value of the gene missing in the result of cnvs(resRef). So I wonder what step you did to go from segmentation(resRef) to cnvs(resRef)? Did you simply apply some cutoff for the median/mean/CN? Because my gene of interest showed -0.24 ~ -0.39 from different replicates as median I/NI (hence CN2) in the segmentation results, it may be filtered in the cnvs results? I can see that in other regions along the genome there are also a few CN2 in the cnvs results.
The thresholds "upperThreshold" and "lowerThreshold" for CNV gains and CNV losses, respectively, are crucial hyperparameters that govern precision and recall (sensitivity and specificity) of the method. These hyperparameters should actually be adjusted on separate training data sets on which CNVs are known. Most people do not have training data sets, so I have set these values based on a large number of diverse training data sets.
You are in the lucky position to know (at least some) of the CNVs in you data. Hence you can adjust the thresholds/hyperparameters to your needs (trade-off between precision and recall). If you produce more data in the future, you can use these settings that you have determined.
Now answering your questions:
a) Yes, these are simple cut-offs for the mean I/NI values.
b) Yes, that's exactly what happened in your case (the CNVs got filtered out).
c) In the slot "segmentation(resultCNMOPS)" there should be large regions with CN2! This is the full segmentation of the genome, so the large segments should have CN2.