Dear cn.mops crew,
I am trying cn.mops on a set of yeast genomes. Some issues:
- Some strains completely lack a certain chromosome, so the read counts for that chromosome are zero. Of course I can deal with this in preprocessing, but it would be very nice if cn.mops handled this case gracefully instead of giving an error.
- Based on lab work, I have an estimate of the ploidy level of each sample. They range from 2n to 4n. It would be very nice if this information could be used in the CNV calling instead of giving a single ploidy level for all the samples.
Do you have recommendations on settings to use with this kind of sample? The normalisation step in particular is prone to fail with the whole-chromosome losses I am seeing.
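To make the preprocessing idea above concrete, here is a minimal sketch in plain Python (not cn.mops code, and not how cn.mops normalizes internally). It assumes read counts are already binned per sample, that bins with zero reads belong to missing chromosomes and should be ignored when estimating depth, and that most covered bins of a sample sit at its lab-estimated ploidy. The function name and the data layout are made up for illustration.

```python
from statistics import median


def ploidy_aware_size_factors(counts, ploidy):
    """Return a scaling factor per sample (hypothetical helper).

    counts: dict mapping sample name -> list of per-bin read counts
            (same bin order in every sample).
    ploidy: dict mapping sample name -> assumed ploidy (e.g. 2, 3, 4).

    Multiplying a sample's counts by its factor puts all samples on a
    common "reads per bin per chromosome copy" scale.
    """
    per_copy = {}
    for sample, bins in counts.items():
        # Ignore empty bins so a missing chromosome cannot drag the
        # depth estimate towards zero.
        covered = [c for c in bins if c > 0]
        if not covered:
            raise ValueError(f"sample {sample!r} has no covered bins")
        per_copy[sample] = median(covered) / ploidy[sample]
    target = median(per_copy.values())
    return {sample: target / depth for sample, depth in per_copy.items()}


# Toy data: four bins, the last two on a chromosome that strain "c" lacks.
counts = {
    "a": [100, 100, 100, 100],  # 2n
    "b": [200, 200, 200, 200],  # 4n, same depth per copy as "a"
    "c": [200, 200, 0, 0],      # 2n, sequenced deeper, one chromosome missing
}
ploidy = {"a": 2, "b": 4, "c": 2}
factors = ploidy_aware_size_factors(counts, ploidy)
```

After scaling, a 4n sample still shows twice the counts of a 2n sample, which is the point: the depth per copy is equalised, not the total depth.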
Hello Mikko,
I have put some normalization functions that one of my students developed on http://www.bioinf.jku.at/software/cnmops/ (see "Additional normalization functions"). These functions take different ploidies or large CNVs into account during normalization. I hope the documentation in the code is sufficient. Otherwise, please contact us again!
Regards,
Günter
Hi Günter,
I am also examining individuals that carry whole-chromosome dosage polymorphisms. Sequencing coverage also varies among many of these individuals. I am not having success with "normalizeChromosomes" (assigning their relative ploidy estimates): a few individuals that I know to have extra or fewer chromosomes are not showing the expected increased or decreased log values for the respective chromosomes (via segPlot).
Would it be best to use the normalizeTumor functions in this case?
If so, I'm a little confused as to how to set it up. The first function, toGR, takes as input a data frame of chromosomes and their copy numbers, correct? I see there isn't a "sample" column, so does this function need to be run for each sample.bam file? Should this data frame include the estimated copy number of EVERY chromosome in each sample? If so, how am I supposed to know the estimated copy number? Isn't that supposed to be the output of cn.mops? Alternatively, would I just need to provide a few chromosomes I suspect to be 2N to act as a baseline?
Thanks,
Mike
Hi Mike,
Before you can use normalizeTumor(), it is necessary to estimate the copy number for large segments and the purity of your data. One way to get these estimates would be to use PyLOH: https://github.com/uci-cbcl/PyLOH or my improved version: https://github.com/patrick-praher/PyLOH_Opt
Based on the CN estimates, normalizeTumor takes the segments with an estimated CN of 2 as the basis for the normalization. Unfortunately, http://www.bioinf.jku.at/software/cnmops/code/normalizeTumorSample.R does not show that the variable cnvs_CCL is read from the PyLOH results file. If you want, I can provide you with a function to read the PyLOH results into R.
Based on the purity estimates, the fold change vector is corrected before cn.mops is applied, as shown in normalizeTumorSample.R.
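The two steps described above can be sketched roughly as follows. This is a plain-Python illustration, not the code in normalizeTumorSample.R: the function names are hypothetical, the reference counts are assumed to be nonzero, and the purity correction assumes a simple linear mixture in which the contaminating normal cells are at CN 2 (observed fold change = purity * true fold change + (1 - purity)). The actual R script may correct the fold changes differently.

```python
from statistics import median


def normalize_to_cn2(sample_counts, reference_counts, cn_estimates):
    """Fold change per segment, rescaled so that segments with an
    estimated copy number of 2 have a median fold change of 1."""
    ratios = [s / r for s, r in zip(sample_counts, reference_counts)]
    baseline = [x for x, cn in zip(ratios, cn_estimates) if cn == 2]
    if not baseline:
        raise ValueError("no segment with an estimated copy number of 2")
    scale = median(baseline)
    return [x / scale for x in ratios]


def correct_for_purity(fold_changes, purity):
    """Invert the assumed mixture model:
    observed = purity * true + (1 - purity)."""
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    # Clamp at zero because noise can push the estimate slightly negative.
    return [max(0.0, (fc - (1 - purity)) / purity) for fc in fold_changes]


# Toy example: four segments with external CN estimates 2, 2, 3, 1
# and an estimated purity of 0.5.
sample = [120, 120, 150, 90]
reference = [100, 100, 100, 100]
cn = [2, 2, 3, 1]

fold = normalize_to_cn2(sample, reference, cn)
corrected = correct_for_purity(fold, purity=0.5)
```

In this toy example, twice the corrected fold change recovers the copy numbers that went in (2, 2, 3, 1), which is a quick sanity check that the normalization and the purity correction are consistent with each other.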
I hope this short explanation was helpful. Please feel free to ask if you need anything.
Cheers,