Thank you for developing methylCC. It seems to be a nice package for deconvolving blood cell types from whole-genome bisulfite sequencing (WGBS) data, and a great starting point for further development. Because my reads are aligned to the human genome hg38, I first had to lift my object (obtained from dmrseq) over to hg19 using rtracklayer before running methylCC. During this step I lost around 165,000 CpGs out of roughly 23 million (a typical number for dmrseq), but I was still able to run it.
Later, I noticed that the estimated proportions always sum to 1, on the assumption that the mixture contains 5 cell types. Could an "unknown cell type" component be added to the calculation of the cell type proportions? I am applying this algorithm to serum from a biobank, and I expect many of these blood cell types to be present (as contamination of the serum with blood cells during separation), but possibly some other contamination as well. I would therefore expect to get relative proportions for the 5 known cell types plus a remainder that is unknown.
I am a bit skeptical of reference-free methods, but maybe these could also be tried. Any suggestions? What is the state of the art there?
Finally, what worries me a bit is that the dmrseq (bs) object has different coverage at individual sites. dmrseq takes this into account when finding DMRs by giving lower weight to CpGs with lower coverage, but I am not sure how, or if, this is implemented in methylCC. With RRBS, coverage tends to be higher and people often use a 10x cutoff; in WGBS, coverage starts from 1.
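To make the coverage concern concrete, here is a minimal sketch of the kind of hard cutoff I have in mind. It is not something methylCC or dmrseq does as far as I know: `keep_covered_loci`, `min_cov` and `min_frac` are hypothetical names, and the toy matrix stands in for a real coverage matrix. The only package call I assume is `bsseq::getCoverage()`, shown in the comments.

```r
# Hypothetical helper (not part of methylCC, dmrseq or bsseq): flag loci whose
# read coverage reaches `min_cov` in at least a fraction `min_frac` of samples.
keep_covered_loci <- function(cov, min_cov = 10, min_frac = 1) {
  cov <- as.matrix(cov)
  # Fraction of samples per locus that meet the cutoff, compared to min_frac
  rowMeans(cov >= min_cov) >= min_frac
}

# Toy coverage matrix: 4 CpGs (rows) x 2 samples (columns)
cov <- matrix(c(12, 15,
                 1, 30,
                10, 10,
                 0,  0),
              ncol = 2, byrow = TRUE)
keep <- keep_covered_loci(cov, min_cov = 10)

# With a BSseq object this would presumably look like (requires bsseq, and
# as.matrix() may need a lot of memory for ~23 million loci):
#   cov <- as.matrix(bsseq::getCoverage(bs.filtered, type = "Cov"))
#   bs.covfiltered <- bs.filtered[keep_covered_loci(cov, min_cov = 10), ]
```

A hard cutoff like this discards low-coverage CpGs entirely, whereas dmrseq down-weights them, so the two are not equivalent; with WGBS a 10x cutoff may remove most sites, and a lower `min_cov` or `min_frac` might be more realistic.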
All the best, Marcin
And here is the code for anyone interested. It could be implemented as an if statement in methylCC for those who are not even aware of the problem:

```r
library(rtracklayer)
library(methylCC)

# Chain file for hg38 -> hg19 shipped with the liftOver package
path <- system.file(package = "liftOver", "extdata", "hg38ToHg19.over.chain")
ch <- import.chain(path)

# Chain files use UCSC-style chromosome names ("chr1", ...)
seqlevelsStyle(bs.filtered) <- "UCSC"
bs.filteredlifted <- liftOver(bs.filtered, ch)
bs.filteredliftedunlisted <- unlist(bs.filteredlifted)  # 22619314
genome(bs.filteredliftedunlisted) <- "hg19"

# Replace the hg38 coordinates with the lifted hg19 coordinates
bs.filtered.test <- bs.filtered
bs.filtered.test@rowRanges <- bs.filteredliftedunlisted

# Number of CpGs lost in the liftOver
# (bs.filtered.original is the object before lifting, not defined in this snippet)
length(bs.filtered.original) - length(bs.filteredliftedunlisted)  # 165884

est <- estimatecc(object = bs.filtered.test, include_cpgs = TRUE, include_dmrs = TRUE)
cell_counts(est)
```