Hello,
I am using Methylkit for the analysis of bisulfite experiment in amplicons.
I am trying to use the normalizeCoverage function, but it transforms the data to NAs.
Here is the code:
myobj=methRead(file.list,
sample.id=as.list(paste(as.character(A_B_samples$Sample.2))),
assembly="Amplicons",
treatment=treated_vec,
context="CpG"
)
#filter low coverage
filtered.myobj <- filterByCoverage(myobj,lo.count=50,lo.perc=NULL,hi.perc=NULL)
filtered.myobj.norm <- normalizeCoverage(filtered.myobj)
After the normalizeCoverage step the data is transformed to NAs.
If that would have worked, the next step would be:
meth=methylKit::unite(filtered.myobj.norm, destrand=FALSE,min.per.group=3L)
myDiff=calculateDiffMeth(meth)
Without the normalizeCoverage step the differential methylation works OK.
Can you please help with this?
Do you recommend normalizing the covergae for data from amplicons?
Thanks
---------------------------------------
sessionInfo()
R version 3.5.0 Patched (2018-06-04 r74846)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] bindrcpp_0.2.2 methylKit_1.6.1 tidyr_0.8.1 tibble_1.4.2 dplyr_0.7.6 GenomicRanges_1.32.6
[7] GenomeInfoDb_1.16.0 IRanges_2.14.11 S4Vectors_0.18.3 BiocGenerics_0.26.0
loaded via a namespace (and not attached):
[1] Biobase_2.40.0 splines_3.5.0 R.utils_2.7.0 gtools_3.8.1
[5] assertthat_0.2.0 GenomeInfoDbData_1.1.0 Rsamtools_1.32.3 numDeriv_2016.8-1
[9] pillar_1.3.0 lattice_0.20-35 glue_1.3.0 limma_3.36.3
[13] bbmle_1.0.20 XVector_0.20.0 qvalue_2.12.0 colorspace_1.3-2
[17] Matrix_1.2-14 R.oo_1.22.0 plyr_1.8.4 XML_3.98-1.16
[21] pkgconfig_2.0.2 emdbook_1.3.10 zlibbioc_1.26.0 purrr_0.2.5
[25] scales_1.0.0 BiocParallel_1.14.2 ggplot2_3.0.0 SummarizedExperiment_1.10.1
[29] lazyeval_0.2.1 magrittr_1.5 crayon_1.3.4 mclust_5.4.1
[33] R.methodsS3_1.7.1 MASS_7.3-50 tools_3.5.0 data.table_1.11.4
[37] matrixStats_0.54.0 stringr_1.3.1 munsell_0.5.0 DelayedArray_0.6.5
[41] Biostrings_2.48.0 compiler_3.5.0 fastseg_1.26.0 rlang_0.2.2
[45] grid_3.5.0 RCurl_1.95-4.11 bitops_1.0-6 gtable_0.2.0
[49] reshape2_1.4.3 R6_2.2.2 GenomicAlignments_1.16.0 rtracklayer_1.40.6
[53] bindr_0.1.1 stringi_1.1.7 Rcpp_0.12.18 tidyselect_0.2.4
[57] coda_0.19-1
Thank you very much for your answer.
I now understand the reason for the problem: since this is short amplicon, and the coverage is quite even in all the CpGs in the amplicon, there are several samples that have no data at all CpGs after the filterByCoverage:
filtered.myobj <- filterByCoverage(myobj,lo.count=50,lo.perc=NULL,hi.perc=NULL)
I think that at that stage the list should be filtered from such samples that do not have data at all.
Is it OK, just to delete these cases manually and update the treatment of the list, or is there a methylkit function that does this?
Then the next steps will be:
normalize the coverage:
filtered.myobj.norm <- normalizeCoverage(filtered.myobj)
merge all samples to one object for base-pair locations that are covered in all samples:
meth=methylKit::unite(filtered.myobj.norm, destrand=FALSE,min.per.group=3L)
and differential methylation:
myDiff=calculateDiffMeth(meth,weighted.mean = T, adjust = "hochberg")
Is this workflow OK?
Do you recommend normalizing the coverage when working with short amplicons? Ideally, I would like to give higher weight for methylation, when the evidence is based on higher coverage.
Thanks for all the help.