Question: normalizecoverage in methylkit outputs NAs
0
gravatar for GFM
13 months ago by
GFM20
European Union
GFM20 wrote:

Hello,
I am using Methylkit for the analysis of bisulfite experiment in amplicons.
I am trying to use the normalizeCoverage function, but it transforms the data to NAs.
Here is the code:

myobj=methRead(file.list,
                   sample.id=as.list(paste(as.character(A_B_samples$Sample.2))),
                   assembly="Amplicons",
                   treatment=treated_vec,
                   context="CpG"
  )
 
  #filter low coverage
  filtered.myobj <- filterByCoverage(myobj,lo.count=50,lo.perc=NULL,hi.perc=NULL)
  filtered.myobj.norm <- normalizeCoverage(filtered.myobj)

After the normalizeCoverage step the data is transformed to NAs.

If that would have worked, the next step would be:
meth=methylKit::unite(filtered.myobj.norm, destrand=FALSE,min.per.group=3L)
myDiff=calculateDiffMeth(meth)

Without the normalizeCoverage step the differential methylation works OK.

Can you please help with this?

Do you recommend normalizing the covergae for data from amplicons?

Thanks

---------------------------------------
 sessionInfo()
R version 3.5.0 Patched (2018-06-04 r74846)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
  [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
  [1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
  [1] bindrcpp_0.2.2       methylKit_1.6.1      tidyr_0.8.1          tibble_1.4.2         dplyr_0.7.6          GenomicRanges_1.32.6
[7] GenomeInfoDb_1.16.0  IRanges_2.14.11      S4Vectors_0.18.3     BiocGenerics_0.26.0

loaded via a namespace (and not attached):
  [1] Biobase_2.40.0              splines_3.5.0               R.utils_2.7.0               gtools_3.8.1               
[5] assertthat_0.2.0            GenomeInfoDbData_1.1.0      Rsamtools_1.32.3            numDeriv_2016.8-1          
[9] pillar_1.3.0                lattice_0.20-35             glue_1.3.0                  limma_3.36.3               
[13] bbmle_1.0.20                XVector_0.20.0              qvalue_2.12.0               colorspace_1.3-2           
[17] Matrix_1.2-14               R.oo_1.22.0                 plyr_1.8.4                  XML_3.98-1.16              
[21] pkgconfig_2.0.2             emdbook_1.3.10              zlibbioc_1.26.0             purrr_0.2.5                
[25] scales_1.0.0                BiocParallel_1.14.2         ggplot2_3.0.0               SummarizedExperiment_1.10.1
[29] lazyeval_0.2.1              magrittr_1.5                crayon_1.3.4                mclust_5.4.1               
[33] R.methodsS3_1.7.1           MASS_7.3-50                 tools_3.5.0                 data.table_1.11.4          
[37] matrixStats_0.54.0          stringr_1.3.1               munsell_0.5.0               DelayedArray_0.6.5         
[41] Biostrings_2.48.0           compiler_3.5.0              fastseg_1.26.0              rlang_0.2.2                
[45] grid_3.5.0                  RCurl_1.95-4.11             bitops_1.0-6                gtable_0.2.0               
[49] reshape2_1.4.3              R6_2.2.2                    GenomicAlignments_1.16.0    rtracklayer_1.40.6         
[53] bindr_0.1.1                 stringi_1.1.7               Rcpp_0.12.18                tidyselect_0.2.4           
[57] coda_0.19-1                
 

ADD COMMENTlink modified 13 months ago by altuna akalin20 • written 13 months ago by GFM20
Answer: normalizecoverage in methylkit outputs NAs
0
gravatar for altuna akalin
13 months ago by
Germany/Berlin/MDC-BIMSB
altuna akalin20 wrote:
Normalizing coverage shouldn’t lead to NAs. The thing that might lead to NAs is using ,min.per.group=3L In the unite function. Why this happens is explained in the docs, probably in the vignette and also in the discussion forum https://groups.google.com/forum/m/#!forum/methylkit_discussion If this doesn’t help, please send a reproducible example On Thu 25. Oct 2018 at 11:30, GFM [bioc] <noreply@bioconductor.org> wrote: > Activity on a post you are following on support.bioconductor.org > > User GFM <https: support.bioconductor.org="" u="" 8326=""/> wrote Question: > normalizecoverage in methylkit outputs NAs > <https: support.bioconductor.org="" p="" 114444=""/>: > > Hello, > I am using Methylkit for the analysis of bisulfite experiment in amplicons. > I am trying to use the normalizeCoverage function, but it transforms the > data to NAs. > Here is the code: > > myobj=methRead(file.list, > sample.id > =as.list(paste(as.character(A_B_samples$Sample.2))), > assembly="Amplicons", > treatment=treated_vec, > context="CpG" > ) > > #filter low coverage > filtered.myobj <- > filterByCoverage(myobj,lo.count=50,lo.perc=NULL,hi.perc=NULL) > filtered.myobj.norm <- normalizeCoverage(filtered.myobj) > > After the normalizeCoverage step the data is transformed to NAs. > > If that would have worked, the next step would be: > meth=methylKit::unite(filtered.myobj.norm, destrand=FALSE,min.per.group=3L) > myDiff=calculateDiffMeth(meth) > > Without the normalizeCoverage step the differential methylation works OK. > > Can you please help with this? > > Do you recommend normalizing the covergae for data from amplicons? > > Thanks > > --------------------------------------- > sessionInfo() > R version 3.5.0 Patched (2018-06-04 r74846) > Platform: x86_64-w64-mingw32/x64 (64-bit) > Running under: Windows >= 8 x64 (build 9200) > > Matrix products: default > > locale: > [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United > States.1252 LC_MONETARY=English_United States.1252 > [4] LC_NUMERIC=C LC_TIME=English_United > States.1252 > > attached base packages: > [1] parallel stats4 stats graphics grDevices utils > datasets methods base > > other attached packages: > [1] bindrcpp_0.2.2 methylKit_1.6.1 tidyr_0.8.1 > tibble_1.4.2 dplyr_0.7.6 GenomicRanges_1.32.6 > [7] GenomeInfoDb_1.16.0 IRanges_2.14.11 S4Vectors_0.18.3 > BiocGenerics_0.26.0 > > loaded via a namespace (and not attached): > [1] Biobase_2.40.0 splines_3.5.0 > R.utils_2.7.0 gtools_3.8.1 > [5] assertthat_0.2.0 GenomeInfoDbData_1.1.0 > Rsamtools_1.32.3 numDeriv_2016.8-1 > [9] pillar_1.3.0 lattice_0.20-35 > glue_1.3.0 limma_3.36.3 > [13] bbmle_1.0.20 XVector_0.20.0 > qvalue_2.12.0 colorspace_1.3-2 > [17] Matrix_1.2-14 R.oo_1.22.0 > plyr_1.8.4 XML_3.98-1.16 > [21] pkgconfig_2.0.2 emdbook_1.3.10 > zlibbioc_1.26.0 purrr_0.2.5 > [25] scales_1.0.0 BiocParallel_1.14.2 > ggplot2_3.0.0 SummarizedExperiment_1.10.1 > [29] lazyeval_0.2.1 magrittr_1.5 > crayon_1.3.4 mclust_5.4.1 > [33] R.methodsS3_1.7.1 MASS_7.3-50 > tools_3.5.0 data.table_1.11.4 > [37] matrixStats_0.54.0 stringr_1.3.1 > munsell_0.5.0 DelayedArray_0.6.5 > [41] Biostrings_2.48.0 compiler_3.5.0 > fastseg_1.26.0 rlang_0.2.2 > [45] grid_3.5.0 RCurl_1.95-4.11 > bitops_1.0-6 gtable_0.2.0 > [49] reshape2_1.4.3 R6_2.2.2 > GenomicAlignments_1.16.0 rtracklayer_1.40.6 > [53] bindr_0.1.1 stringi_1.1.7 > Rcpp_0.12.18 tidyselect_0.2.4 > [57] coda_0.19-1 > > > ------------------------------ > > Post tags: methylkit, normalizecoverage, NA > > You may reply via email or visit > normalizecoverage in methylkit outputs NAs > -- Sent from mobile, excuse the brevity
ADD COMMENTlink written 13 months ago by altuna akalin20

Thank you very much for your answer.

I now understand the reason for the problem: since this is short amplicon, and the coverage is quite even in all the CpGs in the amplicon, there are several samples that have no data at all CpGs after the filterByCoverage:

 filtered.myobj <- filterByCoverage(myobj,lo.count=50,lo.perc=NULL,hi.perc=NULL)

I think that at that stage the list should be filtered from such samples that do not have data at all.

Is it OK, just to delete these cases manually and update the treatment of the list, or is there a methylkit function that does this?

Then the next steps will be:

normalize the coverage:

filtered.myobj.norm <- normalizeCoverage(filtered.myobj)
 
  merge all samples to one object for base-pair locations that are covered in all samples:
   
    meth=methylKit::unite(filtered.myobj.norm, destrand=FALSE,min.per.group=3L)

and differential methylation:
 
 myDiff=calculateDiffMeth(meth,weighted.mean = T, adjust = "hochberg")

Is this workflow OK?

Do you recommend normalizing the coverage when working with short amplicons? Ideally, I would like to give higher weight for methylation, when the evidence is based on higher coverage.

Thanks for all the help.

ADD REPLYlink modified 12 months ago • written 12 months ago by GFM20
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 279 users visited in the last hour