Question: Determination of Hypo- or Hyper- methylation
gravatar for Bruno Verstraeten
10 months ago by
Bruno Verstraeten0 wrote:


I have a question about the results of the drmseq package. According to the guide (, a region is hypo or hypermethylated according to the sign of its corresponding test statistic and the alphabetical order of the covariate of interest. I reproduced the example results of the guide with the following code:


bs <- BS.chr21

testCovariate <- "CellType"
regions <- dmrseq(bs=bs[240001:260000,],
                  cutoff = 0.05,
sigRegions <- regions[regions$qval < 0.05,]

​However, the direction of the effect does not seem to be in line with the methylation levels calculated per region per sample. If one significant region is taken as an example:

> sigRegions[1,]
GRanges object with 1 range and 7 metadata columns:
      seqnames            ranges strand |         L             area
         <Rle>         <IRanges>  <Rle> | <integer>        <numeric>
  171    chr21 43605625-43606688      * |        24 12.7309412794983
                  beta              stat                pval              qval
             <numeric>         <numeric>           <numeric>         <numeric>
  171 -1.2434551058172 -18.5650464920496 0.00307692307692308 0.025982905982906
  171   845-868

The test statistic is negative so the condition that comes first in alphabetical order is hypomethylated compared to the other condition. The conditions are "h1" and "imr90" so h1 is hypomethylated compared to imr90.

> pData(bs)
DataFrame with 4 rows and 2 columns
            CellType         Rep
         <character> <character>
imr90.r1       imr90  replicate1
imr90.r2       imr90  replicate2
h1.r1             h1  replicate1
h1.r2             h1  replicate2

Yet, when the methylation estimates are calculated, the h1 samples show a higher methylation estimate than the imr90 samples, indicating hypermethylation.

> getMeth(bs,regions = sigRegions,what = "perRegion",type = "raw")[1,]
<4> DelayedArray object of type "double":
imr90.r1  imr90.r2     h1.r1     h1.r2
0.3817489 0.4050773 0.9252300 0.8773829 

Is there something i misunderstood about these apparent conflicting results?



R version 3.5.0 (2018-04-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

Matrix products: default
BLAS: /home/bsverstr/R-3.5.0/lib/
LAPACK: /home/bsverstr/R-3.5.0/lib/

[1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C             
[3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8   
[7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                
[9] LC_ADDRESS=C               LC_TELEPHONE=C           

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base     

other attached packages:
[1] dmrseq_1.0.0                bsseq_1.16.0              
[3] SummarizedExperiment_1.10.0 DelayedArray_0.6.0        
[5] BiocParallel_1.14.0         matrixStats_0.53.1        
[7] Biobase_2.40.0              GenomicRanges_1.32.0      
[9] GenomeInfoDb_1.16.0         IRanges_2.14.1            
[11] S4Vectors_0.18.1            BiocGenerics_0.26.0        

loaded via a namespace (and not attached):
[1] nlme_3.1-137                  bitops_1.0-6                
[3] bit64_0.9-7                   RColorBrewer_1.1-2          
[5] progress_1.1.2                httr_1.3.1                  
[7] doRNG_1.6.6                   tools_3.5.0                 
[9] R6_2.2.2                      HDF5Array_1.8.0             
[11] DBI_1.0.0                     lazyeval_0.2.1              
[13] colorspace_1.3-2              permute_0.9-4               
[15] prettyunits_1.0.2             bit_1.1-12                  
[17] compiler_3.5.0                pkgmaker_0.22               
[19] rtracklayer_1.40.0            scales_0.5.0                
[21] readr_1.1.1                   stringr_1.3.0               
[23] digest_0.6.15                 Rsamtools_1.32.0            
[25] R.utils_2.6.0                 XVector_0.20.0              
[27] pkgconfig_2.0.1               htmltools_0.3.6             
[29] limma_3.36.0                  BSgenome_1.48.0             
[31] regioneR_1.12.0               rlang_0.2.0                 
[33] RSQLite_2.1.0                 BiocInstaller_1.30.0        
[35] shiny_1.0.5                   DelayedMatrixStats_1.2.0    
[37] bindr_0.1.1                   gtools_3.5.0                
[39] dplyr_0.7.4                   R.oo_1.22.0                 
[41] RCurl_1.95-4.10               magrittr_1.5                
[43] GenomeInfoDbData_1.1.0        Matrix_1.2-14               
[45] Rcpp_0.12.16                  munsell_0.4.3               
[47] Rhdf5lib_1.2.0                R.methodsS3_1.7.1           
[49] stringi_1.2.2                 yaml_2.1.19                 
[51] zlibbioc_1.26.0               rhdf5_2.24.0                
[53] plyr_1.8.4                    bumphunter_1.22.0           
[55] AnnotationHub_2.12.0          grid_3.5.0                  
[57] blob_1.1.1                    promises_1.0.1              
[59] lattice_0.20-35               splines_3.5.0               
[61] Biostrings_2.48.0             GenomicFeatures_1.32.0      
[63] hms_0.4.2                     locfit_1.5-9.1              
[65] pillar_1.2.2                  rngtools_1.2.4              
[67] codetools_0.2-15              reshape2_1.4.3              
[69] biomaRt_2.36.0                XML_3.98-1.11               
[71] glue_1.2.0                    outliers_0.14               
[73] annotatr_1.6.0                data.table_1.11.0           
[75] foreach_1.4.4                 httpuv_1.4.2                
[77] gtable_0.2.0                  assertthat_0.2.0            
[79] ggplot2_2.2.1                 mime_0.5                    
[81] xtable_1.8-2                  later_0.7.2                 
[83] tibble_1.4.2                  iterators_1.0.9             
[85] registry_0.5                  GenomicAlignments_1.16.0    
[87] AnnotationDbi_1.42.0          memoise_1.1.0               
[89] bindrcpp_0.2.2                interactiveDisplayBase_1.18.0
bsseq dmrseq • 220 views
ADD COMMENTlink modified 10 months ago by James W. MacDonald49k • written 10 months ago by Bruno Verstraeten0
Answer: Determination of Hypo- or Hyper- methylation
gravatar for James W. MacDonald
10 months ago by
United States
James W. MacDonald49k wrote:

You misunderstand how models are fit in R, when using the default of treatment contrasts. As an example:

> z <- factor(rep(c("imr90","h1"), each = 4))
> z
[1] imr90 imr90 imr90 imr90 h1    h1    h1    h1   
Levels: h1 imr90
> model.matrix(~z)
  (Intercept) zimr90
1           1      1
2           1      1
3           1      1
4           1      1
5           1      0
6           1      0
7           1      0
8           1      0
[1] 0 1
[1] "contr.treatment"

So the beta from that model will be imr90 - h1, since h1 is the 'baseline' level.

ADD COMMENTlink written 10 months ago by James W. MacDonald49k

Thank you for your answer.

ADD REPLYlink written 10 months ago by Bruno Verstraeten0

Sorry for the late reply - I see your confusion, Bruno. The dmrseq vignette correctly stated that the reference category was determined alphabetically, but there was a typo in the explanation of "hypo" and "hyper" methylation in the example. Yes, as you observed and as James demonstrates above, the condition label with higher alphabetical rank will become the reference category. In this example, "h1" is the reference, so a negative effect size means that "imr90" is hypomethylated relative to "h1".

I have fixed the typo in the vignette; thanks for catching the error!

ADD REPLYlink written 7 months ago by keegan50
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 487 users visited in the last hour