Question

Determination of Hypo- or Hyper- methylation

0

Entering edit mode

Bruno Verstraeten • 0

@bruno-verstraeten-15923

Last seen 6.2 years ago

Hello,

I have a question about the results of the drmseq package. According to the guide (https://bioconductor.org/packages/devel/bioc/vignettes/dmrseq/inst/doc/dmrseq.html), a region is hypo or hypermethylated according to the sign of its corresponding test statistic and the alphabetical order of the covariate of interest. I reproduced the example results of the guide with the following code:

library(dmrseq)
library(bsseq)

bs <- BS.chr21

testCovariate <- "CellType"
regions <- dmrseq(bs=bs[240001:260000,],
                  cutoff = 0.05,
                  testCovariate=testCovariate)
sigRegions <- regions[regions$qval < 0.05,]

However, the direction of the effect does not seem to be in line with the methylation levels calculated per region per sample. If one significant region is taken as an example:

> sigRegions[1,]
GRanges object with 1 range and 7 metadata columns:
      seqnames            ranges strand |         L             area
         <Rle>         <IRanges>  <Rle> | <integer>        <numeric>
  171    chr21 43605625-43606688      * |        24 12.7309412794983
                  beta              stat                pval              qval
             <numeric>         <numeric>           <numeric>         <numeric>
  171 -1.2434551058172 -18.5650464920496 0.00307692307692308 0.025982905982906
          index
      <IRanges>
  171   845-868

The test statistic is negative so the condition that comes first in alphabetical order is hypomethylated compared to the other condition. The conditions are "h1" and "imr90" so h1 is hypomethylated compared to imr90.

> pData(bs)
DataFrame with 4 rows and 2 columns
            CellType         Rep
         <character> <character>
imr90.r1       imr90  replicate1
imr90.r2       imr90  replicate2
h1.r1             h1  replicate1
h1.r2             h1  replicate2

Yet, when the methylation estimates are calculated, the h1 samples show a higher methylation estimate than the imr90 samples, indicating hypermethylation.

> getMeth(bs,regions = sigRegions,what = "perRegion",type = "raw")[1,]
<4> DelayedArray object of type "double":
imr90.r1  imr90.r2     h1.r1     h1.r2
0.3817489 0.4050773 0.9252300 0.8773829

Is there something i misunderstood about these apparent conflicting results?

regards,

Bruno

sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

Matrix products: default
BLAS: /home/bsverstr/R-3.5.0/lib/libRblas.so
LAPACK: /home/bsverstr/R-3.5.0/lib/libRlapack.so

locale:
[1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C             
[3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8   
[5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8  
[7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                
[9] LC_ADDRESS=C               LC_TELEPHONE=C           
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base     

other attached packages:
[1] dmrseq_1.0.0                bsseq_1.16.0              
[3] SummarizedExperiment_1.10.0 DelayedArray_0.6.0        
[5] BiocParallel_1.14.0         matrixStats_0.53.1        
[7] Biobase_2.40.0              GenomicRanges_1.32.0      
[9] GenomeInfoDb_1.16.0         IRanges_2.14.1            
[11] S4Vectors_0.18.1            BiocGenerics_0.26.0        

loaded via a namespace (and not attached):
[1] nlme_3.1-137                  bitops_1.0-6                
[3] bit64_0.9-7                   RColorBrewer_1.1-2          
[5] progress_1.1.2                httr_1.3.1                  
[7] doRNG_1.6.6                   tools_3.5.0                 
[9] R6_2.2.2                      HDF5Array_1.8.0             
[11] DBI_1.0.0                     lazyeval_0.2.1              
[13] colorspace_1.3-2              permute_0.9-4               
[15] prettyunits_1.0.2             bit_1.1-12                  
[17] compiler_3.5.0                pkgmaker_0.22               
[19] rtracklayer_1.40.0            scales_0.5.0                
[21] readr_1.1.1                   stringr_1.3.0               
[23] digest_0.6.15                 Rsamtools_1.32.0            
[25] R.utils_2.6.0                 XVector_0.20.0              
[27] pkgconfig_2.0.1               htmltools_0.3.6             
[29] limma_3.36.0                  BSgenome_1.48.0             
[31] regioneR_1.12.0               rlang_0.2.0                 
[33] RSQLite_2.1.0                 BiocInstaller_1.30.0        
[35] shiny_1.0.5                   DelayedMatrixStats_1.2.0    
[37] bindr_0.1.1                   gtools_3.5.0                
[39] dplyr_0.7.4                   R.oo_1.22.0                 
[41] RCurl_1.95-4.10               magrittr_1.5                
[43] GenomeInfoDbData_1.1.0        Matrix_1.2-14               
[45] Rcpp_0.12.16                  munsell_0.4.3               
[47] Rhdf5lib_1.2.0                R.methodsS3_1.7.1           
[49] stringi_1.2.2                 yaml_2.1.19                 
[51] zlibbioc_1.26.0               rhdf5_2.24.0                
[53] plyr_1.8.4                    bumphunter_1.22.0           
[55] AnnotationHub_2.12.0          grid_3.5.0                  
[57] blob_1.1.1                    promises_1.0.1              
[59] lattice_0.20-35               splines_3.5.0               
[61] Biostrings_2.48.0             GenomicFeatures_1.32.0      
[63] hms_0.4.2                     locfit_1.5-9.1              
[65] pillar_1.2.2                  rngtools_1.2.4              
[67] codetools_0.2-15              reshape2_1.4.3              
[69] biomaRt_2.36.0                XML_3.98-1.11               
[71] glue_1.2.0                    outliers_0.14               
[73] annotatr_1.6.0                data.table_1.11.0           
[75] foreach_1.4.4                 httpuv_1.4.2                
[77] gtable_0.2.0                  assertthat_0.2.0            
[79] ggplot2_2.2.1                 mime_0.5                    
[81] xtable_1.8-2                  later_0.7.2                 
[83] tibble_1.4.2                  iterators_1.0.9             
[85] registry_0.5                  GenomicAlignments_1.16.0    
[87] AnnotationDbi_1.42.0          memoise_1.1.0               
[89] bindrcpp_0.2.2                interactiveDisplayBase_1.18.0

dmrseq bsseq • 1.3k views

ADD COMMENT • link updated 6.2 years ago by James W. MacDonald 66k • written 6.2 years ago by Bruno Verstraeten • 0

score 0 · Answer 1 · 2018-05-25

0

Entering edit mode

James W. MacDonald 66k

@james-w-macdonald-5106

Last seen 5 hours ago

United States

You misunderstand how models are fit in R, when using the default of treatment contrasts. As an example:

> z <- factor(rep(c("imr90","h1"), each = 4))
> z
[1] imr90 imr90 imr90 imr90 h1    h1    h1    h1   
Levels: h1 imr90
> model.matrix(~z)
  (Intercept) zimr90
1           1      1
2           1      1
3           1      1
4           1      1
5           1      0
6           1      0
7           1      0
8           1      0
attr(,"assign")
[1] 0 1
attr(,"contrasts")
attr(,"contrasts")$z
[1] "contr.treatment"

So the beta from that model will be imr90 - h1, since h1 is the 'baseline' level.

ADD COMMENT • link 6.2 years ago James W. MacDonald 66k

0

Entering edit mode

Thank you for your answer.

ADD REPLY • link 6.2 years ago Bruno Verstraeten • 0

0

Entering edit mode

Sorry for the late reply - I see your confusion, Bruno. The dmrseq vignette correctly stated that the reference category was determined alphabetically, but there was a typo in the explanation of "hypo" and "hyper" methylation in the example. Yes, as you observed and as James demonstrates above, the condition label with higher alphabetical rank will become the reference category. In this example, "h1" is the reference, so a negative effect size means that "imr90" is hypomethylated relative to "h1".

I have fixed the typo in the vignette; thanks for catching the error!

ADD REPLY • link 6.0 years ago keegan ▴ 60