Determination of Hypo- or Hyper- methylation
1
0
Entering edit mode
@bruno-verstraeten-15923
Last seen 4.7 years ago

Hello,

I have a question about the results of the drmseq package. According to the guide (https://bioconductor.org/packages/devel/bioc/vignettes/dmrseq/inst/doc/dmrseq.html), a region is hypo or hypermethylated according to the sign of its corresponding test statistic and the alphabetical order of the covariate of interest. I reproduced the example results of the guide with the following code:

library(dmrseq)
library(bsseq)

bs <- BS.chr21

testCovariate <- "CellType"
regions <- dmrseq(bs=bs[240001:260000,],
cutoff = 0.05,
testCovariate=testCovariate)
sigRegions <- regions[regions$qval < 0.05,] ​However, the direction of the effect does not seem to be in line with the methylation levels calculated per region per sample. If one significant region is taken as an example: > sigRegions[1,] GRanges object with 1 range and 7 metadata columns: seqnames ranges strand | L area <Rle> <IRanges> <Rle> | <integer> <numeric> 171 chr21 43605625-43606688 * | 24 12.7309412794983 beta stat pval qval <numeric> <numeric> <numeric> <numeric> 171 -1.2434551058172 -18.5650464920496 0.00307692307692308 0.025982905982906 index <IRanges> 171 845-868 The test statistic is negative so the condition that comes first in alphabetical order is hypomethylated compared to the other condition. The conditions are "h1" and "imr90" so h1 is hypomethylated compared to imr90. > pData(bs) DataFrame with 4 rows and 2 columns CellType Rep <character> <character> imr90.r1 imr90 replicate1 imr90.r2 imr90 replicate2 h1.r1 h1 replicate1 h1.r2 h1 replicate2 Yet, when the methylation estimates are calculated, the h1 samples show a higher methylation estimate than the imr90 samples, indicating hypermethylation. > getMeth(bs,regions = sigRegions,what = "perRegion",type = "raw")[1,] <4> DelayedArray object of type "double": imr90.r1 imr90.r2 h1.r1 h1.r2 0.3817489 0.4050773 0.9252300 0.8773829  Is there something i misunderstood about these apparent conflicting results? regards, Bruno sessionInfo() R version 3.5.0 (2018-04-23) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 14.04.5 LTS Matrix products: default BLAS: /home/bsverstr/R-3.5.0/lib/libRblas.so LAPACK: /home/bsverstr/R-3.5.0/lib/libRlapack.so locale: [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8 [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats4 parallel stats graphics grDevices utils datasets [8] methods base other attached packages: [1] dmrseq_1.0.0 bsseq_1.16.0 [3] SummarizedExperiment_1.10.0 DelayedArray_0.6.0 [5] BiocParallel_1.14.0 matrixStats_0.53.1 [7] Biobase_2.40.0 GenomicRanges_1.32.0 [9] GenomeInfoDb_1.16.0 IRanges_2.14.1 [11] S4Vectors_0.18.1 BiocGenerics_0.26.0 loaded via a namespace (and not attached): [1] nlme_3.1-137 bitops_1.0-6 [3] bit64_0.9-7 RColorBrewer_1.1-2 [5] progress_1.1.2 httr_1.3.1 [7] doRNG_1.6.6 tools_3.5.0 [9] R6_2.2.2 HDF5Array_1.8.0 [11] DBI_1.0.0 lazyeval_0.2.1 [13] colorspace_1.3-2 permute_0.9-4 [15] prettyunits_1.0.2 bit_1.1-12 [17] compiler_3.5.0 pkgmaker_0.22 [19] rtracklayer_1.40.0 scales_0.5.0 [21] readr_1.1.1 stringr_1.3.0 [23] digest_0.6.15 Rsamtools_1.32.0 [25] R.utils_2.6.0 XVector_0.20.0 [27] pkgconfig_2.0.1 htmltools_0.3.6 [29] limma_3.36.0 BSgenome_1.48.0 [31] regioneR_1.12.0 rlang_0.2.0 [33] RSQLite_2.1.0 BiocInstaller_1.30.0 [35] shiny_1.0.5 DelayedMatrixStats_1.2.0 [37] bindr_0.1.1 gtools_3.5.0 [39] dplyr_0.7.4 R.oo_1.22.0 [41] RCurl_1.95-4.10 magrittr_1.5 [43] GenomeInfoDbData_1.1.0 Matrix_1.2-14 [45] Rcpp_0.12.16 munsell_0.4.3 [47] Rhdf5lib_1.2.0 R.methodsS3_1.7.1 [49] stringi_1.2.2 yaml_2.1.19 [51] zlibbioc_1.26.0 rhdf5_2.24.0 [53] plyr_1.8.4 bumphunter_1.22.0 [55] AnnotationHub_2.12.0 grid_3.5.0 [57] blob_1.1.1 promises_1.0.1 [59] lattice_0.20-35 splines_3.5.0 [61] Biostrings_2.48.0 GenomicFeatures_1.32.0 [63] hms_0.4.2 locfit_1.5-9.1 [65] pillar_1.2.2 rngtools_1.2.4 [67] codetools_0.2-15 reshape2_1.4.3 [69] biomaRt_2.36.0 XML_3.98-1.11 [71] glue_1.2.0 outliers_0.14 [73] annotatr_1.6.0 data.table_1.11.0 [75] foreach_1.4.4 httpuv_1.4.2 [77] gtable_0.2.0 assertthat_0.2.0 [79] ggplot2_2.2.1 mime_0.5 [81] xtable_1.8-2 later_0.7.2 [83] tibble_1.4.2 iterators_1.0.9 [85] registry_0.5 GenomicAlignments_1.16.0 [87] AnnotationDbi_1.42.0 memoise_1.1.0 [89] bindrcpp_0.2.2 interactiveDisplayBase_1.18.0 dmrseq bsseq • 702 views ADD COMMENT 0 Entering edit mode @james-w-macdonald-5106 Last seen 2 days ago United States You misunderstand how models are fit in R, when using the default of treatment contrasts. As an example: > z <- factor(rep(c("imr90","h1"), each = 4)) > z [1] imr90 imr90 imr90 imr90 h1 h1 h1 h1 Levels: h1 imr90 > model.matrix(~z) (Intercept) zimr90 1 1 1 2 1 1 3 1 1 4 1 1 5 1 0 6 1 0 7 1 0 8 1 0 attr(,"assign") [1] 0 1 attr(,"contrasts") attr(,"contrasts")$z
[1] "contr.treatment"

So the beta from that model will be imr90 - h1, since h1 is the 'baseline' level.

0
Entering edit mode

0
Entering edit mode

Sorry for the late reply - I see your confusion, Bruno. The dmrseq vignette correctly stated that the reference category was determined alphabetically, but there was a typo in the explanation of "hypo" and "hyper" methylation in the example. Yes, as you observed and as James demonstrates above, the condition label with higher alphabetical rank will become the reference category. In this example, "h1" is the reference, so a negative effect size means that "imr90" is hypomethylated relative to "h1".

I have fixed the typo in the vignette; thanks for catching the error!