limma Intercept vs No-intercept models completely changing DMR results?
2
0
Entering edit mode
Kim • 0
@27710395
Last seen 4 weeks ago
United States

Hi Everyone!

Im currently working on a basic DMP/DMR analysis using EPICV2 array data, and was running into an interesting issue in regards to specifying my model design. I am getting VERY different results between models with and without an intercept. Since my variable of interest is a factor, the results should be the same, right?

My variable of interest is a factor with two levels ("AltSHP_LA" and "AltSHP_T_HA") and I have several covariates that are both factors and continuous (ex. Sex, PC1 (Blood Cell types), and Age.

Ive read some similar questions that have been previously asked and answered, but I cant seem to find a solution from them that solve this exact issue. Any help would be greatly appreciated!

# Define the factor of interest and covariates
Alt <- factor(targets_Sherpa$Group_Code) Sex <- factor(targets_Sherpa$Sex)
Age <- as.numeric(targets_Sherpa$Age) PC1 <- as.numeric(targets_Sherpa$PC1)

# Define Intercept Model
design_intercept <- model.matrix(~Alt+Sex+Age+PC1, data=targets_Sherpa)
colnames(design_intercept) <- c("AltSHP_LA","AltSHP_T_HA", "Sex", "Age", "PC1")
contMatrix_intercept <- makeContrasts(AltSHP_LA-AltSHP_T_HA,
levels=design_intercept)
# Define no-Intercept Model
design_Nointercept <- model.matrix(~0+Alt+Sex+Age+PC1, data=targets_Sherpa)
contMatrix_Nointercept <- makeContrasts(AltSHP_LA-AltSHP_T_HA,
levels=design_Nointercept)

# Function for retreiving DMPs to ensure Im running things consistently between models:
getDMP <- function(mSet, design, contMatrix, coef_num){
mVals2 <- getM(mSet)
fit <- lmFit(mVals2, design)
fit <- contrasts.fit(fit, contMatrix)
fit <- eBayes(fit, trend=TRUE)
topTable(fit,coef=coef_num)
DMPs <- topTable(fit,coef=coef_num)
DMPs}

# Intercept Model Results:
Intercept <- getDMP(mSetSqFlt_Sherpa2, design_intercept, contMatrix_intercept, 1)
cg21393171_TC21  6.694837  1.858226  36.37520 6.392776e-25 5.770445e-19
cg11049634_BC11  8.315347  2.423556  35.38914 1.347332e-24 6.080854e-19
cg08560117_BC21 -7.994720 -1.825273 -34.69716 2.300907e-24 6.923052e-19
cg27604818_BC21 -7.719519 -1.969742 -32.76409 1.085070e-23 2.448599e-18
cg26207017_TC21 -7.219652 -2.106884 -32.07883 1.920206e-23 3.461483e-18

# No Intercept Model Results:
NoIntercept <- getDMP(mSetSqFlt_Sherpa2, design_Nointercept, contMatrix_Nointercept, 1)
cg02971555_BC21   -0.5815881  2.6954319 -5.921211 2.361404e-06 0.9758965
cg21532288_TC21    0.3920665  3.2796548  5.634846 5.100760e-06 0.9758965
cg03311185_BC21    0.3433009  2.9640623  5.602717 8.200960e-06 0.9758965
cg04490945_TC21    0.4260654  0.5504043  5.459177 8.707104e-06 0.9758965


As you can see, the intercept and non-intercept models really change results drastically. Any idea whats going on?

Thanks in advance, any help would be appreciated!

sessionInfo( )
R version 4.3.0 (2023-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /u/local/compilers/intel/oneapi/2022.1.1/mkl/2022.0.1/lib/intel64/libmkl_gf_lp64.so.2;  LAPACK version 3.9.0

locale:
[1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8       LC_NAME=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

time zone: America/Los_Angeles
tzcode source: system (glibc)

attached base packages:
[1] grid      parallel  stats4    stats     graphics  grDevices utils
[8] datasets  methods   base

other attached packages:
[1] ewastools_1.7.2
[2] IlluminaHumanMethylationEPICv2manifest_0.99.2
[3] IlluminaHumanMethylationEPICv2anno.20a1.hg38_0.99.1
[4] methylCC_1.16.0
[5] FlowSorted.Blood.450k_1.40.0
[6] remotes_2.5.0
[7] RColorBrewer_1.1-3
[8] DMRcatedata_2.20.3
[9] ExperimentHub_2.10.0
[10] AnnotationHub_3.10.1
[11] BiocFileCache_2.10.2
[12] dbplyr_2.5.0
[13] mCSEA_1.22.0
[14] Homo.sapiens_1.3.1
[15] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
[16] org.Hs.eg.db_3.18.0
[17] GO.db_3.18.0
[18] OrganismDbi_1.44.0
[19] GenomicFeatures_1.54.4
[20] AnnotationDbi_1.64.1
[22] DMRcate_2.99.6
[23] Gviz_1.46.1
[24] minfiData_0.48.0
[25] IlluminaHumanMethylation450kmanifest_0.4.0
[26] missMethyl_1.36.0
[27] IlluminaHumanMethylationEPICanno.ilm10b4.hg19_0.6.0
[28] IlluminaHumanMethylation450kanno.ilmn12.hg19_0.6.1
[29] minfi_1.48.0
[30] bumphunter_1.44.0
[31] locfit_1.5-9.9
[32] iterators_1.0.14
[33] foreach_1.5.2
[34] Biostrings_2.70.3
[35] XVector_0.42.0
[36] SummarizedExperiment_1.32.0
[37] Biobase_2.62.0
[38] MatrixGenerics_1.14.0
[39] matrixStats_1.2.0
[40] GenomicRanges_1.54.1
[41] GenomeInfoDb_1.38.8
[42] IRanges_2.36.0
[43] S4Vectors_0.40.2
[44] BiocGenerics_0.48.1
[45] limma_3.58.1
[46] lubridate_1.9.3
[47] forcats_1.0.0
[48] stringr_1.5.1
[49] dplyr_1.1.4
[50] purrr_1.0.2
[52] tidyr_1.3.1
[53] tibble_3.2.1
[54] ggplot2_3.5.1
[55] tidyverse_2.0.0
[56] BiocManager_1.30.22

loaded via a namespace (and not attached):
[1] ProtGenerics_1.34.0           bitops_1.0-7
[3] httr_1.4.7                    tools_4.3.0
[5] doRNG_1.8.6                   backports_1.4.1
[7] utf8_1.2.4                    R6_2.5.1
[9] HDF5Array_1.30.1              lazyeval_0.2.2
[11] rhdf5filters_1.14.1           permute_0.9-7
[13] withr_3.0.0                   prettyunits_1.2.0
[15] gridExtra_2.3                 base64_2.0.1
[17] preprocessCore_1.64.0         cli_3.6.2
[21] Rsamtools_2.18.0              foreign_0.8-84
[23] siggenes_1.76.0               illuminaio_0.44.0
[25] R.utils_2.12.3                dichromat_2.0-0.1
[27] scrime_1.3.5                  BSgenome_1.70.2
[31] RSQLite_2.3.6                 generics_0.1.3
[33] BiocIO_1.12.0                 gtools_3.9.5
[35] Matrix_1.6-5                  interp_1.1-6
[37] fansi_1.0.6                   abind_1.4-5
[39] R.methodsS3_1.8.2             lifecycle_1.0.4
[41] yaml_2.3.8                    edgeR_4.0.16
[43] rhdf5_2.46.1                  SparseArray_1.2.4
[45] blob_1.2.4                    promises_1.2.1
[47] crayon_1.5.2                  lattice_0.21-8
[49] annotate_1.80.0               KEGGREST_1.42.0
[51] pillar_1.9.0                  knitr_1.45
[53] beanplot_1.3.1                rjson_0.2.21
[55] codetools_0.2-19              glue_1.7.0
[57] data.table_1.15.4             vctrs_0.6.5
[59] png_0.1-8                     cellranger_1.1.0
[61] gtable_0.3.4                  cachem_1.0.8
[63] xfun_0.43                     S4Arrays_1.2.1
[65] mime_0.12                     survival_3.5-5
[67] statmod_1.5.0                 interactiveDisplayBase_1.40.0
[69] ellipsis_0.3.2                nlme_3.1-162
[71] bit64_4.0.5                   bsseq_1.38.0
[73] progress_1.2.3                filelock_1.0.3
[75] nor1mix_1.3-3                 rpart_4.1.19
[77] colorspace_2.1-0              DBI_1.2.2
[79] Hmisc_5.1-2                   nnet_7.3-18
[81] tidyselect_1.2.1              bit_4.0.5
[83] compiler_4.3.0                curl_5.2.1
[85] graph_1.80.0                  htmlTable_2.4.2
[87] xml2_1.3.6                    DelayedArray_0.28.0
[89] rtracklayer_1.62.0            checkmate_2.3.1
[93] RBGL_1.78.0                   rappdirs_0.3.3
[95] digest_0.6.35                 rmarkdown_2.26
[97] GEOquery_2.70.0               htmltools_0.5.7
[99] pkgconfig_2.0.3               jpeg_0.1-10
[101] base64enc_0.1-3               sparseMatrixStats_1.14.0
[103] fastmap_1.1.1                 ensembldb_2.26.0
[105] rlang_1.1.3                   htmlwidgets_1.6.4
[107] shiny_1.8.0                   DelayedMatrixStats_1.24.0
[109] BiocParallel_1.36.0           mclust_6.1.1
[111] R.oo_1.26.0                   VariantAnnotation_1.48.1
[113] RCurl_1.98-1.14               magrittr_2.0.3
[115] Formula_1.2-5                 GenomeInfoDbData_1.2.11
[117] Rhdf5lib_1.24.2               munsell_0.5.0
[119] Rcpp_1.0.12                   stringi_1.8.3
[121] zlibbioc_1.48.2               MASS_7.3-58.4
[123] plyr_1.8.9                    deldir_2.0-2
[125] splines_4.3.0                 multtest_2.58.0
[127] hms_1.1.3                     rngtools_1.5.2
[129] biomaRt_2.58.2                BiocVersion_3.18.1
[131] XML_3.99-0.16.1               evaluate_0.23
[133] latticeExtra_0.6-30           biovizBase_1.50.0
[135] tzdb_0.4.0                    httpuv_1.6.14
[137] openssl_2.1.1                 reshape_0.8.9
[139] xtable_1.8-4                  restfulr_0.0.15
[141] AnnotationFilter_1.26.0       later_1.3.2
[143] plyranges_1.22.0              memoise_2.0.1
[145] GenomicAlignments_1.38.2      cluster_2.1.4
[147] timechange_0.3.0

limma DMRcate MethylationArray EPICV2 • 519 views
2
Entering edit mode
@james-w-macdonald-5106
Last seen 8 hours ago
United States

This part:

design_intercept <- model.matrix(~Alt+Sex+Age+PC1, data=targets_Sherpa)
colnames(design_intercept) <- c("AltSHP_LA","AltSHP_T_HA", "Sex", "Age", "PC1")


Is wrong. The intercept (which you are renaming) is AltSHP_LA, but the second coefficient is absolutely not AltSHP_T_HA. It is AltSHP_T_HA - AltSHP_LA, and is itself already the contrast you are testing for. For that model you don't use contrasts.fit, but instead go straight to eBayes and then extract the second coefficient with topTable.

1
Entering edit mode
@gordon-smyth
Last seen 1 hour ago
WEHI, Melbourne, Australia

I'll add a little bit of general advice to James' answer. You seem to have the misunderstanding that you can change the design matrix but keep all the other code unchanged. The contrasts are computed from the coefficients and naturally must change if you change the meaning of the coefficients, which is what are you doing when you change the design matrix. As James points out, you don't need contrasts at all in this case with the intercept model. You should never be renaming the columns of the design matrix if you use an intercept model.