minfi, bumphunter + complex experimental design, and 3 other 450K queries
2
1
Entering edit mode
Guido Hooiveld ★ 2.9k
@guido-hooiveld-2020
Last seen 3 hours ago
Wageningen University, Wageningen, the …
Hi, I am using minfi to analyze an Illumina 450K dataset, but due to an experimental design that comprises more than 2 grps I am a lost on how to properly go for 'bumphunting'... In addition, I do have 3 queries regarding minfi's workflow. See below. Any suggestions would be appreciated! Thanks, Guido Design of my study: parallel design, before + after measurement for each subject (baseline vs treatment, = paired), young and aged subjects. We included blood cells from 10 subjects in our study. Subjects were divided into 2 grps, "young" (5x) and "old" (5x). At baseline (before treatment) samles were taken (10x), and after 48hr of treatment (10x). In total thus 20 samples were analysed on the 450k array. Each treatment sample can be compared with it's respective baseline (paired analysis). Of interest are: 1) difference between young vs old at baseline (unpaired analysis, 10 vs 10), 2) effect of treatment in young (paired analysis, 5 vs 5), 3) effect of treatment in old (paired analysis, 5 vs 5), 4) difference in response due to age, thus CpGs that respond differently in 3 vs 2 ["delta of the delta's"]. To identify the individual regulated CpGs for each comparison, I used limma (on the M-values) using the multilevel model (thus taking into account the paired structure using the duplicateCorrelation() function). However, considering the design of my experiment, how to optimally analyse for enriched regions using the bumphunter() function? Since I have multiple groups + paired structure, the design matrix doesn't contain a column (coefficient) that corresponds to each of the contrasts of interest (1-4). This is also returned as warning by the bumphunter() function, and referred is to the vignette [which one I don't know, but I guess it's from the library bumphunter]? ** Thus how to setup the design matrix that all 4 contrasts are analysed with bumphunter? Code: > norm.swan MethylSet (storageMode: lockedEnvironment) assayData: 485512 features, 20 samples element names: Meth, Unmeth phenoData sampleNames: 7766130001_R01C01 7766130001_R02C01 ... 7766130037_R05C02 (20 total) varLabels: Sample_ID Sample_Plate ... filenames (15 total) varMetadata: labelDescription Annotation array: IlluminaHumanMethylation450k annotation: ilmn12.hg19 Preprocessing Method: SWAN (based on a MethylSet preprocesses as 'Raw (no normalization or bg correction)' minfi version: 1.10.1 Manifest version: 0.4.0 > > # convert MethylSet (norm.swan) into GenomicRatioSet through RatioSet > norm.swan2 <- ratioConvert(norm.swan, what = "both", keepCN = TRUE) > norm.swan2 <- mapToGenome(norm.swan2) Loading required package: IlluminaHumanMethylation450kanno.ilmn12.hg19 > > # experimental factors > treatment <- pd$TrtmntGrps > subject <- as.factor(pd$SubjectID) > > design <- model.matrix(~0+ treatment + subject) > > design treatmentoldctrl treatmentoldWY treatmentyoungctrl treatmentyoungWY subject31 subject34 subject35 1 1 0 0 0 0 0 0 2 0 0 0 1 0 0 0 3 0 1 0 0 0 0 0 4 1 0 0 0 0 0 0 5 1 0 0 0 0 0 0 6 0 0 0 1 0 0 1 7 0 1 0 0 0 0 0 8 0 0 1 0 0 0 0 9 0 0 0 1 0 0 0 10 1 0 0 0 0 0 0 11 0 1 0 0 0 0 0 12 1 0 0 0 0 0 0 13 0 0 1 0 0 0 0 14 0 0 0 1 0 1 0 15 0 0 1 0 0 1 0 16 0 1 0 0 0 0 0 17 0 1 0 0 0 0 0 18 0 0 1 0 0 0 1 19 0 0 1 0 1 0 0 20 0 0 0 1 1 0 0 subject43 subject52 subject62 subject64 subject65 subject66 1 0 1 0 0 0 0 2 1 0 0 0 0 0 3 <<snip>> > #sample (nonsense) bumphunter run to check everything worked > bumps <- bumphunter(norm.swan2, design = design, coef=1, B = 1, type = "Beta", cutoff = 0.10) [bumphunterEngine] Using a single core (backend: doSEQ, version: 1.4.2). [bumphunterEngine] Computing coefficients. [bumphunterEngine] Performing 1 permutations. [bumphunterEngine] Computing marginal permutation p-values. [bumphunterEngine] cutoff: 0.1 [bumphunterEngine] Finding regions. [bumphunterEngine] Found 205534 bumps. [bumphunterEngine] Computing regions for each permutation. Loading required package: rngtools Loading required package: pkgmaker Loading required package: registry Attaching package: 'pkgmaker' The following object is masked from 'package:IRanges': new2 [bumphunterEngine] Estimating p-values and FWER. Warning message: In bumphunterEngine(getMethSignal(object, type), design = design, : The use of the permutation test (B>0), is not recommended with multiple covariates, (ncol(design)>2). See vignette for more information. > I also have three small queries on the workflow: ** I normalized my data using SWAN. To use these data as input for bumphunter(), I had to do a 2-step conversion [ratioConvert() followed by mapToGenome()]. Is this the way to do this, or is there another (direct) way? > norm.swan2 <- ratioConvert(norm.swan, what = "both", keepCN = TRUE) > norm.swan2 <- mapToGenome(norm.swan2) ** To confirm, in the bumphunter() function, when 'type=beta' a 'cutoff=0.10' means that the absolute difference between the beta values of the 2 grps should at least be 0.10? Thus at least a difference of 10% in methylation status of one of the CpGs in the region (bump)? ** What is the current way of annotating the 450k data? Making use of "FDb.InfiniumMethylation.hg19" from Tim, "IlluminaHumanMethylation450kanno.ilmn12.hg19" from Kasper, or an allternative method? > sessionInfo() R Under development (unstable) (2013-11-19 r64265) Platform: x86_64-w64-mingw32/x64 (64-bit) locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C LC_TIME=English_United States.1252 attached base packages: [1] parallel stats graphics grDevices utils datasets methods base other attached packages: [1] doRNG_1.6 rngtools_1.2.4 [3] pkgmaker_0.22 registry_0.2 [5] IlluminaHumanMethylation450kanno.ilmn12.hg19_0.2.1 IlluminaHumanMethylation450kmanifest_0.4.0 [7] minfi_1.10.1 bumphunter_1.4.1 [9] locfit_1.5-9.1 iterators_1.0.7 [11] foreach_1.4.2 Biostrings_2.32.0 [13] XVector_0.4.0 GenomicRanges_1.16.3 [15] GenomeInfoDb_1.0.2 IRanges_1.22.6 [17] lattice_0.20-29 Biobase_2.24.0 [19] BiocGenerics_0.10.0 loaded via a namespace (and not attached): [1] annotate_1.42.0 AnnotationDbi_1.26.0 base64_1.1 beanplot_1.1 codetools_0.2-8 compiler_3.1.0 [7] DBI_0.2-7 digest_0.6.4 genefilter_1.46.1 grid_3.1.0 illuminaio_0.6.0 limma_3.20.2 [13] MASS_7.3-33 matrixStats_0.8.14 mclust_4.3 multtest_2.20.0 nlme_3.1-117 nor1mix_1.1-4 [19] plyr_1.8.1 preprocessCore_1.26.1 R.methodsS3_1.6.1 RColorBrewer_1.0-5 Rcpp_0.11.1 reshape_0.8.5 [25] RSQLite_0.11.4 siggenes_1.38.0 splines_3.1.0 stats4_3.1.0 stringr_0.6.2 survival_2.37-7 [31] tools_3.1.0 XML_3.98-1.1 xtable_1.7-3 zlibbioc_1.10.0 > --------------------------------------------------------- Guido Hooiveld, PhD Nutrition, Metabolism & Genomics Group Division of Human Nutrition Wageningen University Biotechnion, Bomenweg 2 NL-6703 HD Wageningen the Netherlands tel: (+)31 317 485788 fax: (+)31 317 483342 email: guido.hooiveld@wur.nl internet: http://nutrigene.4t.com http://scholar.google.com/citations?user=qFHaMnoAAAAJ http://www.researcherid.com/rid/F-4912-2010 [[alternative HTML version deleted]]
GO convert minfi bumphunter GO convert minfi bumphunter • 2.2k views
0
Entering edit mode
@andrewjskelton73-7074
Last seen 18 months ago
United Kingdom

I know this is a very old post, but for anyone that encounters the same problem, check out DMRcate to look for DMRs and incorporate your limma model designs, along with contrast matrices

0
Entering edit mode
david.watson ▴ 10
@davidwatson-9545
Last seen 2.2 years ago
Queen Mary University of London

I'm currently facing a similar problem, Guido. (In fact, I posted about it just yesterday: https://support.bioconductor.org/p/77148/). I don't have a full answer, but I have a few ideas. First of all, the warning message you're getting from bumphunter is due to the presence of any covariates, totally independent of study design. When incorporating even a single adjustment variable into your model, Jaffe et al. recommend using bootstrap sampling rather than permutations to calculate p-values and FWERs. Just set nullMethod='bootstrap' and you should be fine. There's some mention in the bumphunter vignette of Efron & Tibshirani's method being preferable in such cases, but it doesn't appear to be implemented as yet.

As for the more substantial issue of how to get bumphunter to handle your complex design, I'm not completely sure. One possibility might be to rebuild your model matrix so that 'Contrasts' is now a factor variable with levels corresponding to each of your research questions. Then just change the coef= argument within bumphunter to select the contrast of interest for that run. The problem there though is that you lose the duplicateCorrelation info.

I think I'm just going to try and revise the source code so that I can upload the output from a topTable call straight into bumphunter. If I figure it out, I'll be sure to let you know.