Question: Bumphunting for longitudinal data
gravatar for Ashlynn
17 months ago by
Ashlynn0 wrote:

Hello everyone,

I'm working with some DNA methylation data and would like to use either Jaffe et al.'s bumphunter algorithm as implemented in the minfi package or the DMRcate package to identify differentially methylated regions. My study is a bit more complex as I have repeated measures (two time points) and therefore am running into some trouble specifying a design matrix. I would like to use a linear mixed effect model (lmer) to account for the repeated measures. Does anyone have experience with this? Any guidance would be much appreciated!

I would like to specify my design matrix to look something like this...

design<-lmer(cpg_mval~case_status+age+sex+cell_composition+smoking+meds+(1|id), data=data)



ADD COMMENTlink modified 17 months ago by Tim Peters80 • written 17 months ago by Ashlynn0
Answer: Bumphunting for longitudinal data
gravatar for James W. MacDonald
17 months ago by
United States
James W. MacDonald51k wrote:

Do note that lmer isn't a function to generate a design matrix - it's a function intended to fit a linear mixed model that happens to create a design matrix as part of the model fit.

If you have just two time points, then there isn't really a compelling reason to use a repeated measures design. Instead you can just block on patient, removing all the then redundant covariates (which, depending on the timing between observations may include age, smoking, and meds, and will certainly include sex). It then depends on what the research question might be.

If you are doing something like a case/control with treatment and you want to know if the meds cause differential methylation depending on the case/control status, then you have to do some trickeration in order to get a full rank design matrix. See section 3.5 in the edgeR User's Guide, which shows how to do that sort of thing.

ADD COMMENTlink written 17 months ago by James W. MacDonald51k

Dear James,

Thank you very much for the clear explanation. I am wondering if you could help me resolve my confusion with a similar matter. I am doing a longitudinal study with repeated measures at three time points. We are studying methylation changes during the course of malaria infection and after treatment in the host. We have three repeated measures at three time points for each individual (before infection T1, after infection T2, and after treatment T3). My initial plan was to treat individual as a "random effect" in a mixed-model, after coming across your post along with reading section 3.5.2 on "Blocking" at the edgeR User's I decided to block on the "individual" in my model. However, after reading a bit in the literature I got more confused and I am wondering if its actually the right thing to do given that I have three repeated measures.

So my question, Can I still block "individual" in my model given that we have 3 repeated measures?

I have provided the script that I used for you just in case you needed to have a look on what I did:

Age <- pData(New.gset.funnorm.addsnp)$Age Individual <- pData(New.gset.funnorm.addsnp)$Individual InfectionStatus <- pData(New.gset.funnorm.addsnp)$InfectionStatus Sex <- pData(New.gset.funnorm.addsnp)$Sex


designMatrix.snp1 <- model.matrix(~Infection_Status+Age+Sex)

corfit1 <- duplicateCorrelation(myMs, design=designMatrix.snp1, block = biolrep) dmrCate.snp1<-cpg.annotate("array", myMs, what="Beta", arraytype = "EPIC", analysis.type="differential", design=designMatrix.snp1, coef=2, block=biolrep, correlation=corfit1$consensus.correlation,fdr=0.05) dmrcoutputall.snp1 <- dmrcate(dmrCate.snp1,lambda=500, C=5, pcutoff="fdr",min.cpgs=3,mc.cores=1) wgbs.rangesall.snp1 <- extractRanges(dmrcoutputall.snp1, genome = "hg19") write.csv (wgbs.rangesall.snp1,file="DMRCateModel1DifferentailRange.txt")

Thanks a lot for your time


ADD REPLYlink written 6 months ago by dareenalmojil0
Answer: Bumphunting for longitudinal data
gravatar for Tim Peters
17 months ago by
Tim Peters80
Tim Peters80 wrote:

Hi Ash,

Yes, like James said, you'll need to sacrifice some of your covariates to get your design matrix full rank. Once you have a full rank matrix, you can then pass it to cpg.annotate() along with your coefficient of interest (and optionally contrast matrix), and then the regular workflow will do the rest with limma under the hood. 




ADD COMMENTlink written 17 months ago by Tim Peters80

Hi Tim,

Thank you for your guidance. I am trying to identify DMRs for cases compared to controls, using age, sex, cell composition and smoking as covariates in my design matrix. As this was a study design where we collected samples from all cases and controls at two time points I would like to account for the repeated measures for each sample. I have identified ids as biological replicates (each case/control has one unique id used at both time points) and have run corfit. My question is whether I have correctly used block and corfit$consenus correlation in cpg.annotate() or if they should have been included in my model.matrix(). One further question I have is whether DMRcate can be used for continuous independent variables of interest and if so how would one interpret the results/plot the DMRs? I appreciate any further help you can offer! Kind regards, Ash

design<-model.matrix(~case_status + Age + Sex + smoking + ccomp)
corfit <- duplicateCorrelation(myMs.noSNPs, design=design, block = biolrep)
my_annotation<-cpg.annotate("array", myMs.noSNPs, what="M", arraytype = "EPIC", analysis.type="differential", design=design, coef=2, block=biolrep, correlation=corfit$consensus.correlation) 
ADD REPLYlink modified 17 months ago • written 17 months ago by Ashlynn0

Hi Ash,

Yes, it looks like that will work. The ellipsis argument (...) at the end of cpg.annotate() is matched to extra arguments for limma::lmFit(), so block and correlation will propagate to the linear fitting.

And yes, DMRcate can definitely be used for continuous responses as well. The only caveats I would add are that a) they are not exactly "DMRs" anymore but regions that significantly correlate with the response variable, b) The "betafc" field in the output should be interpreted as the mean regression slope across the region, instead of the mean difference c) You 'll need to make phen.col length 1 (since response is not categorical)  and perhaps order the columns of CpGs to that of your response vector in order to make DMR.plot() output look nice (i.e. see the gradation of methylation values vertically within the "DMR"). 



ADD REPLYlink written 17 months ago by Tim Peters80

Hi Tim,


Thanks a bunch for your help with this! I appreciate your guidance.





ADD REPLYlink written 17 months ago by Ashlynn0
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 273 users visited in the last hour