Question: Is anyone able to explain the statistics behind the dmpFinder function in minfi package?
0
gravatar for mcall
2.8 years ago by
mcall0
mcall0 wrote:

Hi Everyone!

I am using dmpFinder to find dmp on a methylation experiment.

As the authors said, dmpFinder is a wrapper around limma, so I tried to emmulate the steps (that I thought) were followed by the dmpFinder function... After examining the function @HansenLab's Github I found a series of steps that I don't fully understand...

Is anyone able to explain the statistics behind the dmpFinder function in minfi package? Why does dmpFinder applies bayes in some cases, or else, uses Sigma?? (Please, follow the link to Hansen's Lab to see the code I am talking about...) Could someone, please, guide me on why these steps are followed? or what do they mean?

I am sorry if the question is too basic, maybe it is a matter of the limma package, but I can't seem to find the answer.

Many thanks in advance.


https://github.com/kasperdanielhansen/minfi/blob/master/R/dmr.R

limma minfi methylation dmpfinder • 1.0k views
ADD COMMENTlink modified 2.2 years ago by vulegom0 • written 2.8 years ago by mcall0
Answer: Is anyone able to explain the statistics behind the dmpFinder function in minfi
0
gravatar for James W. MacDonald
2.8 years ago by
United States
James W. MacDonald50k wrote:

Rather than looking at the code, why don't you look at the help page? It says

Arguments:

     dat: A 'MethylSet' or a 'matrix'.

   pheno: The phenotype to be tested for association with methylation.

    type: Is the phenotype ''continuous' or 'categorical'?

 qCutoff: DMPs with an FDR q-value greater than this will not be
          returned.

shrinkVar: Should variance shrinkage be used? See details.

Details:

     This function tests each genomic position for association between
     methylation and a phenotype. Continuous phenotypes are tested with
     linear regression, while an F-test is used for categorical
     phenotypes.

     Variance shrinkage ('shrinkVar=TRUE') is recommended when sample
     sizes are small (<10). The sample variances are squeezed by
     computing empirical Bayes posterior means using the 'limma'
     package.

Which seems to be fairly explanatory, unless you don't know what 'squeezing variance' means, in which case you should read the limma User's Guide, or just rest easy knowing that you should do this if you have very few samples, and maybe not if you have more samples.

ADD COMMENTlink written 2.8 years ago by James W. MacDonald50k

Hi James.

Many thanks for you reply.

Maybe I should give you more background.

I am trying to select an N amount of probes that are differentially methylated based on a linear model (for further analysis that talking about it doesn't fit in this question)

Since I have a constraint to choose N amount and not just all the data coming from the 450k experiments, my first shot was to preselect them based on the dmpFinder outcomes. But later, I learned that the function is just a wrapper around limma. So I tried with limma to see if got the same dmp.

 

With “minfi” package, to get those dmp, I do:

dmp1 <- dmpFinder (betas, IC50Drug1 , type = "continuous")

as you can see I don’t use ('shrinkVar=TRUE'), because I know my sample size is not too small…

 

However, I don’t get the same result if I apply the following with limma:

betas <- rbind (betas, IC50Drug1)

design <- model.matrix(~IC50Drug1, data = betas)

fit <- lmFit (betas, design)

ebfit <- eBayes(fit)

fitted_limma <- topTable(ebfit, number=1000, p.value = 0.05, adjust.method="fdr",

coef=1, sort.by = "p")

 

I can see I get a different result because, since I didn’t choose to ('shrinkVar=TRUE'), then the code that is running is the following:

else if (type == "continuous") {

design <- model.matrix(~pheno)

fit <- lmFit(M, design)

if (shrinkVar) {

fit <- eBayes(fit)

sigma <- sqrt(fit$s2.post)

df <- fit$df.prior + fit$df.residual

} else {

sigma <- fit$sigma

df <- fit$df.residual

}

coef <- fit$coefficients

if (is.vector(coef))

coef <- matrix(coef, ncol=2)

stdev <- fit$stdev.unscaled

if (is.vector(stdev))

stdev <- matrix(stdev, ncol=2)

t <- coef[, 2]/(stdev[, 2] * sigma)

pval <- 2 * pt(abs(t), df=df, lower.tail=F)

tab <- data.frame(intercept=coef[, 1], beta=coef[, 2], t=t, pval=pval)

}

p0 <- pi0.est(tab$pval[!is.na(tab$pval)])$p0

tab$qval <- qvalue.cal(tab$pval, p0)

if (qCutoff < 1)

tab <- tab[tab$qval <= qCutoff,]

o <- order(tab$pval)

tab <- tab[o, ]

 

So, I’d like to know what lies beneath (statistically speaking) this piece of code. Because my understanding is that limma has a straight forward way of being used. Unless, it has to be used differently depending on …?(?)based on your answer, depending on the number of samples, I suspect???

If so, then I’d like to know the statistics that drive those steps, so that I can apply the same logical steps without just copy-pasting what has been done for the dmpFinder.

 

Many thanks,

 

 

ADD REPLYlink written 2.8 years ago by mcall0
Answer: Is anyone able to explain the statistics behind the dmpFinder function in minfi
0
gravatar for mcall
2.8 years ago by
mcall0
mcall0 wrote:

Hi James.

Many thanks for you reply.

I added my answer as a comment.

ADD COMMENTlink modified 2.8 years ago • written 2.8 years ago by mcall0

If you want to add a comment to an existing post, please use the ADD COMMENT button and type in the box that appears. The box below (Add your answer), is for, well, adding answers.

I'm not sure what more I can tell you other than what I have already said. If you want to understand what limma's eBayes step is doing, then you will have to do some reading. And the place to start is the limma User's Guide, which gives a relatively gentle introduction to the idea of variance shrinkage and why one would want to do something like that. If you want to know even more, then there are quite a few citations at the beginning of the User's Guide that you could read, starting with the first one (limma powers differential expression analyses for RNA-sequencing and microarray studies), which is still pretty tame.

If you want to get all hardcore about things, you could then read "Linear models and empirical Bayes methods for assessing differential expression in microarray experiments.", which is a book chapter from Statistical Applications in Genetics and Molecular Biology.

ADD REPLYlink written 2.8 years ago by James W. MacDonald50k
Answer: Is anyone able to explain the statistics behind the dmpFinder function in minfi
0
gravatar for vulegom
2.2 years ago by
vulegom0
vulegom0 wrote:

dmp finder is very function and the uses of it are in many fold.

 

happy birthday quotes

ADD COMMENTlink written 2.2 years ago by vulegom0
Answer: Is anyone able to explain the statistics behind the dmpFinder function in minfi
0
gravatar for vulegom
2.2 years ago by
vulegom0
vulegom0 wrote:

dmp finder is used in many wasy and i relly love using it sso i recommend all the users to use the dmp function

happy birthday quotes

ADD COMMENTlink written 2.2 years ago by vulegom0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 527 users visited in the last hour