Question

Is anyone able to explain the statistics behind the dmpFinder function in minfi package?

0

Entering edit mode

mcall ▴ 20

@mcall-11507

Last seen 6.0 years ago

Hi Everyone!

I am using dmpFinder to find dmp on a methylation experiment.

As the authors said, dmpFinder is a wrapper around limma, so I tried to emmulate the steps (that I thought) were followed by the dmpFinder function... After examining the function @HansenLab's Github I found a series of steps that I don't fully understand...

Is anyone able to explain the statistics behind the dmpFinder function in minfi package? Why does dmpFinder applies bayes in some cases, or else, uses Sigma?? (Please, follow the link to Hansen's Lab to see the code I am talking about...) Could someone, please, guide me on why these steps are followed? or what do they mean?

I am sorry if the question is too basic, maybe it is a matter of the limma package, but I can't seem to find the answer.

Many thanks in advance.

https://github.com/kasperdanielhansen/minfi/blob/master/R/dmr.R

minfi methylation dmpfinder limma • 4.1k views

ADD COMMENT • link updated 8.7 years ago by vulegom • 0 • written 9.4 years ago by mcall ▴ 20

score 0 · Answer 1 · 2016-09-19

0

Entering edit mode

James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 18 hours ago

United States

Rather than looking at the code, why don't you look at the help page? It says

Arguments:

     dat: A 'MethylSet' or a 'matrix'.

   pheno: The phenotype to be tested for association with methylation.

    type: Is the phenotype ''continuous' or 'categorical'?

 qCutoff: DMPs with an FDR q-value greater than this will not be
          returned.

shrinkVar: Should variance shrinkage be used? See details.

Details:

     This function tests each genomic position for association between
     methylation and a phenotype. Continuous phenotypes are tested with
     linear regression, while an F-test is used for categorical
     phenotypes.

     Variance shrinkage ('shrinkVar=TRUE') is recommended when sample
     sizes are small (<10). The sample variances are squeezed by
     computing empirical Bayes posterior means using the 'limma'
     package.

Which seems to be fairly explanatory, unless you don't know what 'squeezing variance' means, in which case you should read the limma User's Guide, or just rest easy knowing that you should do this if you have very few samples, and maybe not if you have more samples.

ADD COMMENT • link 9.4 years ago James W. MacDonald 68k

0

Entering edit mode

Hi James.

Many thanks for you reply.

Maybe I should give you more background.

I am trying to select an N amount of probes that are differentially methylated based on a linear model (for further analysis that talking about it doesn't fit in this question)

Since I have a constraint to choose N amount and not just all the data coming from the 450k experiments, my first shot was to preselect them based on the dmpFinder outcomes. But later, I learned that the function is just a wrapper around limma. So I tried with limma to see if got the same dmp.

With “minfi” package, to get those dmp, I do:

dmp1 <- dmpFinder (betas, IC50Drug1 , type = "continuous")

as you can see I don’t use ('shrinkVar=TRUE'), because I know my sample size is not too small…

However, I don’t get the same result if I apply the following with limma:

betas <- rbind (betas, IC50Drug1)

design <- model.matrix(~IC50Drug1, data = betas)

fit <- lmFit (betas, design)

ebfit <- eBayes(fit)

fitted_limma <- topTable(ebfit, number=1000, p.value = 0.05, adjust.method="fdr",

coef=1, sort.by = "p")

I can see I get a different result because, since I didn’t choose to ('shrinkVar=TRUE'), then the code that is running is the following:

else if (type == "continuous") {

design <- model.matrix(~pheno)

fit <- lmFit(M, design)

if (shrinkVar) {

fit <- eBayes(fit)

sigma <- sqrt(fit$s2.post)

df <- fit$df.prior + fit$df.residual

} else {

sigma <- fit$sigma

df <- fit$df.residual

}

coef <- fit$coefficients

if (is.vector(coef))

coef <- matrix(coef, ncol=2)

stdev <- fit$stdev.unscaled

if (is.vector(stdev))

stdev <- matrix(stdev, ncol=2)

t <- coef[, 2]/(stdev[, 2] * sigma)

pval <- 2 * pt(abs(t), df=df, lower.tail=F)

tab <- data.frame(intercept=coef[, 1], beta=coef[, 2], t=t, pval=pval)

}

p0 <- pi0.est(tab$pval[!is.na(tab$pval)])$p0

tab$qval <- qvalue.cal(tab$pval, p0)

if (qCutoff < 1)

tab <- tab[tab$qval <= qCutoff,]

o <- order(tab$pval)

tab <- tab[o, ]

So, I’d like to know what lies beneath (statistically speaking) this piece of code. Because my understanding is that limma has a straight forward way of being used. Unless, it has to be used differently depending on …?(?)based on your answer, depending on the number of samples, I suspect???

If so, then I’d like to know the statistics that drive those steps, so that I can apply the same logical steps without just copy-pasting what has been done for the dmpFinder.

Many thanks,

ADD REPLY • link 9.4 years ago mcall ▴ 20

score 0 · Answer 2 · 2016-09-20

0

Entering edit mode

mcall ▴ 20

@mcall-11507

Last seen 6.0 years ago

Hi James.

Many thanks for you reply.

I added my answer as a comment.

ADD COMMENT • link 9.4 years ago mcall ▴ 20

0

Entering edit mode

If you want to add a comment to an existing post, please use the ADD COMMENT button and type in the box that appears. The box below (Add your answer), is for, well, adding answers.

I'm not sure what more I can tell you other than what I have already said. If you want to understand what limma's eBayes step is doing, then you will have to do some reading. And the place to start is the limma User's Guide, which gives a relatively gentle introduction to the idea of variance shrinkage and why one would want to do something like that. If you want to know even more, then there are quite a few citations at the beginning of the User's Guide that you could read, starting with the first one (limma powers differential expression analyses for RNA-sequencing and microarray studies), which is still pretty tame.

If you want to get all hardcore about things, you could then read "Linear models and empirical Bayes methods for assessing differential expression in microarray experiments.", which is a book chapter from Statistical Applications in Genetics and Molecular Biology.

ADD REPLY • link 9.4 years ago James W. MacDonald 68k

score 0 · Answer 3 · 2017-05-16

0

Entering edit mode

vulegom • 0

@vulegom-13057

Last seen 8.7 years ago

dmp finder is very function and the uses of it are in many fold.

happy birthday quotes

ADD COMMENT • link 8.7 years ago vulegom • 0

score 0 · Answer 4 · 2017-05-16

0

Entering edit mode

vulegom • 0

@vulegom-13057

Last seen 8.7 years ago

dmp finder is used in many wasy and i relly love using it sso i recommend all the users to use the dmp function

happy birthday quotes

ADD COMMENT • link 8.7 years ago vulegom • 0