Question: p-values from robust linear model in LIMMA
1
gravatar for Arvid Sondén
4.9 years ago by
Arvid Sondén40 wrote:

Dear all,

I am currently working with expression data in LIMMA and have been asked to fit a robust linear model:

lm.fit.rob <- lmFit(object=y$E,design=m.matrix, method="robust")
lm.fit.rob.bayes <- eBayes(lm.fit.rob)
lm.fit.rob.bayes.tt <- topTable(lm.fit.rob.bayes,coef="Group")

It is easy to find that lmFit uses mrlm, and that mrlm uses rlm. However, rlm does not generate p-values, and from what I have read (e.g. http://r.789695.n4.nabble.com/p-values-td803236.html ) it is not a trivial thing to do. What confuses me is that topTable generates p-values, and I can't find in the documentation how and on which assumptions.

The robust models do greatly improve my p-values, but can they be trusted?

Best regards,

Arvid

limma lmfit • 2.2k views
ADD COMMENTlink modified 4.3 years ago by Gordon Smyth37k • written 4.9 years ago by Arvid Sondén40
Answer: p-values from robust linear model in LIMMA
2
gravatar for Gordon Smyth
4.9 years ago by
Gordon Smyth37k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth37k wrote:

Dear Arvid,

You are right that the theory of hypothesis testing for robust regression is problematic. What limma does is to simply take the scale estimates and robustifying weights from rlm() and input them into the usual limma pipeline. This is intuitively reasonable because it downweights values that have been identified as potential outliers, and I wanted to include rlm as an option because of some early microarray studies that used it:

  http://www.pnas.org/content/100/9/5491.short

However the approximation to a t-distribution is fairly rough and I can't give you good guidelines as to how reliable the p-values are. The idea is that you can still interpret the logFC, or use the B-statistics as a gene ranking, even if you don't trust the p-values.

An alternative is to go back to lmFit() with method="ls" but to use robust=TRUE (and perhaps trend=TRUE as well) at the eBayes() step instead. This approach does have a rigorous theoretical underpinning:

  http://www.statsci.org/smyth/pubs/RobustEBayesPreprint.pdf

In practice, my lab doesn't use method="robust" for our own analyses. When we want to robustify, we do it at the eBayes step instead.

Best wishes
Gordon

ADD COMMENTlink modified 4.3 years ago • written 4.9 years ago by Gordon Smyth37k

Dear Gordon,

Thank you for your answer!

My issue is that I want to use my p-values for filtering, but when using method="ls" (even if using robust=TRUE in eBayes) I get no significant values after adjusting for multiple testing at all. Using the robust method in lmFit however generates a distribution of p-values much closer to what I would expect. It also gives me enough significant genes to work with in the following pathway analysis.

I understand that the p-values I get are not to be trusted if interpreting them as actual p-values, but would you also say that they cannot be used for filtering?

Best regards,
Arvid

ADD REPLYlink modified 4.3 years ago by Gordon Smyth37k • written 4.9 years ago by Arvid Sondén40

When you say "My issue is that I want to use my p-values for filtering," do you mean gene ranking? Yes, you could use them for ranking.

Gordon

ADD REPLYlink modified 4.3 years ago • written 4.9 years ago by Gordon Smyth37k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 263 users visited in the last hour