Assess fit of variance prior
2
1
Entering edit mode
@frederik-ziebell-14676
Last seen 10 weeks ago
Heidelberg, Germany

Since limma is also applied to data types other than microarrays, e.g. MS-based proteomics data, is it advisable to asses the fit of the variance prior before running limma? I found that even for data that follows the limma model, it is difficult to judge the fit of the prior based on a QQ-plot. In the example below, the fit does not seem to be very good although the estimates of df.prior and s2.prior are very close to the true value.

library("limma")

plot_prior_fit <- function(fit){

q_chisq <- function(p) {qchisq(p, df = fit$df.prior)} y <- fit$df.prior*fit$s2.prior/fit$sigma^2

qqplot(
x = q_chisq(ppoints(y)),
y = y,
xlab = "Theoretical Quantiles", ylab = "Sample Quantiles"
)
qqline(y, distribution = q_chisq, col = "red")
}

n_genes <- 5000
n_samples_per_group <- 4

set.seed(1)
sigma2 <- 0.05 / rchisq(n_genes, df = 10) * 10
y <- matrix(rnorm(n_genes*2*n_samples_per_group, sd = sqrt(sigma2)), nrow = n_genes, ncol = 2*n_samples_per_group)
design <- cbind(Intercept = 1, Group = c(rep(0, n_samples_per_group), rep(1, n_samples_per_group)))
# 1% of genes differentially expressed
y[seq_len(floor(n_genes/100)), seq_len(n_samples_per_group)] <- y[seq_len(floor(n_genes/100)), seq_len(n_samples_per_group)] + 1
fit <- lmFit(y,design)
fit <- eBayes(fit)

plot_prior_fit(fit)


limma • 372 views
0
Entering edit mode
@gordon-smyth
Last seen 1 hour ago
WEHI, Melbourne, Australia

is it advisable to asses the fit of the variance prior before running limma?

No, that isn't necessary. Bayesian analyses are extremely robust to the shape of the prior distribution. In any case, assessing the fit of the prior distribution directly is impossible because it relates to quantities that are not observed.

If this sort of checking was necessary or useful, then it would explained in the limma User's Guide or workflows.

In the example below, the fit does not seem to be very good

You have not compared like with like. limma assumes the observed variances to follow a scaled F-distribution, as explained in the published papers.