18 months ago by

Cambridge, United Kingdom

It is pretty easy to home-brew your own power analysis, especially given that you already have some "pilot" data.

The key to making it realistic is to pull out the empirical Bayes statistics from your current dataset.

```
df0 <- median(fit$df.prior) # median, just in case you used robust=TRUE
s2 <- fit$s2.prior # might be a vector if trend=TRUE
ameans <- fit$Amean
```

Now, let's say you want to see the effect of doubling the number of samples in each group. For simplicity, I'll only deal with two groups. We generate a design matrix where the second coefficient represents the effect of interest.

```
grouping <- gl(2, 20)
nlibs <- length(grouping)
new.design <- model.matrix(~grouping)
```

We now do some stuff to generate a pretend dataset with similar parameters to the pilot:

```
ngenes <- length(ameans)
# Getting true variances for each gene
sigma <- sqrt(s2 / rchisq(ngenes, df=df0))
# Generating the residual effects.
resid.effects <- matrix(rnorm(ngenes*nlibs, sd=sigma), nrow=ngenes)
resid.effects[,seq_len(ncol(new.design))] <- 0
# Generating the residual matrix.
resids <- t(qr.qty(qr(new.design), t(resid.effects)))
# Adding back the abundances.
new.y <- resids + ameans
# Adding a specified effect for a random 10% of genes.
# In this case, a log-fold change of 1.5.
chosen <- sample(ngenes, round(ngenes/10))
effect <- sample(c(-1.5, 1.5), length(chosen), replace=TRUE)
new.y[chosen,] <- new.y[chosen,] + outer(effect, new.design[,2], "*")
```

Then we can analyze `new.y`

with *limma* again, and see how many of the DE genes are detected:

```
new.fit <- lmFit(new.y, new.design)
new.fit <- eBayes(new.fit)
table(decideTests(new.fit)[chosen,2])
```

Rinse and repeat until you get an experimental design with your desired detection rate. You can also tune the generation of the new dataset if you think your genes of interest are low or high-abundance (in which case, you just `sample`

from the relevant interval).

I should add that the above simulation is not entirely correct, as the addition of the effect will change the average abundance of the simulated genes in `new.y`

. I'm not entirely sure how to avoid that for an arbitrary design matrix; I guess one could define something like:

```
new.design2 <- sweep(new.design, 2, colMeans(new.design), "-")
new.design2[,"(Intercept)"] <- 1
```

... and use that instead, which ensures that only the intercept needs to be specified to determine the average abundance.