It is pretty easy to home-brew your own power analysis, especially given that you already have some "pilot" data.

The key to making it realistic is to pull out the empirical Bayes statistics from your current dataset.

```
df0 <- median(fit$df.prior) # median, just in case you used robust=TRUE
s2 <- fit$s2.prior # might be a vector if trend=TRUE
ameans <- fit$Amean
```

Now, let's say you want to see the effect of doubling the number of samples in each group. For simplicity, I'll only deal with two groups. We generate a design matrix where the second coefficient represents the effect of interest.

```
grouping <- gl(2, 20)
nlibs <- length(grouping)
new.design <- model.matrix(~grouping)
```

We now do some stuff to generate a pretend dataset with similar parameters to the pilot:

```
ngenes <- length(ameans)
# Getting true variances for each gene
sigma <- sqrt(s2 / rchisq(ngenes, df=df0))
# Generating the residual effects.
resid.effects <- matrix(rnorm(ngenes*nlibs, sd=sigma), nrow=ngenes)
resid.effects[,seq_len(ncol(new.design))] <- 0
# Generating the residual matrix.
resids <- t(qr.qty(qr(new.design), t(resid.effects)))
# Adding back the abundances.
new.y <- resids + ameans
# Adding a specified effect for a random 10% of genes.
# In this case, a log-fold change of 1.5.
chosen <- sample(ngenes, round(ngenes/10))
effect <- sample(c(-1.5, 1.5), length(chosen), replace=TRUE)
new.y[chosen,] <- new.y[chosen,] + outer(effect, new.design[,2], "*")
```

Then we can analyze `new.y`

with *limma* again, and see how many of the DE genes are detected:

```
new.fit <- lmFit(new.y, new.design)
new.fit <- eBayes(new.fit)
table(decideTests(new.fit)[chosen,2])
```

Rinse and repeat until you get an experimental design with your desired detection rate. You can also tune the generation of the new dataset if you think your genes of interest are low or high-abundance (in which case, you just `sample`

from the relevant interval).

I should add that the above simulation is not entirely correct, as the addition of the effect will change the average abundance of the simulated genes in `new.y`

. I'm not entirely sure how to avoid that for an arbitrary design matrix; I guess one could define something like:

```
new.design2 <- sweep(new.design, 2, colMeans(new.design), "-")
new.design2[,"(Intercept)"] <- 1
```

... and use that instead, which ensures that only the intercept needs to be specified to determine the average abundance.

couldtry to estimate the proportion of true nulls using Storey's method on the p-value distribution from your pilot data; but if your pilot didn't have enough power, the distribution of p-values under the alternative would probably be close to uniform, resulting in overestimation of the number of true nulls. Similar issues apply to estimation of the log-fold changes. After all, if you were able to estimate these parameters precisely from the pilot data, then you should have been able to identify some DE genes in the first place.For the log-fold change, I would just go for 1, as this is the minimum I would consider for further experimental work. For the number of genes - this depends on how similar your biological groups are. Perhaps there are only 100 DEGs; maybe there are thousands. If you knew that already, you wouldn't need to do any experiments.

Unless you're answering your own question, don't use "Add answer". Use the "ADD COMMENT" button instead.