Different numbers of samples in each group do not pose a problem for limma. Just set up your design matrix like you normally would, e.g.:
> condition <- c(rep("Healthy", 3), rep(c("A", "B", "C"), each=10))
> condition <- factor(condition, levels=c("Healthy", "A", "B", "C"))
> design <- model.matrix(~condition)
> colnames(design)
[1] "(Intercept)" "conditionA" "conditionB" "conditionC"
The intercept here represents the average log-expression of the healthy samples, while each of the ensuing coefficients represents the log-fold change in expression for the corresponding treatment over the healthy samples. You can then use this design matrix in lmFit, as described in the user's guide. Each treatment can be compared with the healthy samples by specifying the corresponding coefficient in the coef argument of topTable (i.e., testing whether that coefficient is zero), while comparisons between treatments should use makeContrasts to compare the values of the relevant coefficients.
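As a rough sketch of the two kinds of comparison, something like the following should work. The expression matrix y is simulated purely for illustration, and the names BvsA, res_A_vs_healthy and res_B_vs_A are my own; substitute your real log-expression data. The intercept column is renamed only so that every column name is syntactically valid for makeContrasts.

```r
library(limma)

# Same grouping as above: 3 healthy samples and 10 per treatment.
condition <- c(rep("Healthy", 3), rep(c("A", "B", "C"), each = 10))
condition <- factor(condition, levels = c("Healthy", "A", "B", "C"))
design <- model.matrix(~condition)
# Rename the intercept so that all column names are syntactically valid.
colnames(design)[1] <- "Intercept"

# Made-up log-expression matrix (hypothetical): 100 genes x 33 samples.
set.seed(1)
y <- matrix(rnorm(100 * length(condition)), nrow = 100,
            dimnames = list(paste0("gene", 1:100), NULL))

fit <- lmFit(y, design)

# Treatment A versus healthy: test the conditionA coefficient directly.
fit_eb <- eBayes(fit)
res_A_vs_healthy <- topTable(fit_eb, coef = "conditionA")

# Treatment B versus treatment A: difference of two coefficients.
con <- makeContrasts(BvsA = conditionB - conditionA, levels = design)
fit_con <- eBayes(contrasts.fit(fit, con))
res_B_vs_A <- topTable(fit_con, coef = "BvsA")
```

By default topTable returns the top 10 genes; set number=Inf if you want the full table.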
If you have additional predictor terms, you can put them in as factors or covariates in the model.matrix call. However, this may require some care, depending on what these terms are, whether they are confounded with the treatment conditions, etc. You'll have to provide some more details about your situation in order for us to give useful advice.
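To make the confounding issue concrete, here is a minimal sketch using base R only. The batch and age variables are invented for illustration and are not taken from your experiment. A quick check is whether the design matrix has full column rank: if it does not, at least one coefficient cannot be estimated.

```r
condition <- c(rep("Healthy", 3), rep(c("A", "B", "C"), each = 10))
condition <- factor(condition, levels = c("Healthy", "A", "B", "C"))

# Hypothetical extra predictors: a batch factor that is spread across
# all conditions, and a numeric covariate.
set.seed(1)
batch <- factor(rep(c("b1", "b2", "b3"), length.out = length(condition)))
age <- round(runif(length(condition), 20, 70))

# Additive model: the condition coefficients are now log-fold changes
# over healthy, adjusted for batch and age.
design_ok <- model.matrix(~age + batch + condition)
ok_full_rank <- qr(design_ok)$rank == ncol(design_ok)

# Hypothetical confounded layout: all healthy samples in one batch and
# all treated samples in another, so batch cannot be separated from
# the treatment effects.
batch_bad <- factor(ifelse(condition == "Healthy", "b1", "b2"))
design_bad <- model.matrix(~batch_bad + condition)
bad_full_rank <- qr(design_bad)$rank == ncol(design_bad)
```

Here ok_full_rank should be TRUE and bad_full_rank should be FALSE, since in the second layout the batch column is just the sum of the three treatment columns.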