It's hard to answer your question without an idea of your experimental design or of how you've set up your design matrix. In particular, the interpretation of the coefficients requires some care as these values may not necessarily represent log-fold changes of interest. Consider the following example:
groups <- LETTERS[1:5] # five groups, one sample each.
design <- model.matrix(~0 + groups)
y <- matrix(rnorm(1000), ncol=length(groups)) # log-expression values.
fit <- lmFit(y, design)
head(fit$coefficients)
Here, the coefficients represent the average log-expression of each group - which, in your case, is the same as the log-expression of the corresponding sample, given that you only have one sample in each group. None of the coefficient values represent log-fold changes between groups. Instead, if you want log-fold changes, you need to do some more work:
con <- makeContrasts(groupsA - groupsB,
groupsC - groupsD,
# ... and however many comparisons you want ...
levels=design)
fit2 <- contrasts.fit(fit, con)
head(fit2$coefficients)
The story is different if you've made a design matrix without the ~ 0
above, in which case the first coefficient is the intercept (i.e., the log-value of the "first" group) and the other coefficients represent the log-fold change of each other group to the first group. Thus the reference group is the first group, here A
. (More generally, the first level of the factor given to model.matrix
.)
In all cases, it is critical that you properly understand the meaning of each column of the design matrix before trying to interpret the values of the coefficients. Misinterpretation of the coefficients is a major source of errors in DE analyses.
As for your other question; you can't use a volcano plot here because you can't compute p-values with only one replicate. I would suggest using an MA plot for each pairwise comparison and highlighting the genes of interest. Alternatively you could use a heatmap if you want to show samples from all groups at once (assuming you have more than two groups).
Hi Aaron,
Thank you so much, this explains a lot, appreciate the help.
Just to clarify, when we make a contrast matrix and in the final output for
fit2$coefficients
, we get +2 for groupsA-groupsB for Gene A, this would mean that Gene A was down regulated by (lg2) times in groupsB with respect to groupsA right?And if we had instead done
design=(~groups)
instead, then I can expect to see, in the final output, for Gene A, under column groupsB, a value of -2? Which again, would mean that Gene A was down regulated by (lg2) times in groupsB with respect to groupsA which was coeff=1?Thanks!
A value of 2 for the coefficient of the contrast
groupsA - groupsB
represents a 4-fold change (log-fold change of 2) in group A over group B. Similarly, a value of -2 for thegroupsB
coefficient in a design matrix with an intercept represents a downregulation in group B compared to the intercept, i.e., group A.If you get confused about the direction, you can always check the original expression values to get your bearings.
Hi Aaron,
Thank you, that makes a lot of sense!