Question

DEG with Limma without replicates and best plots to represent results

Nithisha ▴ 10

Hi all,

I have a few questions regarding Limma.

1) My question refers to "C: limma topTable doesn't work without replicates [was: Help with limma]". This is a case where the samples have no replicates, so we cannot run eBayes or topTable. I understand that fit2$coefficients will return logFC values for all the samples. However, when I run this, I get a logFC value for every sample, including my control. In that case, what is the reference sample that the logFC values are compared against?

2) I would like to visually represent my top hits, say the 20 most up-regulated and the 20 most down-regulated genes, after DEG analysis. Is a volcano plot the best way to show the analysis and these genes, or are there better plots for this information?

Any advice would be appreciated.

Thanks!

Limma • 1.5k views

updated 7.2 years ago by Aaron Lun ★ 28k • written 7.2 years ago by Nithisha ▴ 10

score 2 · Accepted Answer · 2017-11-04

It's hard to answer your question without knowing your experimental design or how you've set up your design matrix. In particular, the interpretation of the coefficients requires some care, as these values do not necessarily represent the log-fold changes of interest. Consider the following example:

library(limma)
groups <- LETTERS[1:5] # five groups, one sample each.
design <- model.matrix(~0 + groups)
y <- matrix(rnorm(1000), ncol=length(groups)) # log-expression values.
fit <- lmFit(y, design)
head(fit$coefficients)

Here, the coefficients represent the average log-expression of each group - which, in your case, is the same as the log-expression of the corresponding sample, given that you only have one sample in each group. None of the coefficient values represent log-fold changes between groups. Instead, if you want log-fold changes, you need to do some more work:

con <- makeContrasts(groupsA - groupsB, 
                     groupsC - groupsD, 
                     # ... and however many comparisons you want ...
                     levels=design)
fit2 <- contrasts.fit(fit, con)
head(fit2$coefficients)
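As a quick sanity check of the two snippets above, here is a minimal sketch on simulated data (the seed and the variable names `same_as_input` and `is_difference` are my own additions, not part of the original answer). It assumes one sample per group, as in the question, and checks that the coefficients of the no-intercept fit are just the input log-expression values, while the contrast coefficient is their difference.

```r
library(limma)

set.seed(42)  # arbitrary seed, only to make the simulated values reproducible
groups <- LETTERS[1:5]                       # five groups, one sample each
design <- model.matrix(~0 + groups)
y <- matrix(rnorm(1000), ncol = length(groups))

fit <- lmFit(y, design)
# With one sample per group, each coefficient is that sample's log-expression.
same_as_input <- isTRUE(all.equal(unname(fit$coefficients), y))

con <- makeContrasts(groupsA - groupsB, levels = design)
fit2 <- contrasts.fit(fit, con)
# The contrast coefficient is the log-fold change of A over B.
is_difference <- isTRUE(all.equal(unname(fit2$coefficients[, 1]),
                                  y[, 1] - y[, 2]))

print(c(same_as_input, is_difference))
```

If either check fails on your own data, the design matrix probably does not encode what you think it does.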

The story is different if you've made the design matrix without the 0 + term above (i.e., with model.matrix(~groups)), in which case the first coefficient is the intercept (i.e., the log-expression of the "first" group) and each other coefficient represents the log-fold change of the corresponding group over the first group. Thus the reference group is the first group, here A. (More generally, it is the first level of the factor given to model.matrix.)
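To illustrate the intercept case, the following sketch (again on simulated data, with an arbitrary seed and illustrative variable names that are not from the original answer) fits the model with an intercept and checks which sample acts as the reference. It also uses relevel() to show how the reference changes with the first factor level; picking group C as the new reference is purely an example.

```r
library(limma)

set.seed(42)  # arbitrary seed for reproducibility
groups <- factor(LETTERS[1:5])               # first level is "A"
y <- matrix(rnorm(1000), ncol = length(groups))

design <- model.matrix(~groups)
colnames(design)  # the intercept, then one column per non-reference group

fit <- lmFit(y, design)
# Intercept = log-expression of the reference sample (A).
intercept_is_A <- isTRUE(all.equal(unname(fit$coefficients[, "(Intercept)"]),
                                   y[, 1]))
# groupsB = log-fold change of B over A.
b_vs_a <- isTRUE(all.equal(unname(fit$coefficients[, "groupsB"]),
                           y[, 2] - y[, 1]))

# Changing the first factor level changes the reference (here, to C).
groups_c <- relevel(groups, ref = "C")
fit_c <- lmFit(y, model.matrix(~groups_c))
a_vs_c <- isTRUE(all.equal(unname(fit_c$coefficients[, "groups_cA"]),
                           y[, 1] - y[, 3]))

print(c(intercept_is_A, b_vs_a, a_vs_c))
```

So if your control is not alphabetically first, setting it as the reference level explicitly makes every non-intercept coefficient a log-fold change against the control.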

In all cases, it is critical that you properly understand the meaning of each column of the design matrix before trying to interpret the values of the coefficients. Misinterpretation of the coefficients is a major source of errors in DE analyses.

As for your other question, you can't use a volcano plot here because you can't compute p-values without replicates. I would suggest making an MA plot for each pairwise comparison and highlighting the genes of interest. Alternatively, you could use a heatmap if you want to show samples from all groups at once (assuming you have more than two groups).
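A rough sketch of both plots in base R graphics follows. The data are simulated, the gene names are made up, the A-versus-B comparison is just one example, and ranking genes by log-fold change alone is an assumption forced by the lack of replicates, so the "top 20" here carry no statistical support.

```r
set.seed(1)  # arbitrary seed for reproducibility
groups <- LETTERS[1:5]
y <- matrix(rnorm(1000), ncol = length(groups),
            dimnames = list(paste0("gene", 1:200), groups))  # hypothetical names

# One pairwise comparison (A vs B):
# M = log-fold change, A = average log-expression of the two samples.
M <- y[, "A"] - y[, "B"]
A <- (y[, "A"] + y[, "B"]) / 2

# Rank by log-fold change only, since no p-values are available.
ord  <- order(M, decreasing = TRUE)
up   <- names(M)[head(ord, 20)]   # 20 largest log-fold changes
down <- names(M)[tail(ord, 20)]   # 20 smallest log-fold changes

# MA plot with the selected genes highlighted.
plot(A, M, pch = 16, cex = 0.5, col = "grey60",
     xlab = "Average log-expression", ylab = "Log-fold change (A vs B)")
points(A[up], M[up], pch = 16, col = "red")
points(A[down], M[down], pch = 16, col = "blue")
abline(h = 0, lty = 2)

# Heatmap of the same genes across all five samples, scaled per gene.
heatmap(y[c(up, down), ], Colv = NA, scale = "row", margins = c(5, 8))
```

limma also provides plotMD() for mean-difference plots of fitted objects, which may be more convenient than building the plot by hand.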