Search
Question: Multiple comparison of coefficients in limma
0
9 months ago by
Seymoo0
Oslo
Seymoo0 wrote:

In a cohort of cancer samples there are 4 groups, G1 to G4. I want to perform differential gene expression between group1 Vs all other groups, i,e, G1 = G1-(G2+G3+G4)/3, to determine DEGs for group 1.

I could just find an example in limma guide where pair-wise comparisons being made ( under 9.3. several groups) with following example

 f <- factor(targets$Target, levels=c("RNA1","RNA2","RNA3")) design <- model.matrix(~0+f) colnames(design) <- c("RNA1","RNA2","RNA3") To make all pair-wise comparisons between the three groups one could proceed fit <- lmFit(eset, design) contrast.matrix <- makeContrasts(RNA2-RNA1, RNA3-RNA2, RNA3-RNA1, levels=design) fit2 <- contrasts.fit(fit, contrast.matrix) fit2 <- eBayes(fit2) I am wondering how should I formulate that in contrast matrix to be statistically correct? would this be correct to write it like this: design1 <- model.matrix(~0+factor(annot$grp, levels = unique(annot$grp))) colnames(design1) <- unique(annot$grp)
row.names(design1) <- row.names(annot)
wt <- arrayWeights(HTA)
fitHTA <- lmFit(HTAEXP, design1, weights = wt)
genas(fitHTA, subset="Fpval", plot=TRUE, alpha=0.4)
cm1 <- makeContrasts(G1 = G1-(G2+G3+G4)/3,
G2 = G2-(G1+G3+G4)/3,
G3 = G3-(G1+G2+G4)/3,
G4 = G4-(G1+G2+G3)/3, levels=design1)

modified 9 months ago by James W. MacDonald47k • written 9 months ago by Seymoo0
1
9 months ago by
United States
James W. MacDonald47k wrote:

That looks OK, if your intention is to compare each group against the mean of all the remaining groups.

Hi James,

So because now I am comparing them against the mean, to do similar comparison would it be better to formulate it like this

cm1 <- makeContrasts(G1VsG2 = G1-G2,
G1VsG3 = G1-G3,
G1VsG4 = G1-G4,
G2VsG3 = G2-G3,
G2VsG4 = G2-G4,
G3VsG4 = G3-G4,
levels=design1)

fitHTA2 <- contrasts.fit(fitHTA, cm1)
fitHTA2 <- eBayes(fitHTA2, robust=TRUE, trend=TRUE)

topTable(fitHTA2, coef=c("G1VsG2","G1VsG3","G1VsG4"), adjust.method="BH", p.value=0.01, sort.by="F",number = 20)

because the result is different, and I am not sure what I am getting here is similar to the comparison I did before.

1

In your original post, you were comparing each group to the average of the others. For example, the null hypothesis for contrast G1 is that the average expression in group G1 is equal to the grand mean of the average expression in groups G2, G3 and G4. By comparison, the topTable call in your above comment does an ANOVA-like contrast where your null hypothesis is that all groups have the same average expression. (This is true even though you only define comparisons involving G1; for example, if G2 and G3 vary, then at least one of them must differ from G1.)

The two contrasts are quite different in their null hypotheses and their scientific usefulness. The contrasts in your original post provide an average direction and magnitude of DE for each group compared to the others; this is a convenient summary but may be misleading when the average log-fold change is not representative of the differences between specific pairs of groups.

The contrast in your comment only determines whether there is any difference between groups, it makes no statement on the significance of the direction or magnitude of the log-fold change to the other groups. Nonetheless, this can be useful if we just want some genes with differences, e.g., for plotting or some other meta-analyses. The ANOVA is more powerful than combining pairwise comparisons, and it isn't confused by one of the "other groups" going up and another one going down (such that the average of G2/3/4 would be equal to G1, which would escape detection with the other contrast).