Question

explanation of makeContrasts function of limma package

elisa.micarelli ▴ 10

Hi,

Thanks in advance to anyone who can help.

I'm using the limma package to analyse gene expression in different tissues (samples). For each tissue I want to know which genes are overexpressed and underexpressed; for example, I would like to be able to say that gene_x is more highly expressed in tissue A than in B (and C) but less highly expressed than in D. In other words, the purpose of my analysis is to obtain, for each gene, a ranking of the tissues by expression level, which I could sort in decreasing order or show for each gene in a heatmap; the visualization is not yet important at this stage, however.

I don't understand the scientific meaning behind the makeContrasts function, or the difference between the following examples:

1) makeContrasts(B-A,C-B,C-A,levels=c("A","B","C"))

2) makeContrasts(contrasts="A-(B+C)/2",levels=c("A","B","C"))

3) x <- c("B-A","C-B","C-A")

makeContrasts(contrasts=x, levels=c("A","B","C"))

In particular, which type of makeContrasts call should I use for my analysis?

I'm thinking of an iterative procedure that compares one tissue at a time with all the others.

For example:

the first comparison is:

makeContrasts(contrasts="A-(B+C)/2",levels=c("A","B","C"))

the second is:

makeContrasts(contrasts="B-(A+C)/2",levels=c("A","B","C"))

the third is:

makeContrasts(contrasts="C-(A+B)/2",levels=c("A","B","C"))

and so on, if there are more than three tissues. Is this right? If this approach is wrong, how should I proceed?

Furthermore, what is the scientific difference between the following examples?

makeContrasts("A-(B+C)/2",contrast= A, levels=c("A","B","C"))

and

makeContrasts(A-B,A-C,contrast=A, levels=c("A","B","C"))

Thanks a lot!!

Best

limma microarray R geneexpression

updated 9.0 years ago by Aaron Lun ★ 28k • written 9.0 years ago by elisa.micarelli ▴ 10

score 0 · Answer 1 · 2015-08-01

For your first question, approaches 1 and 3 are identical. Each column of the contrast matrix will test for differences between the corresponding groups - for example, the first contrast will test for DE between groups B and A. You can specify this comparison by setting coef=1 in topTable (and similarly, for all other columns). If you leave coef unspecified, then the entire contrast matrix will be used in an ANOVA-like comparison. This tests for any differences between any of the groups A, B or C. In contrast, approach 2 will test for differences between A and the average of B and C. So, for example, a DE gene that is up in B (relative to A) and down in C would be detected by approach 1, as differences exist between groups; but it might not be detected by approach 2, as the average of B (up) and C (down) might be similar to A.
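As a rough illustration of the points above, the sketch below builds the three contrast matrices from the question and applies them to made-up group means for a single gene. The values in `mu` are invented purely for illustration, and the code assumes limma is installed.

```r
library(limma)

lev <- c("A", "B", "C")

# Approach 1: contrasts given as unquoted expressions
c1 <- makeContrasts(B-A, C-B, C-A, levels = lev)

# Approach 3: the same contrasts given as a character vector
c3 <- makeContrasts(contrasts = c("B-A", "C-B", "C-A"), levels = lev)

# Same numbers in both matrices; only the column labels may differ in spacing
same_values <- all(c1 == c3)

# Approach 2: A versus the average of B and C
c2 <- makeContrasts(contrasts = "A-(B+C)/2", levels = lev)

# Hypothetical group means for one gene: up in B, down in C, relative to A
mu <- c(A = 5, B = 7, C = 3)

# Each contrast is a weighted sum of the group means
pairwise_effects <- drop(crossprod(c1, mu))  # B-A, C-B, C-A
average_effect   <- drop(crossprod(c2, mu))  # A - (B+C)/2
```

With these made-up means the pairwise contrasts come out as 2, -4 and -2, whereas the A-versus-average contrast is exactly 0, which is the situation in which approach 2 can miss a gene that approach 1 would pick up.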

For your second question, the simplest approach seems to be an ANOVA-like comparison involving all groups. This will identify DE genes where differences are present between any of the groups. You can then identify the ranking across tissues for each gene, based on the size of the group-wise coefficients in the MArrayLM object that you originally get from lmFit. In a one-way layout, these coefficients represent the average log-expression in each group, so you can compare their sizes between groups to identify the tissues in which a gene is up- or down-regulated.

An alternative would be to do comparisons between each pair of groups, and to intersect the genes that are significantly DE in each pairwise comparison. This will stringently identify genes that are DE across all pairs of tissues, but it will be a lot more conservative: a gene that is non-DE between one pair of tissues will not be detected, even if it is DE between all other pairs.

As for your proposal: personally, I wouldn't do comparisons of each group against the average of the others, as this tends to be difficult to interpret. For example, DE against the average of the others does not mean DE against each other group.
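The ANOVA-like approach and the per-gene ranking could be sketched roughly as follows. This is only a minimal example on simulated data: the tissue labels, the number of replicates, the object names and the artificial shift added to `gene1` are all assumptions made for illustration, not part of the original analysis.

```r
library(limma)

set.seed(1)

# Simulated log-expression: 100 genes, 4 tissues, 3 replicates each (all made up)
tissue <- factor(rep(c("A", "B", "C", "D"), each = 3))
expr <- matrix(rnorm(100 * 12), nrow = 100,
               dimnames = list(paste0("gene", 1:100), NULL))
expr[1, tissue == "D"] <- expr[1, tissue == "D"] + 4  # gene1 is up in D

# One-way layout without intercept: one coefficient per tissue
design <- model.matrix(~ 0 + tissue)
colnames(design) <- levels(tissue)
fit <- lmFit(expr, design)

# Differences against A are one way to span all between-group differences
cm <- makeContrasts(B-A, C-A, D-A, levels = design)
fit2 <- eBayes(contrasts.fit(fit, cm))

# coef left unspecified: all contrasts are tested together (F-statistic)
anova_like <- topTable(fit2, number = Inf)

# Per-gene ranking of tissues, from highest to lowest average log-expression
ranking <- t(apply(fit$coefficients, 1,
                   function(x) names(sort(x, decreasing = TRUE))))
```

Each row of `ranking` lists the tissues for one gene in decreasing order of the fitted group means, and `anova_like` can be used to restrict attention to genes with evidence of any between-tissue difference.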

For the last question, your contrasts with contrast=A don't make any sense. In what circumstance would you drop a coefficient for a tissue? The null hypothesis would be that log-expression is equal to zero in that tissue, which is undoubtedly false for most genes.
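To see what such a call produces, here is a small sketch, assuming limma is installed. As far as I can tell, because `contrasts` comes after `...` in makeContrasts, R will not partially match `contrast=` to it, so `contrast = A` is simply treated as one more contrast, labelled "contrast".

```r
library(limma)

# 'contrast = A' ends up as a third, named contrast rather than an option
cm <- makeContrasts(A-B, A-C, contrast = A, levels = c("A", "B", "C"))

# The extra column puts weight 1 on A and 0 elsewhere,
# i.e. it tests whether the coefficient for tissue A is zero
extra <- cm[, "contrast"]
```

The column `extra` is just the indicator for group A, which corresponds to the null hypothesis described above.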