Explanation of the makeContrasts function of the limma package
@elisamicarelli-8523

Hi,

First of all, thank you in advance to anyone who can help me.

I'm using the limma package to analyse gene expression in different tissues (samples). For each tissue I want to know which genes are overexpressed and underexpressed; for example, I would like to be able to say that gene_x is more highly expressed in tissue A than in B (and C) but less expressed than in D. In other words, the purpose of my analysis is to obtain, for each gene, a ranking of the tissues by expression level, which I could sort in decreasing order or show for each gene in a heatmap; however, visualization is not yet important at this stage.

I don't understand the scientific meaning behind the makeContrasts function and the difference among the following examples:

1) makeContrasts(B-A,C-B,C-A,levels=c("A","B","C"))

2) makeContrasts(contrasts="A-(B+C)/2",levels=c("A","B","C"))

3) x <- c("B-A","C-B","C-A")

makeContrasts(contrasts=x, levels=c("A","B","C"))

In particular, which type of makeContrasts call should I use for my analysis?

I'm thinking of an iterative procedure that compares one tissue at a time with all the others.

For example:

the first comparison is:

makeContrasts(contrasts="A-(B+C)/2",levels=c("A","B","C"))

the second is:

makeContrasts(contrasts="B-(A+C)/2",levels=c("A","B","C"))

the third is:

makeContrasts(contrasts="C-(A+B)/2",levels=c("A","B","C"))

and so on, if there are more than three tissues. Is this right? If this approach is wrong, how should I proceed?

Furthermore, what is the scientific difference between the following examples?

makeContrasts("A-(B+C)/2",contrast= A, levels=c("A","B","C"))

and

makeContrasts(A-B,A-C,contrast=A, levels=c("A","B","C"))

Thanks a lot!!

Best

Aaron Lun
@alun

For your first question, approaches 1 and 3 are identical. Each column of the contrast matrix will test for differences between the corresponding groups. For example, the first contrast will test for differential expression (DE) between groups B and A. You can specify this comparison by setting coef=1 in topTable (and similarly for all other columns). If you leave coef unspecified, then the entire contrast matrix will be used in an ANOVA-like comparison. This tests for any differences between any of the groups A, B or C. In contrast, approach 2 will test for differences between A and the average of B and C. So, for example, a DE gene that is up in B (relative to A) and down in C would be detected by approach 1, as differences exist between groups; but it might not be detected by approach 2, as the average of B (up) and C (down) might be similar to A.
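To make the comparison concrete, here is a minimal sketch that builds the three contrast matrices from the question and checks that approaches 1 and 3 contain the same numbers. The object names (groups, cm1, cm2, cm3) are just for illustration.

```r
library(limma)

groups <- c("A", "B", "C")

# Approach 1: contrasts given as unquoted expressions
cm1 <- makeContrasts(B - A, C - B, C - A, levels = groups)

# Approach 3: the same contrasts given as a character vector
cm3 <- makeContrasts(contrasts = c("B-A", "C-B", "C-A"), levels = groups)

# Approach 2: A versus the average of B and C (a single column)
cm2 <- makeContrasts(contrasts = "A-(B+C)/2", levels = groups)

# Compare the numeric entries only; the column labels may be formatted differently
same_entries <- isTRUE(all.equal(unname(cm1), unname(cm3)))

print(cm1)
print(cm2)
```

Each column holds the weights applied to the group coefficients, so a column for B-A puts -1 on A and +1 on B, while the single column of approach 2 puts 1 on A and -0.5 on each of B and C.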

For your second question, the simplest approach seems to be doing an ANOVA-like comparison involving all groups. This will identify DE genes where differences are present between any of the groups. You can then identify the ranking across tissues for each gene, based on the size of the group-wise coefficients in the MArrayLM object that you originally get from lmFit. In a one-way layout (with one design column per group), these coefficients represent the average log-expression in each group, so you can compare their sizes between groups to identify tissues in which they are up/down-regulated. An alternative would be to do comparisons between each pair of groups, and to intersect those genes that are significantly DE in each pairwise comparison. This will stringently identify genes that are DE across all pairs of tissues, but will be a lot more conservative, e.g., a gene that is non-DE between one pair of tissues will not be detected, even if it is DE across all other tissues. With regard to your proposal: personally, I wouldn't do comparisons of each group against the average of the others, as this tends to be difficult to interpret. For example, DE against the average of the others does not mean DE against each other group.
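A rough sketch of the ANOVA-like approach, under stated assumptions: the data are simulated (100 hypothetical genes, three replicates per tissue, one gene deliberately shifted), the design is built without an intercept so that each coefficient is a group mean, and the contrast names BvsA and CvsA are made up for this example. Two contrasts are used here because they already span all differences among three groups.

```r
library(limma)

set.seed(1)
tissue <- factor(rep(c("A", "B", "C"), each = 3))

# Simulated log-expression values: 100 genes x 9 samples (illustration only)
y <- matrix(rnorm(100 * 9, mean = 8), nrow = 100,
            dimnames = list(paste0("gene", 1:100), NULL))
y[1, tissue == "B"] <- y[1, tissue == "B"] + 3   # gene1 is up in B
y[1, tissue == "C"] <- y[1, tissue == "C"] - 3   # gene1 is down in C

# One-way layout without an intercept: one coefficient per tissue
design <- model.matrix(~ 0 + tissue)
colnames(design) <- levels(tissue)

fit <- lmFit(y, design)

# Contrasts that together capture any difference among A, B and C
cm <- makeContrasts(BvsA = B - A, CvsA = C - A, levels = design)
fit2 <- eBayes(contrasts.fit(fit, cm))

# coef left unspecified: all contrasts are tested together (ANOVA-like F-test)
anova_like <- topTable(fit2, number = Inf)

# Rank the tissues for one gene using the group-wise coefficients from lmFit
ranking <- sort(fit$coefficients["gene1", ], decreasing = TRUE)
print(head(anova_like))
print(ranking)
```

For a single pairwise comparison, something like topTable(fit2, coef = "BvsA") would be used instead of leaving coef unspecified.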

For the last question, your contrasts with contrast=A don't make any sense. In what circumstance would you drop a coefficient for a tissue? The null hypothesis would be that log-expression is equal to zero in that tissue, which is undoubtedly false for most genes.
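As a small illustration of that last point, a contrast that consists of a single coefficient puts weight 1 on that tissue and 0 on the others, so it tests the group mean itself against zero rather than a difference between groups. The name cmA is only for this sketch.

```r
library(limma)

# A "contrast" that is just the coefficient for tissue A
cmA <- makeContrasts(A, levels = c("A", "B", "C"))

# Weights on the A, B and C coefficients
print(cmA)
```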


Thanks for your answer and advice. I will take your suggestions into account and try an ANOVA-like comparison, but before asking you any further questions I need to read the manual more carefully and think about the comparisons that I should make.

Thanks a lot.