Hi,
I am comparing gene expression measured in two different tissues (tissue1, tissue2) at two different ages (age1, age2).
I am trying to obtain three lists:
1. Genes differentially expressed between tissue1 and tissue2.
2. Genes differentially expressed between age1 and age2.
3. Genes for which the expression ratio between tissue1 and tissue2 changes significantly with age (from age1 to age2).
I tried two different design matrices.
Option 1 - Defining each combination as a group, and defining contrasts
    library(edgeR)

    # Unordered factors, with tissue1 and age1 as the reference levels
    samples$Tissue <- factor(samples$Tissue, levels = c("tissue1", "tissue2"), ordered = FALSE)
    samples$Age <- factor(samples$Age, levels = c("age1", "age2"), ordered = FALSE)
    # One group per Age x Tissue combination
    samples$Group <- factor(interaction(samples$Age, samples$Tissue))
    design <- model.matrix(~0+Group, data = samples)
    colnames(design) <- levels(samples$Group)
    d <- DGEList(counts = counts)
    d <- estimateGLMCommonDisp(d, design)
    d <- estimateGLMTagwiseDisp(d, design)
    fit <- glmFit(d, design)
    # Contrasts: tissue effect, age effect, and tissue-by-age interaction
    contrasts <- makeContrasts(
        (age2.tissue2 + age1.tissue2) - (age2.tissue1 + age1.tissue1),
        (age2.tissue2 + age2.tissue1) - (age1.tissue2 + age1.tissue1),
        (age2.tissue2 - age2.tissue1) - (age1.tissue2 - age1.tissue1),
        levels = colnames(design))
    colnames(contrasts) <- c("tissue", "age", "tissue.age")
Option 2 - Defining an interaction model
    design <- model.matrix(~Tissue + Age + Tissue:Age, data = samples)
    d <- DGEList(counts = counts)
    d <- estimateGLMCommonDisp(d, design)
    d <- estimateGLMTagwiseDisp(d, design)
    fit <- glmFit(d, design)
Then, I use glmLRT to test the contrasts (option 1) or coefficients (option 2).
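For reference, here is a minimal base-R sketch of how the coefficients of the interaction model (option 2) map onto the group means of the group model (option 1). It assumes a toy sample sheet with two replicates per Age x Tissue combination, which is a hypothetical layout and not the real data; the names `design1`, `design2` and `M` are introduced only for this illustration. Each row of `M` writes one option-2 coefficient as a combination of the four group means, so it can be compared directly with the contrasts defined for option 1.

```r
# Toy sample sheet (hypothetical): 2 tissues x 2 ages, 2 replicates each.
samples <- expand.grid(Tissue = c("tissue1", "tissue2"),
                       Age = c("age1", "age2"),
                       Rep = 1:2)
samples$Tissue <- factor(samples$Tissue, levels = c("tissue1", "tissue2"))
samples$Age <- factor(samples$Age, levels = c("age1", "age2"))
samples$Group <- factor(interaction(samples$Age, samples$Tissue))

# Option 1: one column per group (group-means parameterization).
design1 <- model.matrix(~0 + Group, data = samples)
colnames(design1) <- levels(samples$Group)

# Option 2: intercept, tissue, age, and tissue:age interaction.
design2 <- model.matrix(~Tissue + Age + Tissue:Age, data = samples)

# Both designs describe the same fitted values, so solving
# design2 %*% M = design1 expresses each option-2 coefficient (rows)
# as a combination of the option-1 group means (columns).
M <- round(qr.solve(design2, design1))
dimnames(M) <- list(colnames(design2), colnames(design1))
print(M)
```

In this sketch, the row for coefficient 2 (`Tissuetissue2`) involves only the two age1 groups, and the row for coefficient 3 (`Ageage2`) involves only the two tissue1 groups, whereas the "tissue" and "age" contrasts of option 1 combine all four groups. The row for coefficient 4 has the same weights as the "tissue.age" contrast.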
With option 1, the numbers of differentially expressed genes (q-value < 0.05) for the contrasts "tissue", "age", and "tissue.age" are 6541, 10196, and 1956, respectively.
With option 2, testing coefficients 2, 3, and 4, the numbers of differentially expressed genes are 4490, 7635, and 1956, respectively.
That is, option 1 gives more genes differentially expressed between the ages and between the tissues, while the number of interaction genes is identical in both options. In practice, I think the results of option 1 are better, in the sense that I get better enrichment results downstream.
What different assumptions do the two analysis options make? Which one would you recommend?