Hello,
I have a question about intercept and LogFC in edgeR. I am exactly following the article. From reads to genes to pathways...the edgeR quasi-likelihood pipeline. I also read the article A guide to creating design matrices for gene expression experiments but I am still confused about how intercept is associated with LogFC values and other statistics.
## model without intercept
y0 <- DGEList(GenewiseCounts[,-1], group=group,
               genes=GenewiseCounts[,1,drop=FALSE]
             )
y0$genes$Symbol <- mapIds(org.Mm.eg.db, rownames(y0),
                             keytype="ENTREZID", column="SYMBOL")
y0 <- y0[!is.na(y$genes$Symbol), ]
keep <- rowSums(cpm(y0) > 0.5) >= 2
y0 <- y0[keep, , keep.lib.sizes=FALSE]
y0 <- calcNormFactors(y0)
design_0 <- model.matrix(~0+group)
colnames(design_0) <- levels(group)
design_0
y0 <- estimateDisp(y0, design_0, robust=TRUE)
fit0 <- glmQLFit(y0, design_0, robust=TRUE)
B.LvsP0 <- makeContrasts(B.lactating-B.pregnant, levels=design_0)
res0 <- glmQLFTest(fit0, contrast=B.LvsP0)
## model with intercept
y1 <- DGEList(GenewiseCounts[,-1], group=group,
               genes=GenewiseCounts[,1,drop=FALSE]
             )
options(digits = 3)
y1$genes$Symbol <- mapIds(org.Mm.eg.db, rownames(y1),
                             keytype="ENTREZID", column="SYMBOL")
y1 <- y1[!is.na(y1$genes$Symbol), ]
keep <- rowSums(cpm(y1) > 0.5) >= 2
y1 <- y1[keep, , keep.lib.sizes=FALSE]
y1 <- calcNormFactors(y1)
design1 <- model.matrix(~ + group)
colnames(design1) <- levels(group)
y1 <- estimateDisp(y1, design1, robust=TRUE)
fit1 <- glmQLFit(y1, design1, robust=TRUE)
B.LvsP1 <- makeContrasts(B.lactating-B.pregnant, levels=design1)
res1 <- glmQLFTest(fit1, contrast=B.LvsP1)
My design matrix without intercept
> design_0
   B.lactating B.pregnant B.virgin L.lactating L.pregnant L.virgin
1            0          0        1           0          0        0
2            0          0        1           0          0        0
3            0          1        0           0          0        0
4            0          1        0           0          0        0
5            1          0        0           0          0        0
6            1          0        0           0          0        0
7            0          0        0           0          0        1
8            0          0        0           0          0        1
9            0          0        0           0          1        0
10           0          0        0           0          1        0
11           0          0        0           1          0        0
12           0          0        0           1          0        0
My design matrix with intercept
> design1
   B.lactating B.pregnant B.virgin L.lactating L.pregnant L.virgin
1            1          0        1           0          0        0
2            1          0        1           0          0        0
3            1          1        0           0          0        0
4            1          1        0           0          0        0
5            1          0        0           0          0        0
6            1          0        0           0          0        0
7            1          0        0           0          0        1
8            1          0        0           0          0        1
9            1          0        0           0          1        0
10           1          0        0           0          1        0
11           1          0        0           1          0        0
12           1          0        0           1          0        0
When I print out top DE genes using topTags, they are different.
## model without intercept
> topTags(res0)
Coefficient:  1*B.lactating -1*B.pregnant 
       Length  Symbol logFC logCPM   F   PValue      FDR
12992     765 Csn1s2b  6.09  10.18 421 4.86e-11 7.68e-07
211577   2006  Mrgprf  5.15   2.74 345 1.30e-10 8.05e-07
226101   7094    Myof  2.32   6.44 322 1.97e-10 8.05e-07
381290   8292  Atp2b4  2.14   6.14 320 2.04e-10 8.05e-07
140474  11281    Muc4 -7.17   6.05 308 2.64e-10 8.34e-07
231830   3346 Micall2 -2.25   5.18 282 4.47e-10 1.18e-06
24117    2242    Wif1 -1.82   6.76 260 7.30e-10 1.65e-06
12740    1812   Cldn4 -5.32   9.87 298 8.88e-10 1.71e-06
21953     667   Tnni2  5.75   3.86 313 9.76e-10 1.71e-06
231991   2873   Creb5  2.57   4.87 241 1.16e-09 1.83e-06
## model with intercept
> topTags(res1)
Coefficient:  1*B.lactating -1*B.pregnant 
       Length    Symbol logFC logCPM    F   PValue     FDR
328451   3224    Gm5088 -15.9   6.11 5179 5.47e-18 1.2e-15
107951   3389      Cdk9 -15.4   5.83 5048 6.43e-18 1.2e-15
433806    462 Chchd2-ps -14.6   5.87 4971 7.09e-18 1.2e-15
18570    1524     Pdcd6 -15.0   5.95 4900 7.77e-18 1.2e-15
67397    1243     Erp29 -14.4   6.26 4859 8.20e-18 1.2e-15
66046     999    Ndufb5 -15.1   6.14 4799 8.86e-18 1.2e-15
72047    4012     Ddx42 -14.3   6.19 4796 8.90e-18 1.2e-15
18141    4749     Nup50 -15.4   6.00 4790 8.97e-18 1.2e-15
66286     770    Sec11c -16.3   5.96 4730 9.72e-18 1.2e-15
21376    1565     Tbrg1 -14.2   6.08 4711 9.97e-18 1.2e-15
My understanding of the model with intercept is, in this case, the expression of B.lactating is the intercept term and each estimated coefficient is differences between the reference level (B.lactating) and each level (e.g. B.pregnant ... L.virgin). Whereas, the model without intercept is each estimated coefficient is the mean expression of each level.
I am not sure but I think LogFC seems to be log2(B.lactating / B.pregnant) in the model without intercept. But I can't even guess how how LogFC is calculated in the model with intercept. So could anyone explain me why the two models print out different top DE genes in terms of their statistics? (e.g. FDR, logFC...)
Sorry about my English. Have a great day!

The link you've entered for "A guide to creating design matrices for gene expression experiments" doesn't work. Could you please correct the URL.
Oops. I just corrected the URL. Thank you Dr. Smyth.