Hello,

I am trying to use contrasts with a cell means model in EdgeR to compute log fold changes. I am a little confused by the results I get by modifying weights of the contrasts. For instance, I modified the following piece of code from one the EdgeR examples.

```
#------------ Defining the data and design
> ngenes <- 100
> n1 <- 3
> n2 <- 3
> n3 <-3
> nlibs <- n1+n2+n3
> mu <- 100
> phi <- 0.1
> group <- c(rep(1,n1), rep(2,n2), rep(3,n3))
> design <- model.matrix(~0+as.factor(group))
> ### 4-fold change for the first 5 genes
> i <- 1:5
> fc <- 4
> mu <- matrix(mu, ngenes, nlibs)
> mu[i, 1:n1] <- mu[i, 1:n1]*fc
> counts <- matrix(rnbinom(ngenes*nlibs, mu=mu, size=1/phi), ngenes, nlibs)
> d <- DGEList(counts=counts,lib.size=rep(1e6, nlibs), group=group)
> gfit <- glmFit(d, design, dispersion=phi)
#-------------------------------------- Contrasts --------------
> tr <- glmTreat(gfit, contrast=c(1,1,-1), lfc=1)
> tr2 <- glmTreat(gfit, contrast=c(.5,.5,-1), lfc=1)
> topTags(tr)
Coefficient: 1*as.factor(group)1 1*as.factor(group)2 -1*as.factor(group)3
logFC unshrunk.logFC logCPM PValue FDR
7 -14.49698 -14.50068 6.428122 1.029359e-92 5.793101e-91
32 -14.47646 -14.47995 6.482914 1.158620e-92 5.793101e-91
74 -14.26280 -14.26602 6.621232 6.749246e-91 2.249749e-89
72 -14.11476 -14.11748 6.832884 6.579315e-90 1.537966e-88
79 -14.12208 -14.12489 6.711183 7.689829e-90 1.537966e-88
86 -14.07918 -14.08192 6.735456 1.716826e-89 2.861376e-88
11 -14.09690 -14.09979 6.547746 2.035350e-89 2.907644e-88
13 -13.87883 -13.88147 6.573221 1.673322e-87 1.921928e-86
76 -13.84859 -13.85116 6.785719 1.858638e-87 1.921928e-86
50 -13.91007 -13.91282 6.333895 1.921928e-87 1.921928e-86
> topTags(tr2)
Coefficient: 0.5*as.factor(group)1 0.5*as.factor(group)2 -1*as.factor(group)3
logFC unshrunk.logFC logCPM PValue FDR
3 1.2569501 1.2577449 7.994976 0.08410010 0.999649
5 1.2546815 1.2557728 7.541736 0.08618485 0.999649
4 1.1923287 1.1931614 7.749677 0.11551910 0.999649
1 1.1707078 1.1716912 7.535702 0.12723752 0.999649
2 0.9689215 0.9695723 7.570693 0.27704382 0.999649
60 0.7809965 0.7821674 6.599907 0.46447814 0.999649
47 0.7787079 0.7797861 6.698892 0.46783455 0.999649
32 -0.7165637 -0.7175457 6.482914 0.53317722 0.999649
72 -0.7129839 -0.7137471 6.832884 0.54020766 0.999649
68 0.6988273 0.6998642 6.588292 0.55360906 0.999649
> counts[72,]
[1] 133 60 94 104 76 87 134 165 155
> gfit$coefficients[72,]
as.factor(group)1 as.factor(group)2 as.factor(group)3
-9.253335 -9.325471 -8.795200
```

What confuses me here is that I expected the log fold change for the first contrast to be less negative than the second, since the weights for the first two factors are reduced in the second. However, since the coefficients are negative, it appears that the results are the opposite. In particular, for the first contrast, I expected the log fold change to be positive or closer to 0, given the counts, say for row 72 (realizing the contrasts do not directly use counts and incorporates more information such as library sizes and dispersions), I intuitively assumed `(factor 1+ factor 2)`

would be more upregulated than `factor 3`

.

Am I making a misinterpreting the outputs? How do I interpret the results?

Thank you!

Thank you very much for your prompt response Gordon Smyth! I suspected I was making a mistake with that. If you dont mind me asking - what value am I actually getting when I use the wrong contrast?

edgeR works internally with glm coefficients on the log-nucleotide-fraction scale. You have

`mu=100`

and`lib.size=1e6`

so the glm coefficients are approximately equal to`log(100 / 1e6) = -9.2`

The contrast

`c(a,b,c)`

simply gives logFC equal to`a*coef1+b*coef2+c*coef3`

but converted from loge to log2. If the gene is not DE, then the three coefficients will be roughy equal. If`a=1`

,`b=1`

and`c=-1`

then for non-DE genes you roughly get`coef1`

, which on the log2 scale is`log2(100 / 1e6) = -13.3`

.If the contrast vector adds to zero then the internal scale cancels out. You have

`mu`

equal to 400,100 and 100 for the DE genes. If you set contrast`c(1,-1,0)`

or`c(1,0,-1)`

or`c(1,-0.5,-0.5)`

then you will get logFC equal to`log2(400) - log2(100) = log2(400/100) = 2`

. If you set contrast`c(0.5,0.5,-1)`

then you will get logFC equal to`0.5*log2(400) + 0.5*log2(100) - log2(100) = 1`

.Ah that makes sense - so basically because I had a large scale difference using contrasts that summed to zero, I was essentially just getting the log2 coefficient for group 1 (say) which is negative due to the normalization by the library size.

Thanks very much for your detailed answer! It is very helpful to know how the computations work.