Question

Confusion about LFC using contrasts weights in EdgeR

0

Entering edit mode

pl23 • 0

@4b83ad99

Last seen 7 months ago

Canada

Hello,

I am trying to use contrasts with a cell means model in EdgeR to compute log fold changes. I am a little confused by the results I get by modifying weights of the contrasts. For instance, I modified the following piece of code from one the EdgeR examples.

#------------ Defining the data and design
> ngenes <- 100
> n1 <- 3
> n2 <- 3
> n3 <-3
> nlibs <- n1+n2+n3
> mu <- 100
> phi <- 0.1
> group <- c(rep(1,n1), rep(2,n2), rep(3,n3))
> design <- model.matrix(~0+as.factor(group))

> ### 4-fold change for the first 5 genes
> i <- 1:5
> fc <- 4
> mu <- matrix(mu, ngenes, nlibs)
> mu[i, 1:n1] <- mu[i, 1:n1]*fc


> counts <- matrix(rnbinom(ngenes*nlibs, mu=mu, size=1/phi), ngenes, nlibs)
> d <- DGEList(counts=counts,lib.size=rep(1e6, nlibs), group=group)

> gfit <- glmFit(d, design, dispersion=phi)

#-------------------------------------- Contrasts --------------

> tr <- glmTreat(gfit, contrast=c(1,1,-1), lfc=1)
> tr2 <- glmTreat(gfit, contrast=c(.5,.5,-1), lfc=1)

> topTags(tr)
Coefficient:  1*as.factor(group)1 1*as.factor(group)2 -1*as.factor(group)3 
       logFC unshrunk.logFC   logCPM       PValue          FDR
7  -14.49698      -14.50068 6.428122 1.029359e-92 5.793101e-91
32 -14.47646      -14.47995 6.482914 1.158620e-92 5.793101e-91
74 -14.26280      -14.26602 6.621232 6.749246e-91 2.249749e-89
72 -14.11476      -14.11748 6.832884 6.579315e-90 1.537966e-88
79 -14.12208      -14.12489 6.711183 7.689829e-90 1.537966e-88
86 -14.07918      -14.08192 6.735456 1.716826e-89 2.861376e-88
11 -14.09690      -14.09979 6.547746 2.035350e-89 2.907644e-88
13 -13.87883      -13.88147 6.573221 1.673322e-87 1.921928e-86
76 -13.84859      -13.85116 6.785719 1.858638e-87 1.921928e-86
50 -13.91007      -13.91282 6.333895 1.921928e-87 1.921928e-86

> topTags(tr2)
Coefficient:  0.5*as.factor(group)1 0.5*as.factor(group)2 -1*as.factor(group)3 
        logFC unshrunk.logFC   logCPM     PValue      FDR
3   1.2569501      1.2577449 7.994976 0.08410010 0.999649
5   1.2546815      1.2557728 7.541736 0.08618485 0.999649
4   1.1923287      1.1931614 7.749677 0.11551910 0.999649
1   1.1707078      1.1716912 7.535702 0.12723752 0.999649
2   0.9689215      0.9695723 7.570693 0.27704382 0.999649
60  0.7809965      0.7821674 6.599907 0.46447814 0.999649
47  0.7787079      0.7797861 6.698892 0.46783455 0.999649
32 -0.7165637     -0.7175457 6.482914 0.53317722 0.999649
72 -0.7129839     -0.7137471 6.832884 0.54020766 0.999649
68  0.6988273      0.6998642 6.588292 0.55360906 0.999649

> counts[72,]
[1] 133  60  94 104  76  87 134 165 155
> gfit$coefficients[72,]
as.factor(group)1 as.factor(group)2 as.factor(group)3 
        -9.253335         -9.325471         -8.795200

What confuses me here is that I expected the log fold change for the first contrast to be less negative than the second, since the weights for the first two factors are reduced in the second. However, since the coefficients are negative, it appears that the results are the opposite. In particular, for the first contrast, I expected the log fold change to be positive or closer to 0, given the counts, say for row 72 (realizing the contrasts do not directly use counts and incorporates more information such as library sizes and dispersions), I intuitively assumed (factor 1+ factor 2) would be more upregulated than factor 3.

Am I making a misinterpreting the outputs? How do I interpret the results?

Thank you!

edgeR • 739 views

ADD COMMENT • link 10 months ago pl23 • 0

score 2 · Accepted Answer · 2024-02-16

2

Entering edit mode

Gordon Smyth 52k

@gordon-smyth

Last seen 1 hour ago

WEHI, Melbourne, Australia

For the group-means design matrix that you have defined, the contrasts must consist of weights that add to 0, otherwise they don't correspond to comparisons between groups.

In your code, tr2 is a correct contrast that compares group3 to the average of group1 and group2.

Your first contrast however doesn't add to zero and doesn't correspond to any comparison between the groups. It is actually returning a type of intercept rather than a logFC.

ADD COMMENT • link 10 months ago Gordon Smyth 52k

0

Entering edit mode

Thank you very much for your prompt response Gordon Smyth! I suspected I was making a mistake with that. If you dont mind me asking - what value am I actually getting when I use the wrong contrast?

ADD REPLY • link 10 months ago pl23 • 0

3

Entering edit mode

edgeR works internally with glm coefficients on the log-nucleotide-fraction scale. You have mu=100 and lib.size=1e6 so the glm coefficients are approximately equal to log(100 / 1e6) = -9.2

The contrast c(a,b,c) simply gives logFC equal to a*coef1+b*coef2+c*coef3 but converted from loge to log2. If the gene is not DE, then the three coefficients will be roughy equal. If a=1, b=1 and c=-1 then for non-DE genes you roughly get coef1, which on the log2 scale is log2(100 / 1e6) = -13.3.

If the contrast vector adds to zero then the internal scale cancels out. You have mu equal to 400,100 and 100 for the DE genes. If you set contrast c(1,-1,0) or c(1,0,-1) or c(1,-0.5,-0.5) then you will get logFC equal to log2(400) - log2(100) = log2(400/100) = 2. If you set contrast c(0.5,0.5,-1) then you will get logFC equal to 0.5*log2(400) + 0.5*log2(100) - log2(100) = 1.

ADD REPLY • link 10 months ago Gordon Smyth 52k

0

Entering edit mode

Ah that makes sense - so basically because I had a large scale difference using contrasts that summed to zero, I was essentially just getting the log2 coefficient for group 1 (say) which is negative due to the normalization by the library size.

Thanks very much for your detailed answer! It is very helpful to know how the computations work.

ADD REPLY • link 10 months ago pl23 • 0