Difference between Fold Changes from EdgeR glmLRTest and directly calculated fold changes.
1
0
Entering edit mode
abf ▴ 30
@abf-14661
Last seen 8 weeks ago
United States

In the process of validating my procedure for using edgeR to calculate differential expression statistics; I've found that there are differences between the fold changes estimated by glmLRTest, and those calculated directly from the matrix of scaled cpm values returned by "cpm(y)". I am aware the the "prior.count" parameter can be be used to shrink log fold changes towards zero. I've tested this myself and examined the impact of different settings; ranging from 0 to 10. However with my data, even when prior.count is set to 0; it looks like there is still some shrinkage being applied to the fold change. My design matrix (bottom of post) does include a batch adjustment factor, and I suspect that this may play a role.

z<-z<-y[filterByExpr(y, design=design), , keep.lib.sizes=FALSE]
z<-estimateCommonDisp(z)
z<-estimateTagwiseDIsp(z)
z<-calcNormFactors(z)


Model is fit as follows:

  fit<-glmFit(z, design, prior.count=0)
qlf<-glmLRT(fit, coef=2:5)
deg<-as.data.frame(topTags(qlf, n=Inf))


The estimated fold changes disagree with those calculated from the cpm matrix

Estmated from the CPM matrix:

> gr1<-c(
+   "1_S1","2_S2","4_S3",
+   "WT_0_hr_1_S1","WT_0_hr_2_S2","WT_0_hr_3_S3",
+   "W_0_hr_1_S256","W_0_hr_2_S257","W_0_hr_3_S258"
+ )
>
> gr2<-c(
+   "WT_6_hr_1_S4","WT_6_hr_2_S5","WT_6_hr_3_S15"
+ )
> aveLogCPM(z[genes, gr1])
[1] 6.211942 5.699160 3.259007 2.478364
> aveLogCPM(z[genes, gr2])
[1] 10.916058 10.616974  7.811504 10.667996

> aveLogCPM(z[genes, gr2]) - aveLogCPM(z[genes, gr1])
[1] 4.704117 4.917814 4.552497 8.189632


Reported by the glmLRT function:

> deg[genes,"logFC.Interval6H"]
[1] 4.553255 4.738738 4.285688 8.025348


For Reference, my design matrix is below:

> design
(Intercept) Interval6H Interval24H Interval48H Interval120H LabDNA
1_S1                    1          0           0           0            0      0
2_S2                    1          0           0           0            0      0
4_S3                    1          0           0           0            0      0
6_S4                    1          0           1           0            0      0
7_S5                    1          0           1           0            0      0
8_S6                    1          0           1           0            0      0
9_S7                    1          0           0           1            0      0
10_S8                   1          0           0           1            0      0
11_S9                   1          0           0           1            0      0
WT_0_hr_1_S1            1          0           0           0            0      1
WT_0_hr_2_S2            1          0           0           0            0      1
WT_0_hr_3_S3            1          0           0           0            0      1
WT_6_hr_1_S4            1          1           0           0            0      1
WT_6_hr_2_S5            1          1           0           0            0      1
WT_6_hr_3_S15           1          1           0           0            0      1
WT_24_hr_1_S6           1          0           1           0            0      1
WT_24_hr_2_S7           1          0           1           0            0      1
WT_24_hr_3_S8           1          0           1           0            0      1
W_0_hr_1_S256           1          0           0           0            0      1
W_0_hr_2_S257           1          0           0           0            0      1
W_0_hr_3_S258           1          0           0           0            0      1
W_5_D_1_S259            1          0           0           0            1      1
7_S261                  1          0           0           0            1      1
W_5_D_3_S260            1          0           0           0            1      1
attr(,"assign")
[1] 0 1 1 1 1 2
attr(,"contrasts")
attr(,"contrasts")$Interval [1] "contr.treatment"  edger RNA Seq • 375 views ADD COMMENT 0 Entering edit mode By dropping the batch column from my design matrix, AND using the "prior.count" parameter with aveLogCPM, the fold change estimates that I calculate become much more similar.  (Intercept) Interval6H Interval24H Interval48H Interval120H 1_S1 1 0 0 0 0 2_S2 1 0 0 0 0 4_S3 1 0 0 0 0 6_S4 1 0 1 0 0 7_S5 1 0 1 0 0 8_S6 1 0 1 0 0 9_S7 1 0 0 1 0 10_S8 1 0 0 1 0 11_S9 1 0 0 1 0 WT_0_hr_1_S1 1 0 0 0 0 WT_0_hr_2_S2 1 0 0 0 0 WT_0_hr_3_S3 1 0 0 0 0 WT_6_hr_1_S4 1 1 0 0 0 WT_6_hr_2_S5 1 1 0 0 0 WT_6_hr_3_S15 1 1 0 0 0 WT_24_hr_1_S6 1 0 1 0 0 WT_24_hr_2_S7 1 0 1 0 0 WT_24_hr_3_S8 1 0 1 0 0 W_0_hr_1_S256 1 0 0 0 0 W_0_hr_2_S257 1 0 0 0 0 W_0_hr_3_S258 1 0 0 0 0 W_5_D_1_S259 1 0 0 0 1 7_S261 1 0 0 0 1 W_5_D_3_S260 1 0 0 0 1 attr(,"assign") [1] 0 1 1 1 1 attr(,"contrasts") attr(,"contrasts")$Interval
[1] "contr.treatment"

z<-estimateCommonDisp(z)
z<-estimateTagwiseDisp(z)
z<-calcNormFactors(z)

A<-aveLogCPM(z[genes,gr2], normalized.lib.sizes = T, prior.count = 0)
B<-aveLogCPM(z[genes,gr1], normalized.lib.sizes = T, prior.count = 0)

> A - B
[1] 4.706790 4.921665 4.573017 8.226730

fit<-glmFit(z, design, prior.count=0)
qlf<-glmLRT(fit, coef=2:5)
deg<-as.data.frame(topTags(qlf, n=Inf))

> deg[genes,"logFC.Interval6H"]
[1] 4.707106 4.922300 4.574113 8.233953

0
Entering edit mode

deleted due to duplicate

0
Entering edit mode

Do you have question?

glmFit does not shrink the logFCs when prior.count=0. The function does what it is documented to do.

Obviously the glmFit negative binomial generalized linear model fit is much more sophisticated than simple averaging of logCPMs, but you haven't actually tried averaging logCPM values from cpm, so you haven't actually made that comparison.

1
Entering edit mode
@james-w-macdonald-5106
Last seen 3 hours ago
United States

The coefficients for a GLM don't have a closed form solution, so you can't calculate them by hand using the logCPM values. Since you can't compute the coefficients, it stands to reason that you can't compute the fold changes either.