Question

Difference between Fold Changes from EdgeR glmLRTest and directly calculated fold changes.

0

Entering edit mode

abf ▴ 30

@abf-14661

Last seen 2.4 years ago

United States

In the process of validating my procedure for using edgeR to calculate differential expression statistics; I've found that there are differences between the fold changes estimated by glmLRTest, and those calculated directly from the matrix of scaled cpm values returned by "cpm(y)". I am aware the the "prior.count" parameter can be be used to shrink log fold changes towards zero. I've tested this myself and examined the impact of different settings; ranging from 0 to 10. However with my data, even when prior.count is set to 0; it looks like there is still some shrinkage being applied to the fold change. My design matrix (bottom of post) does include a batch adjustment factor, and I suspect that this may play a role.

z<-z<-y[filterByExpr(y, design=design), , keep.lib.sizes=FALSE]
z<-estimateCommonDisp(z)
z<-estimateTagwiseDIsp(z)
z<-calcNormFactors(z)

Model is fit as follows:

  fit<-glmFit(z, design, prior.count=0)
  qlf<-glmLRT(fit, coef=2:5)
  deg<-as.data.frame(topTags(qlf, n=Inf))

The estimated fold changes disagree with those calculated from the cpm matrix

Estmated from the CPM matrix:

> gr1<-c(
+   "1_S1","2_S2","4_S3",
+   "WT_0_hr_1_S1","WT_0_hr_2_S2","WT_0_hr_3_S3",
+   "W_0_hr_1_S256","W_0_hr_2_S257","W_0_hr_3_S258"
+ )
> 
> gr2<-c(
+   "WT_6_hr_1_S4","WT_6_hr_2_S5","WT_6_hr_3_S15"
+ )
> aveLogCPM(z[genes, gr1])
[1] 6.211942 5.699160 3.259007 2.478364
> aveLogCPM(z[genes, gr2])
[1] 10.916058 10.616974  7.811504 10.667996



> aveLogCPM(z[genes, gr2]) - aveLogCPM(z[genes, gr1])
[1] 4.704117 4.917814 4.552497 8.189632

Reported by the glmLRT function:

> deg[genes,"logFC.Interval6H"]
[1] 4.553255 4.738738 4.285688 8.025348

For Reference, my design matrix is below:

> design
              (Intercept) Interval6H Interval24H Interval48H Interval120H LabDNA
1_S1                    1          0           0           0            0      0
2_S2                    1          0           0           0            0      0
4_S3                    1          0           0           0            0      0
6_S4                    1          0           1           0            0      0
7_S5                    1          0           1           0            0      0
8_S6                    1          0           1           0            0      0
9_S7                    1          0           0           1            0      0
10_S8                   1          0           0           1            0      0
11_S9                   1          0           0           1            0      0
WT_0_hr_1_S1            1          0           0           0            0      1
WT_0_hr_2_S2            1          0           0           0            0      1
WT_0_hr_3_S3            1          0           0           0            0      1
WT_6_hr_1_S4            1          1           0           0            0      1
WT_6_hr_2_S5            1          1           0           0            0      1
WT_6_hr_3_S15           1          1           0           0            0      1
WT_24_hr_1_S6           1          0           1           0            0      1
WT_24_hr_2_S7           1          0           1           0            0      1
WT_24_hr_3_S8           1          0           1           0            0      1
W_0_hr_1_S256           1          0           0           0            0      1
W_0_hr_2_S257           1          0           0           0            0      1
W_0_hr_3_S258           1          0           0           0            0      1
W_5_D_1_S259            1          0           0           0            1      1
7_S261                  1          0           0           0            1      1
W_5_D_3_S260            1          0           0           0            1      1
attr(,"assign")
[1] 0 1 1 1 1 2
attr(,"contrasts")
attr(,"contrasts")$Interval
[1] "contr.treatment"

edger RNA Seq • 1.1k views

ADD COMMENT • link updated 5.1 years ago by James W. MacDonald 67k • written 5.1 years ago by abf ▴ 30

0

Entering edit mode

By dropping the batch column from my design matrix, AND using the "prior.count" parameter with aveLogCPM, the fold change estimates that I calculate become much more similar.

              (Intercept) Interval6H Interval24H Interval48H Interval120H
1_S1                    1          0           0           0            0
2_S2                    1          0           0           0            0
4_S3                    1          0           0           0            0
6_S4                    1          0           1           0            0
7_S5                    1          0           1           0            0
8_S6                    1          0           1           0            0
9_S7                    1          0           0           1            0
10_S8                   1          0           0           1            0
11_S9                   1          0           0           1            0
WT_0_hr_1_S1            1          0           0           0            0
WT_0_hr_2_S2            1          0           0           0            0
WT_0_hr_3_S3            1          0           0           0            0
WT_6_hr_1_S4            1          1           0           0            0
WT_6_hr_2_S5            1          1           0           0            0
WT_6_hr_3_S15           1          1           0           0            0
WT_24_hr_1_S6           1          0           1           0            0
WT_24_hr_2_S7           1          0           1           0            0
WT_24_hr_3_S8           1          0           1           0            0
W_0_hr_1_S256           1          0           0           0            0
W_0_hr_2_S257           1          0           0           0            0
W_0_hr_3_S258           1          0           0           0            0
W_5_D_1_S259            1          0           0           0            1
7_S261                  1          0           0           0            1
W_5_D_3_S260            1          0           0           0            1
attr(,"assign")
[1] 0 1 1 1 1
attr(,"contrasts")
attr(,"contrasts")$Interval
[1] "contr.treatment"

z<-estimateCommonDisp(z)
z<-estimateTagwiseDisp(z)
z<-calcNormFactors(z)

A<-aveLogCPM(z[genes,gr2], normalized.lib.sizes = T, prior.count = 0) 
B<-aveLogCPM(z[genes,gr1], normalized.lib.sizes = T, prior.count = 0)

> A - B
[1] 4.706790 4.921665 4.573017 8.226730



fit<-glmFit(z, design, prior.count=0)
qlf<-glmLRT(fit, coef=2:5)
deg<-as.data.frame(topTags(qlf, n=Inf))

> deg[genes,"logFC.Interval6H"]
[1] 4.707106 4.922300 4.574113 8.233953

ADD REPLY • link 5.1 years ago abf ▴ 30

0

Entering edit mode

deleted due to duplicate

ADD REPLY • link 5.1 years ago abf ▴ 30

0

Entering edit mode

Do you have question?

glmFit does not shrink the logFCs when prior.count=0. The function does what it is documented to do.

Obviously the glmFit negative binomial generalized linear model fit is much more sophisticated than simple averaging of logCPMs, but you haven't actually tried averaging logCPM values from cpm, so you haven't actually made that comparison.

ADD REPLY • link 5.1 years ago Gordon Smyth 52k

score 1 · Answer 1 · 2019-11-26

1

Entering edit mode

James W. MacDonald 67k

@james-w-macdonald-5106

Last seen 1 day ago

United States

The coefficients for a GLM don't have a closed form solution, so you can't calculate them by hand using the logCPM values. Since you can't compute the coefficients, it stands to reason that you can't compute the fold changes either.

ADD COMMENT • link 5.1 years ago James W. MacDonald 67k