In the process of validating my procedure for using edgeR to calculate differential expression statistics; I've found that there are differences between the fold changes estimated by glmLRTest, and those calculated directly from the matrix of scaled cpm values returned by "cpm(y)". I am aware the the "prior.count" parameter can be be used to shrink log fold changes towards zero. I've tested this myself and examined the impact of different settings; ranging from 0 to 10. However with my data, even when prior.count is set to 0; it looks like there is still some shrinkage being applied to the fold change. My design matrix (bottom of post) does include a batch adjustment factor, and I suspect that this may play a role.
z<-z<-y[filterByExpr(y, design=design), , keep.lib.sizes=FALSE]
z<-estimateCommonDisp(z)
z<-estimateTagwiseDIsp(z)
z<-calcNormFactors(z)
Model is fit as follows:
fit<-glmFit(z, design, prior.count=0)
qlf<-glmLRT(fit, coef=2:5)
deg<-as.data.frame(topTags(qlf, n=Inf))
The estimated fold changes disagree with those calculated from the cpm matrix
Estmated from the CPM matrix:
> gr1<-c(
+ "1_S1","2_S2","4_S3",
+ "WT_0_hr_1_S1","WT_0_hr_2_S2","WT_0_hr_3_S3",
+ "W_0_hr_1_S256","W_0_hr_2_S257","W_0_hr_3_S258"
+ )
>
> gr2<-c(
+ "WT_6_hr_1_S4","WT_6_hr_2_S5","WT_6_hr_3_S15"
+ )
> aveLogCPM(z[genes, gr1])
[1] 6.211942 5.699160 3.259007 2.478364
> aveLogCPM(z[genes, gr2])
[1] 10.916058 10.616974 7.811504 10.667996
> aveLogCPM(z[genes, gr2]) - aveLogCPM(z[genes, gr1])
[1] 4.704117 4.917814 4.552497 8.189632
Reported by the glmLRT function:
> deg[genes,"logFC.Interval6H"]
[1] 4.553255 4.738738 4.285688 8.025348
For Reference, my design matrix is below:
> design
(Intercept) Interval6H Interval24H Interval48H Interval120H LabDNA
1_S1 1 0 0 0 0 0
2_S2 1 0 0 0 0 0
4_S3 1 0 0 0 0 0
6_S4 1 0 1 0 0 0
7_S5 1 0 1 0 0 0
8_S6 1 0 1 0 0 0
9_S7 1 0 0 1 0 0
10_S8 1 0 0 1 0 0
11_S9 1 0 0 1 0 0
WT_0_hr_1_S1 1 0 0 0 0 1
WT_0_hr_2_S2 1 0 0 0 0 1
WT_0_hr_3_S3 1 0 0 0 0 1
WT_6_hr_1_S4 1 1 0 0 0 1
WT_6_hr_2_S5 1 1 0 0 0 1
WT_6_hr_3_S15 1 1 0 0 0 1
WT_24_hr_1_S6 1 0 1 0 0 1
WT_24_hr_2_S7 1 0 1 0 0 1
WT_24_hr_3_S8 1 0 1 0 0 1
W_0_hr_1_S256 1 0 0 0 0 1
W_0_hr_2_S257 1 0 0 0 0 1
W_0_hr_3_S258 1 0 0 0 0 1
W_5_D_1_S259 1 0 0 0 1 1
7_S261 1 0 0 0 1 1
W_5_D_3_S260 1 0 0 0 1 1
attr(,"assign")
[1] 0 1 1 1 1 2
attr(,"contrasts")
attr(,"contrasts")$Interval
[1] "contr.treatment"
By dropping the batch column from my design matrix, AND using the "prior.count" parameter with aveLogCPM, the fold change estimates that I calculate become much more similar.
deleted due to duplicate
Do you have question?
glmFit
does not shrink the logFCs whenprior.count=0
. The function does what it is documented to do.Obviously the
glmFit
negative binomial generalized linear model fit is much more sophisticated than simple averaging of logCPMs, but you haven't actually tried averaging logCPM values fromcpm
, so you haven't actually made that comparison.