Question

edgeR: exactTest, glmFIt or glmQLFit approach?

0

Entering edit mode

alessia.donato • 0

@alessiadonato-11112

Last seen 7.8 years ago

Hi,

I'm using edgeR to analyze a RNAseq data set which has many biological replicates and several variables. I have tried to use both the exact test pipeline and the glm function.

Which of the three approaches (exactTest, glmFit, glmQLFit) is more appropriate for my data set?
I noticed that by updating R to the 3.3.1 version and edgeR to 3.14 version the results of the glmQLFit are different from the ones obtained with R 3.2.2 and edgeR 3.12 version (see below). Is there a difference in the glmQLFit between the two versions?

My code is as follows:

y <- calcNormFactors(y)
design <- model.matrix(~0+Group)
colnames(design) <- levels(Group)
y <- estimateDisp(y, design)
qlfit <- glmQLFit(y, design)
my.contrasts <- makeContrasts(
  D1.DvsD0.D= D1.D-D0.D,
  D2.DvsD1.D= D2.D-D1.D,levels=design)
lrt <- glmQLFTest(qlfit, contrast=my.contrasts[,"D1.DvsD0.D"])
d1<- topTags(lrt, n=Inf)

Version 3.12 result:
summary (dup<- decideTestsDGE(lrt)) 
   [,1]
-1 3566
0  6725
1  2714

Version 3.14 result:
summary (dup<- decideTestsDGE(lrt)) 
   [,1]
-1 4323
0  5535
1  3147

Any advice is much appreciated!

edger rnaseq glm glmfit • 5.3k views

ADD COMMENT • link updated 7.8 years ago by Aaron Lun ★ 28k • written 7.8 years ago by alessia.donato • 0

score 3 · Accepted Answer · 2016-07-14

It seems like you have a one-way layout, so you could use any of the three approaches. However, I would suggest that you use glmQLFit + glmQLFTest as they provide the most accurate type I error control. This is because they account for the uncertainty of the dispersion estimates (which can be considerable when you have few replicates and/or the amount of shrinkage is low), whereas the other methods do not.

As for the differences between versions, this is probably due to an error in the older version. Briefly, there is some code in glmQLFTest to put a lower bound on the p-value, under the assumption that genes should not exhibit below-Poisson variance. In the older version, this bound was not calculated correctly as the dispersion for the GLM in this part of the code was set to the trended dispersion rather than zero. This has been patched in the newer version, which should (correctly) give a smaller bound, greater power and more DE genes.