edgeR: exactTest, glmFIt or glmQLFit approach?
1
0
Entering edit mode
@alessiadonato-11112
Last seen 7.8 years ago

Hi,

I'm using edgeR to analyze a RNAseq data set which has many biological replicates and several variables. I have tried to use both the exact test pipeline and the glm function.

  1. Which of the three approaches (exactTest, glmFit, glmQLFit)  is more appropriate for my data set? 
  2. I noticed that by updating R to the 3.3.1 version and edgeR to 3.14 version the results of the glmQLFit are different from the ones obtained with R 3.2.2 and edgeR 3.12 version (see below). Is there a difference in the glmQLFit between the two versions?

My code is as follows:

y <- calcNormFactors(y)
design <- model.matrix(~0+Group)
colnames(design) <- levels(Group)
y <- estimateDisp(y, design)
qlfit <- glmQLFit(y, design)
my.contrasts <- makeContrasts(
  D1.DvsD0.D= D1.D-D0.D,
  D2.DvsD1.D= D2.D-D1.D,levels=design)
lrt <- glmQLFTest(qlfit, contrast=my.contrasts[,"D1.DvsD0.D"])
d1<- topTags(lrt, n=Inf)

Version 3.12 result:
summary (dup<- decideTestsDGE(lrt)) 
   [,1]
-1 3566
0  6725
1  2714

Version 3.14 result:
summary (dup<- decideTestsDGE(lrt)) 
   [,1]
-1 4323
0  5535
1  3147

Any advice is much appreciated!

edger rnaseq glm glmfit • 5.3k views
ADD COMMENT
3
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 6 hours ago
The city by the bay

It seems like you have a one-way layout, so you could use any of the three approaches. However, I would suggest that you use glmQLFit + glmQLFTest as they provide the most accurate type I error control. This is because they account for the uncertainty of the dispersion estimates (which can be considerable when you have few replicates and/or the amount of shrinkage is low), whereas the other methods do not.

As for the differences between versions, this is probably due to an error in the older version. Briefly, there is some code in glmQLFTest to put a lower bound on the p-value, under the assumption that genes should not exhibit below-Poisson variance. In the older version, this bound was not calculated correctly as the dispersion for the GLM in this part of the code was set to the trended dispersion rather than zero. This has been patched in the newer version, which should (correctly) give a smaller bound, greater power and more DE genes.

ADD COMMENT

Login before adding your answer.

Traffic: 1059 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6