Question

How to get DE relative to a fold-change threshold in single factor experiments

thkapell ▴ 10

@tkapell-14647

Helmholtz Center Munich, Germany

I have a 4 level single factor dataset to analyse with edgeR. I used:

tmm<-calcNormFactors(data.dge)
y<-estimateDisp(tmm)
et<-exactTest(y,pair())

to extract DE genes in the desired comparisons. However, is there an equivalent to glmTreat(), which is used in multiple factor experiments, that I can use to get DE genes relative to a fold-change threshold of 2 or similar?

edgeR deanalysis test fisher exact • 1.3k views

ADD COMMENT • link updated 6.2 years ago by Aaron Lun ★ 28k • written 6.2 years ago by thkapell ▴ 10

score 0 · Answer 1 · 2018-03-08

Aaron Lun ★ 28k

@alun

The city by the bay

If you bothered to read the documentation, you would see that glmTreat allows you to specify contrasts. So you can specify the desired contrasts between pairs of groups, in the same manner that you might do for glmLRT and glmQLFTest. Read Sections 3.2.3 and 3.3 of the edgeR user's guide for more details.

ADD COMMENT • link 6.2 years ago Aaron Lun ★ 28k

Thank you for the material. I have read the edgeR user's guide and (forgive me if I got it wrong) I understood that for single factor designs exactTest should be used, while for multiple factor designs glmFit/glmLRT, or more precisely glmQLFit/glmQLFTest, should be preferred. For multiple factor designs, I follow how glmTreat should be used. But for single factor experiments, glmTreat cannot be applied, as the following error comes up:

et<-exactTest(y,pair=c(A,B))
tr<-glmTreat(et,contrast=BvsA,lfc=log2(2))

Error in glmTreat(et, contrast = BvsA, lfc = log2(2)) : 
  glmfit must be an DGEGLM object (usually produced by glmFit or glmQLFit).

So this is what confuses me.

ADD REPLY • link 6.2 years ago thkapell ▴ 10

There is no real reason to use the exact test any more. The GLM machinery is far more flexible as well as being more accurate; see "A: EDGE-R exact test vs QL F-test". As for your error: well, the error message says it all. The input object should be produced by glmFit or glmQLFit, not exactTest.
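A minimal sketch of how a single factor design could be run through the GLM machinery and then tested against a fold-change threshold. The simulated counts, the group names A to D and the object names are assumptions made for illustration; they are not taken from the original data.

```r
library(edgeR)

# Simulated counts purely for illustration: 1000 genes, 4 groups x 3 replicates.
set.seed(1)
group <- factor(rep(c("A", "B", "C", "D"), each = 3))
counts <- matrix(rnbinom(1000 * 12, mu = 50, size = 10), nrow = 1000)
rownames(counts) <- paste0("gene", seq_len(1000))

y <- DGEList(counts = counts, group = group)
y <- calcNormFactors(y)

# One coefficient per group (no intercept), so pairwise contrasts are easy to write.
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

y <- estimateDisp(y, design)
fit <- glmQLFit(y, design)   # returns a DGEGLM object, which glmTreat accepts

# B versus A; other pairwise comparisons follow the same pattern.
BvsA <- makeContrasts(B - A, levels = design)

# Test against a fold change of 2, i.e. a log2 fold change of 1.
tr <- glmTreat(fit, contrast = BvsA, lfc = log2(2))
topTags(tr)
```

The same fit object can be reused for each pairwise contrast, which replaces the repeated exactTest calls with pair=.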

ADD REPLY • link 6.2 years ago Aaron Lun ★ 28k

Thank you, that was really helpful! Would you then recommend a cutoff/threshold for dispersion or counts at which you would use glmFit instead of glmQLFit for your DE analysis?

ADD REPLY • link 6.2 years ago thkapell ▴ 10

You should do filtering on abundance, see Section 2.6 of the user's guide. I don't know what you mean by applying a threshold on the dispersion; any filtering on the dispersion is a Bad Idea for empirical Bayes shrinkage. If you're worried about outlier dispersions, set robust=TRUE in glmQLFit.
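As a rough sketch of the above, abundance filtering followed by a robust QL fit might look like this. The simulated counts and group names are assumptions for illustration, and filterByExpr is used here as one way of filtering on abundance; it is not necessarily the exact procedure described in the user's guide section.

```r
library(edgeR)

# Simulated data (illustrative only): 500 moderately expressed genes
# followed by 500 very lowly expressed genes, 4 groups x 3 replicates.
set.seed(2)
group <- factor(rep(c("A", "B", "C", "D"), each = 3))
mu <- rep(c(50, 0.5), each = 500)
counts <- matrix(rnbinom(1000 * 12, mu = mu, size = 10), nrow = 1000)

y <- DGEList(counts = counts, group = group)
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# Filter on abundance only; nothing is filtered on the dispersion.
keep <- filterByExpr(y, design)
y <- y[keep, , keep.lib.sizes = FALSE]

y <- calcNormFactors(y)
y <- estimateDisp(y, design)

# robust=TRUE protects the empirical Bayes shrinkage against outlier dispersions.
fit <- glmQLFit(y, design, robust = TRUE)
```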

ADD REPLY • link 6.2 years ago Aaron Lun ★ 28k

"In summary, while both of the methods will work for your data set, the QL F-test is probably the better choice. There are some situations where the QL F-test doesn't work well - for example, if you don't have replicates, you'd have to supply a fixed dispersion, which defeats the whole point of modelling estimation uncertainty. Another situation is where the dispersions are very large and the counts are very small, whereby some of the approximations in the QL framework seem to fail. In such cases, I usually switch to the LRT rather than using the exact test, for the reasons of experimental flexibility that I mentioned above."

ADD REPLY • link 6.2 years ago thkapell ▴ 10

Yes, I remember writing that. What is your point? Putting things in bold doesn't provide any extra information.

ADD REPLY • link 6.2 years ago Aaron Lun ★ 28k

Yes, that was from the link you posted above. You said that the glmQLFit approximations fail with small counts and large dispersions. Can you then elaborate on when you would switch to glmFit based on this?

ADD REPLY • link 6.2 years ago thkapell ▴ 10

For single-cell data. This is not relevant for bulk unless your data is very bad (low depth, high variance).
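For the rare case where the switch is warranted, a hedged sketch of the LRT route: glmTreat also accepts the output of glmFit, in which case it performs a modified likelihood ratio test instead of a QL F-test. The low-count, high-dispersion simulated data and all object names are assumptions for illustration only.

```r
library(edgeR)

# Simulated low-count, high-dispersion data (illustrative only).
set.seed(3)
group <- factor(rep(c("A", "B", "C", "D"), each = 3))
counts <- matrix(rnbinom(1000 * 12, mu = 5, size = 1), nrow = 1000)

y <- DGEList(counts = counts, group = group)
y <- calcNormFactors(y)
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)
y <- estimateDisp(y, design)

# Negative binomial GLM fit without the QL step.
fit <- glmFit(y, design)
BvsA <- makeContrasts(B - A, levels = design)

# Ordinary LRT for any difference between B and A ...
lrt <- glmLRT(fit, contrast = BvsA)

# ... or an LRT-based test relative to a fold change of 2.
tr <- glmTreat(fit, contrast = BvsA, lfc = log2(2))
topTags(tr)
```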