Question

edgeR analysis in diffbind

0

Entering edit mode

tonja.r ▴ 80

@tonjar-7565

Last seen 7.5 years ago

United Kingdom

I am trying to reproduce the results of DiffBind for edgeR analysis.

A member of my research group did analysis with DiffBind and got nice results, whereas me, performing the edgeR separately could not get any results.

DiffBind was called as follows:

WNN_vs_WEN= dba.analyze(WNN_vs_WEN)

According to DiffBind protocol:
Next comes a call to edgeR ’s DGEList function. The DGEList object that results is next passed to calcNormFactors with method="TMM" and doWeighting=FALSE, returning an updated DGEList object. This is passed to estimateCommonDisp with default parameters.

My edgeR equivalent would be the following: (the manual of DiffBind is somehow really weird regarding edgeR description)
WEN -> treated, WNN -> untreated

cond = rep(c("WEN","WNN"), each=cols/2)
condition <- relevel(factor(cond), ref="WNN")
y =DGEList(signal,group=condition)
y <- calcNormFactors(y,method="TMM",doWeighting =FALSE)
y = estimateCommonDisp(y)

But what is done next? In DiffBind they switch then to "If the method is DBA_EDGER_CLASSIC" and  "If the method is DBA_EDGER_GLM (the default)". So, if DBA_EDGER_GLM is default, then an estimateGLMCommonDisp  should be called. But we have already called estimateCommonDisp. Also, at the end they state: Finally, an exactTest ([6]) is performed, using either common or tagwise dispersion depending on the value specified for bTagwise. But exactTest does not give me adjusted p-values and a member of my group got FDRs.

Is there a manual where the steps of edgeR in DiffBind are described better?

edger diffbind • 1.8k views

ADD COMMENT • link updated 8.4 years ago by Rory Stark ★ 5.2k • written 8.4 years ago by tonja.r ▴ 80

0

Entering edit mode

Aaron Lun ★ 28k

@alun

Last seen 5 hours ago

The city by the bay

Rory will probably give a better answer, but here's some comments from the edgeR side:

It doesn't matter if you've already called estimateCommonDisp before calling estimateGLMCommonDisp, the common dispersion estimate from the former will just get overwritten by that of the latter. It'll be the same as if you called the the GLM method by itself (more or less - some extra fields get added by estimateCommonDisp, but these should be ignored in a GLM analysis).
It's fairly straightforward to compute adjusted p-values to control the FDR from raw p-values. topTags will do it for you, but you can also use the p.adjust method in base R. So, this step is not really specific to edgeR. In general, I don't think it's something that needs to be described in extensive detail.

Obviously, make sure you're running the same version numbers as your colleague.

ADD COMMENT • link 8.4 years ago Aaron Lun ★ 28k

0

Entering edit mode

Well, I still do no understand if the use exactTest or glmFit,glmLRT in the standard dba.analyze call.

ADD REPLY • link 8.4 years ago tonja.r ▴ 80

score 2 · Accepted Answer · 2015-12-06

Hi Tonja-

The exact usage of edgeR depends very much on the version; the technical note in the vignette should match however.

In the current released version (not the development version), the key edgeR calls are as follows (for a single-factor analysis):

> DE <- calcNormFactors(DE,method="TMM")
> DE <- estimateCommonDisp(DE)
> DE <- estimateGLMCommonDisp(DE,DE$design)

If bTagwise=TRUE (not default), then it also calls:

> DE <- estimateGLMTagwiseDisp(DE,DE$design)

The model fit and tests are as follows:

> DE$GLM <- glmFit(DE, DE$design) > DE$LRT <- glmLRT(DE$GLM, 2)

When generating a report, the calls are:

> report <- topTags(DE$LRT, nrow(counts))$table

As Aaron says, topTags does the FDR calculations.

The counting method may also impact the result: are you running edgeR using the counts retrieved from the DBA object (using dba.peakset() with bRetrieve=TRUE), or doing the counting separately? DiffBind defaults do non-standard things with duplicates, control tracks, and setting the library size.

Cheers-

Rory