Question

edgeR:Differences in results between two different versions of edgeR


Dorota Herman ▴ 60

@dorota-herman-5627


Dear list,

When I run the same code on RNA-seq data to find differentially expressed genes using exactTest() in two different versions of edgeR, I obtain considerably different results. The data set contains 36 libraries divided into 12 groups, and each library consists of 24 000 genes (none of them has all zero counts). While the older version (edgeR_2.0.5) gives me 97 significantly differentially expressed genes between two selected groups, the newer version (edgeR_3.0.4) does not find any; moreover, the FDR is less than 1 for only 13 genes. I realize these two versions are far apart in development. However, I would still be interested in the reasons for such a difference.

Running the same code in parallel in the two versions of edgeR, I found that the difference is most likely attributable to the estimateTagwiseDisp() function, which is

    estimateTagwiseDisp(object, prior.n=10, trend=FALSE, prop.used=NULL, tol=1e-06, grid=TRUE, grid.length=200, verbose=TRUE)

in edgeR_2.0.5 and

    estimateTagwiseDisp(object, prior.df=20, trend="movingave", span=NULL, method="grid", grid.length=11, grid.range=c(-6,6), tol=1e-06, verbose=FALSE)

in edgeR_3.0.4.

The parameters prior.n and prior.df seem to have the greatest impact, as they set how strongly the tagwise dispersions are influenced by the common dispersion. Although setting prior.df very low (which would be the equivalent of a high prior.n) makes a difference in the FDR values, the results from the two edgeR versions are still very distinct, and so are the estimated $tagwise.dispersion values. Another candidate seems to be prop.used, but I am not sure whether its equivalent in edgeR_3.0.4 is the "span" parameter. Is it? On the other hand, there are parameters related to the estimation algorithm that I would not expect to cause such a difference in the outcome. Could they?

What am I missing here? Which parameter settings would make the results of the DE analyses more comparable between the two edgeR versions?

Best wishes
Dorota

--
Dorota Herman, PhD
VIB Department of Plant Systems Biology, Ghent University
Technologiepark 927
9052 Gent, Belgium
Tel: +32 (0)9 3313692
Email: dorota.herman at psb.vib-ugent.be
Web: http://www.psb.ugent.be

PROcess edgeR • 1.4k views

updated 11.4 years ago by Gordon Smyth 50k • written 11.4 years ago by Dorota Herman ▴ 60

score 0 · Answer 1 · 2012-12-05

Dear Dorota,

The important settings are prior.df and trend.

prior.n and prior.df are related through prior.df = prior.n * residual.df, and your experiment has residual.df = 36 - 12 = 24. So the old setting of prior.n=10 is equivalent for your data to prior.df = 240, a very large value. Going the other way, the new setting of prior.df=10 is equivalent to prior.n=10/24.

To recover the old results with the current software, you would use

    estimateTagwiseDisp(object, prior.df=240, trend="none")

To get the new default from the old software, you would use

    estimateTagwiseDisp(object, prior.n=10/24, trend=TRUE)

Actually, the old trend method is equivalent to trend="loess" in the new software. You should use plotBCV(object) to see whether a trend is required.

Note that you could also use

    prior.n <- getPriorN(object, prior.df=10)

to map between prior.df and prior.n.

There has also been a change in the default behaviour of exactTest(). To make the new exactTest() behave like the old version, you would use

    exactTest(object, rejection.region="smallp")

The new default gives much more reliable results than the old when the dispersion is very large.

Best wishes
Gordon

> Date: Mon, 03 Dec 2012 19:36:58 +0100
> From: "Dorota Herman" <dorota.herman at psb.vib-ugent.be>
> To: Bioconductor mailing list <bioconductor at r-project.org>
> Subject: [BioC] edgeR:Differences in results between two different versions of edgeR
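The conversion between prior.n and prior.df described in the answer can be checked with a few lines of base R. This is only a sketch of the arithmetic: the helper names residual_df(), prior_n_to_df() and prior_df_to_n() are made up for illustration and are not edgeR functions, and the sketch assumes the one-way layout from the question (36 libraries in 12 groups), so that residual.df is the number of libraries minus the number of groups.

```r
# Hypothetical helpers (NOT part of edgeR) illustrating the relation
# prior.df = prior.n * residual.df quoted in the answer.

# Residual degrees of freedom for a one-way layout:
# number of libraries minus number of groups.
residual_df <- function(n.libraries, n.groups) {
  n.libraries - n.groups
}

# Old-style prior.n expressed as a new-style prior.df.
prior_n_to_df <- function(prior.n, residual.df) {
  prior.n * residual.df
}

# New-style prior.df expressed as an old-style prior.n.
prior_df_to_n <- function(prior.df, residual.df) {
  prior.df / residual.df
}

# Experiment from the question: 36 libraries in 12 groups.
res.df    <- residual_df(n.libraries = 36, n.groups = 12)
old.as.df <- prior_n_to_df(prior.n = 10, residual.df = res.df)
new.as.n  <- prior_df_to_n(prior.df = 10, residual.df = res.df)

cat("residual.df                :", res.df, "\n")     # 24
cat("prior.n = 10  as prior.df  :", old.as.df, "\n")  # 240
cat("prior.df = 10 as prior.n   :", new.as.n, "\n")   # 10/24, about 0.4167
```

With these numbers, 240 is the value passed as prior.df to reproduce the old behaviour in the new software, and 10/24 is the value passed as prior.n to approximate the new default in the old software, as in the calls given in the answer.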