Drastic change in edgeR p-value calculation?
2
0
Entering edit mode
@lionel-lee-brooks-3rd-5660
Last seen 9.6 years ago
Hello Gentlepeople, I have been using the edgeR package to identify differentially abundant tags in an RNA-seq experiment. I was happy when the edgeR package version 2.4.6 called 41 differentially abundant tags. However, now I am unhappy because the same analysis with edgeR package version 3.0.2 does not call any differentially abundant tags. I've compared the output of the two versions and I can see that both versions calculate the same fold changes - however - the p-value calculation is different. I feel that the optimal solution for me is to determine the command arguments that I can use to emulate the previous default behavior. Can anyone help me figure that out? I would be extremely grateful for some guidance. This way, I could keep my bioconductor packages up to date. If you think you can help then please see below, for the code and session Info. Thank you for your time. sincerely, Lee Brooks ---- HERE IS THE CODE PART Here is the procedure that I have been using, where x is the data frame object containing my data: y <- DGEList(counts=x,group=group) y <- calcNormFactors(y) y <- estimateCommonDisp(y) y <- estimateTagwiseDisp(y) et <- exactTest(y) topTags(et) de <- decideTestsDGE(et, p=0.05, adjust="BH") detags <- et[as.logical(de)] detags <- topTags(detags, n = length(detags)) ---- HERE IS AN EXAMPLE OUTPUT FROM EACH VERSION: Tables Tables logFC logCPM PValue FDR fooTag_edgeR_2.4.6 5.12318 12.6577 2.9E-30 8.9E-26 fooTag_edgeR_3.0.2 5.12318 12.6612 0.00717 1 ----- HERE IS THE sessionInfo() PART... THE CONFIGURATION THAT WORKED FOR ME: R version 2.14.1 (2011-12-22) Platform: i686-pc-linux-gnu (32-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] edgeR_2.4.6 limma_3.10.3 THE CONFIGURATION THAT DOES NOT WORK FOR ME: R version 2.15.2 (2012-10-26) Platform: x86_64-pc-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C [3] LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8 [5] LC_MONETARY=en_US.utf8 LC_MESSAGES=en_US.utf8 [7] LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] edgeR_3.0.2 limma_3.14.1 [[alternative HTML version deleted]]
edgeR edgeR • 1.4k views
ADD COMMENT
0
Entering edit mode
KJ Lim ▴ 420
@kj-lim-5288
Last seen 3.6 years ago
Finland
Dear Lee, Hopefully this archive thread is useful to your case: Differences in results between two different versions of edgeR https://stat.ethz.ch/pipermail/bioconductor/2012-December/049644.html Best regards, KJ Lim On 18 December 2012 20:35, Lionel (Lee) Brooks 3rd < Lionel.Brooks.GR@dartmouth.edu> wrote: > Hello Gentlepeople, > > I have been using the edgeR package to identify differentially abundant > tags in an RNA-seq experiment. > > I was happy when the edgeR package version 2.4.6 called 41 differentially > abundant tags. > > However, now I am unhappy because the same analysis with edgeR package > version 3.0.2 does not call any differentially abundant tags. > > I've compared the output of the two versions and I can see that both > versions > calculate the same fold changes - however - the p-value calculation is > different. > > I feel that the optimal solution for me is to determine the command > arguments > that I can use to emulate the previous default behavior. > > Can anyone help me figure that out? > I would be extremely grateful for some guidance. > This way, I could keep my bioconductor packages up to date. > > If you think you can help then please see below, for the code and > session Info. > > Thank you for your time. > > sincerely, > Lee Brooks > > ---- > HERE IS THE CODE PART > > Here is the procedure that I have been using, > where x is the data frame object containing my data: > y <- DGEList(counts=x,group=group) > y <- calcNormFactors(y) > y <- estimateCommonDisp(y) > y <- estimateTagwiseDisp(y) > et <- exactTest(y) > topTags(et) > de <- decideTestsDGE(et, p=0.05, adjust="BH") > detags <- et[as.logical(de)] > detags <- topTags(detags, n = length(detags)) > > ---- > HERE IS AN EXAMPLE OUTPUT FROM EACH VERSION: > > Tables Tables > > logFC logCPM PValue FDR > fooTag_edgeR_2.4.6 5.12318 12.6577 2.9E-30 > 8.9E-26 > fooTag_edgeR_3.0.2 5.12318 12.6612 0.00717 1 > > > ----- > HERE IS THE sessionInfo() PART... > > THE CONFIGURATION THAT WORKED FOR ME: > > R version 2.14.1 (2011-12-22) > Platform: i686-pc-linux-gnu (32-bit) > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 > [7] LC_PAPER=C LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] edgeR_2.4.6 limma_3.10.3 > > THE CONFIGURATION THAT DOES NOT WORK FOR ME: > > R version 2.15.2 (2012-10-26) > Platform: x86_64-pc-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C > [3] LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8 > [5] LC_MONETARY=en_US.utf8 LC_MESSAGES=en_US.utf8 > [7] LC_PAPER=C LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] edgeR_3.0.2 limma_3.14.1 > > > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]
ADD COMMENT
0
Entering edit mode
@gordon-smyth
Last seen 1 hour ago
WEHI, Melbourne, Australia
Dear Lee Brooks, The current version will behave almost the same as the older version if you set the prior.df to the old default value: estimateTagwiseDisp(y, prior.df=20) It seems that you have some genes with large fold changes but also large variability between replicates. With a large prior.df, the dispersions for these genes are forced down towards the global trend, and this is apparently enough to allow them to be statistically significant. With a smaller prior.df, the tagwise dispersions remain closer to individual dispersion values, so the genes are penalized more for their large variability, and they become non-significant. Best wishes Gordon > Date: Tue, 18 Dec 2012 13:35:11 -0500 > From: "Lionel (Lee) Brooks 3rd" <lionel.brooks.gr at="" dartmouth.edu=""> > To: bioconductor at r-project.org > Subject: [BioC] Drastic change in edgeR p-value calculation? > > Hello Gentlepeople, > > I have been using the edgeR package to identify differentially abundant > tags in an RNA-seq experiment. > > I was happy when the edgeR package version 2.4.6 called 41 > differentially abundant tags. > > However, now I am unhappy because the same analysis with edgeR package > version 3.0.2 does not call any differentially abundant tags. > > I've compared the output of the two versions and I can see that both > versions calculate the same fold changes - however - the p-value > calculation is different. > > I feel that the optimal solution for me is to determine the command > arguments that I can use to emulate the previous default behavior. > > Can anyone help me figure that out? > I would be extremely grateful for some guidance. > This way, I could keep my bioconductor packages up to date. > > If you think you can help then please see below, for the code and > session Info. > > Thank you for your time. > > sincerely, > Lee Brooks > > ---- > HERE IS THE CODE PART > > Here is the procedure that I have been using, > where x is the data frame object containing my data: > y <- DGEList(counts=x,group=group) > y <- calcNormFactors(y) > y <- estimateCommonDisp(y) > y <- estimateTagwiseDisp(y) > et <- exactTest(y) > topTags(et) > de <- decideTestsDGE(et, p=0.05, adjust="BH") > detags <- et[as.logical(de)] > detags <- topTags(detags, n = length(detags)) > > ---- > HERE IS AN EXAMPLE OUTPUT FROM EACH VERSION: > > Tables Tables > > logFC logCPM PValue FDR > fooTag_edgeR_2.4.6 5.12318 12.6577 2.9E-30 8.9E-26 > fooTag_edgeR_3.0.2 5.12318 12.6612 0.00717 1 > > > ----- > HERE IS THE sessionInfo() PART... > > THE CONFIGURATION THAT WORKED FOR ME: > > R version 2.14.1 (2011-12-22) > Platform: i686-pc-linux-gnu (32-bit) > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 > [7] LC_PAPER=C LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] edgeR_2.4.6 limma_3.10.3 > > THE CONFIGURATION THAT DOES NOT WORK FOR ME: > > R version 2.15.2 (2012-10-26) > Platform: x86_64-pc-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C > [3] LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8 > [5] LC_MONETARY=en_US.utf8 LC_MESSAGES=en_US.utf8 > [7] LC_PAPER=C LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] edgeR_3.0.2 limma_3.14.1 > ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}
ADD COMMENT

Login before adding your answer.

Traffic: 758 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6