normalization method & dispersion estimation RNA-seq data
Guest User ★ 13k
@guest-user-4897
Last seen 9.6 years ago
Is it possible to adopt the CQN normalization method (Hansen, 2012) as an option of the edgeR function 'calcNormFactors'? And what about the new shrinkage estimator for dispersion (Wu, 2012), which seems to be better than the one currently used by edgeR?

Thanks,

-- output of sessionInfo(): any

-- Sent via the guest posting facility at bioconductor.org.
Normalization edgeR cqn • 2.0k views
@richard-friedman-513
Last seen 9.6 years ago
Dear Aec,

Is it possible you mean this paper?

Lund SP, Nettleton D, McCarthy DJ, Smyth GK. Detecting Differential Expression in RNA-sequence Data Using Quasi-likelihood with Shrunken Dispersion Estimates. Stat Appl Genet Mol Biol. 2012 Oct 22;11(5). pii: /j/sagmb.2012.11.issue-5/1544-6115.1826/1544-6115.1826.xml. doi: 10.1515/1544-6115.1826. PubMed PMID: 23104842.

If not, please give the complete reference to the Wu paper.

Thanks and best wishes,
Rich

Richard A. Friedman, PhD
Associate Research Scientist, Biomedical Informatics Shared Resource
Herbert Irving Comprehensive Cancer Center (HICCC)
Lecturer, Department of Biomedical Informatics (DBMI)
Educational Coordinator, Center for Computational Biology and Bioinformatics (C2B2)/National Center for Multiscale Analysis of Genomic Networks (MAGNet)
Room 824, Irving Cancer Research Center
Columbia University
1130 St. Nicholas Ave
New York, NY 10032
(212)851-4765 (voice)
friedman@cancercenter.columbia.edu
http://cancercenter.columbia.edu/~friedman/

In memoriam, Ray Bradbury

_______________________________________________
Bioconductor mailing list
Bioconductor@r-project.org
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
I think he means

A new shrinkage estimator for dispersion improves differential expression detection in RNA-seq data.
http://www.ncbi.nlm.nih.gov/pubmed/23001152

which outlines an empirical Bayes method to improve estimation of the Gamma parameters in the Gamma-Poisson (i.e. negative binomial) formulation of the count model. Other authors have proposed generalized Poisson, beta-binomial, Laplace mixture models, etc. for similar purposes, and Dr. Smyth has presented extensive empirical results for the existing edgeR formulations via Cox-Reid estimation along a sliding scale from "individual" to "completely shared" variance (Gamma).

On the other hand, the authors (Hao Wu and Jean Wu, at least) are Hopkins alumni, if I'm not mistaken, and the corresponding author wrote the SQN package, so I can't imagine an implementation of DSS for use in edgeR is too terribly far off. However, from the paper:

> We find that most of the improvement obtained in DSS is due to the different dispersion estimate, as passing the DSS estimates to edgeR/DESeq yields very similar results as in DSS (supplementary material available at Biostatistics online, Figure S9). Both the edgeR and the DESeq methods have been expanded to now accommodate multiclass comparisons. Our test is currently limited to two-class comparison and it is our immediate plan to extend the dispersion estimators to multifactor designs. With an estimate of the dispersion, one can use generalized linear models as done in McCarthy and others (2012).

The paper is open-access and thus attached. Perhaps the authors (of any of the above works) can comment.

--
A model is a lie that helps you see the truth.
Howard Skipper <http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Biostat-2012-Wu-biostatistics_kxs033.pdf
Type: application/pdf
Size: 1118372 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/bioconductor/attachments/20121126/21c401eb/attachment-0001.pdf>
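For readers unfamiliar with the sliding scale from "individual" to "completely shared" dispersion mentioned in this reply, here is a minimal sketch in plain Python. It is an illustration only: the function name `shrink_dispersions`, the `weight` parameter and the choice of shrinking on the log scale toward the geometric mean are all assumptions made for this example. It is not the DSS estimator and not edgeR's Cox-Reid based procedure, both of which choose the amount of shrinkage from the data.

```python
import math

def shrink_dispersions(gene_disp, weight):
    """Shrink per-gene dispersions toward their shared value on the log scale.

    weight = 0.0 keeps each gene's own estimate ("individual");
    weight = 1.0 replaces every estimate with the shared value
    ("completely shared"). The shared value here is the geometric mean,
    which is an assumption of this sketch.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    logs = [math.log(d) for d in gene_disp]
    shared = sum(logs) / len(logs)  # log of the geometric mean
    return [math.exp((1.0 - weight) * lg + weight * shared) for lg in logs]

# Invented raw per-gene dispersions, for illustration only.
raw = [0.01, 0.10, 1.00]
for w in (0.0, 0.5, 1.0):
    print(w, [round(d, 4) for d in shrink_dispersions(raw, w)])
```

Intermediate weights pull extreme per-gene estimates toward the middle without forcing all genes to share one value, which is the general idea behind the empirical Bayes approaches discussed in the thread.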
The supplement mentioned by the authors in their final paragraph is also attached (here). It is worth at least a glance. I wonder whether the notoriously anticonservative Wald test is responsible for recovering information lost to over-shrinkage.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: kxs033supp.pdf
Type: application/pdf
Size: 1973404 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/bioconductor/attachments/20121126/35e5b52a/attachment-0001.pdf>
It doesn't seem that the Wald test is compensating for anything, since the authors show that putting their dispersion estimates into edgeR causes it to produce nearly identical results.

On 11/26/2012 10:24 AM, Tim Triche, Jr. wrote:
> The supplement mentioned by the authors in their final paragraph is also attached (here). It is worth at least a glance. I wonder whether the notoriously anticonservative Wald test is responsible for recovering information lost to over-shrinkage.
Good point -- thank you for catching that.

On Mon, Nov 26, 2012 at 12:35 PM, Ryan C. Thompson <rct@thompsonclan.org> wrote:
> It doesn't seem that the Wald test is compensating for anything, since the authors show that putting their dispersion estimates into edgeR causes it to produce nearly identical results.
Tim,

Thanks.

Jean,

Sorry. I thought aec might have meant Gordon's paper on QuasiSeq, and there was a paper by Wu and Smyth right after the QuasiSeq paper on PubMed. I looked for a Wu Z paper, but there were so many.

Anyway, any thoughts about how the method cited compares with QuasiSeq would be appreciated.

Best wishes,
Rich
@ryan-c-thompson-5618
Last seen 8 months ago
Scripps Research, La Jolla, CA
Anyway, to address the question of using DSS to estimate dispersions and then plugging those dispersions into an edgeR DGEList object: this should be perfectly possible, with one LARGE caveat. DSS only supports the simplest possible experimental design: two classes, unpaired samples. If your experiment fits this design, you can use DSS to estimate dispersions, copy those dispersions into a DGEList object, and use edgeR's significance tests. However, doing so would not necessarily be useful because, as discussed, the DSS paper shows that it would give basically the same results as using the waldTest function of DSS.
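To make the "plug in the dispersions" idea concrete, here is a rough sketch in plain Python of how a dispersion estimate enters a two-class, unpaired test as an ordinary input. Everything in it is an assumption for illustration: the function names are hypothetical, library sizes are taken to be equal, and the statistic is a simplified Wald-type ratio. It is not the waldTest function of DSS and not any of edgeR's tests.

```python
import math

def nb_variance(mu, dispersion):
    """Negative binomial variance: Poisson part plus extra-Poisson part."""
    return mu + dispersion * mu ** 2

def wald_statistic(counts_a, counts_b, dispersion):
    """Simplified two-group Wald-type statistic for a single gene.

    Assumes equal library sizes, so group means are plain averages.
    The dispersion is supplied by the caller, whatever method produced it.
    """
    mean_a = sum(counts_a) / len(counts_a)
    mean_b = sum(counts_b) / len(counts_b)
    var_a = nb_variance(mean_a, dispersion) / len(counts_a)
    var_b = nb_variance(mean_b, dispersion) / len(counts_b)
    denom = math.sqrt(var_a + var_b)
    if denom == 0.0:
        return 0.0  # both groups are all zeros: no evidence either way
    return (mean_a - mean_b) / denom

# Invented counts for one gene in two unpaired groups.
group_a = [90, 110, 100]
group_b = [45, 55, 50]
for phi in (0.01, 0.1, 0.5):
    print(phi, round(wald_statistic(group_a, group_b, phi), 3))
```

The same counts give a smaller statistic as the supplied dispersion grows, so swapping one dispersion estimator for another changes the result only through that single input. That is consistent with the observation in the thread that most of the difference between methods comes from the dispersion estimate rather than from the test itself.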
Zhijin Wu ▴ 260
@zhijin-wu-2378
Last seen 9.6 years ago
For normalization with CQN, Kasper (the CQN maintainer) has example code showing how to import the offset into edgeR (http://www.bioconductor.org/packages/release/bioc/vignettes/cqn/inst/doc/cqn.pdf). The new shrinkage estimate provided by DSS can also be used to replace the edgeR estimate, and Hao Wu (the author of the DSS package) will add example code to the vignette (soon, hopefully, when he gets back from vacation). Whether these two will be included as options of functions within edgeR can only be determined by the edgeR maintainers (Mark, Davis, Yunshun and Gordon).

Jean Wu
@gordon-smyth
Last seen 1 hour ago
WEHI, Melbourne, Australia
Dear Anna,

> Date: Mon, 26 Nov 2012 09:30:19 -0800 (PST)
> From: "aec [guest]" <guest at bioconductor.org>
> To: bioconductor at r-project.org, aesteve at pcb.ub.es
> Subject: [BioC] normalization method & dispersion estimation RNA-seq data
>
> Is it possible to adopt the CQN normalization method (Hansen, 2012) as an option of the edgeR function 'calcNormFactors' ?

No, it isn't possible, because calcNormFactors() implements scale normalization methods, and cqn is not of this type. But why do you need this anyway? The cqn package has always worked with edgeR, and the cqn package provides code examples of how to do this. What are you looking for that is not already provided?

> And the new shrinkage estimator for dispersion (Wu, 2012) that seems to be better than the currently used by edgeR ?

It is inevitable that each new paper that is published claims to have the best method. In our own (unpublished) simulations with the DSS package that goes with Wu et al (Biostatistics, 2012), we find that it is similar in performance to DESeq, but worse than BBSeq, PoissonSeq, BaySeq, voom and edgeR, the latter two being the best. Of course DSS may do better in other simulation scenarios, and it may have been improved since our simulations were done in April 2012. I don't expect you to believe this until we publish our results, but it is not my intention to change the methods used in edgeR with every new published paper.

Best wishes
Gordon
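The distinction between scale normalization and an offset-based method can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not edgeR or cqn code: the helper names `scale_offsets` and `is_scale_type` are hypothetical, all numbers are invented, and the gene- and sample-specific adjustment merely stands in for the kind of correction an offset-based method supplies.

```python
import math

def scale_offsets(lib_sizes, norm_factors, n_genes):
    """Offsets under scale normalization: log(effective library size),
    one value per sample, repeated unchanged for every gene."""
    per_sample = [math.log(size * factor)
                  for size, factor in zip(lib_sizes, norm_factors)]
    return [list(per_sample) for _ in range(n_genes)]

def is_scale_type(offsets, tol=1e-9):
    """True if between-sample differences are identical for every gene,
    i.e. the matrix could come from one scaling factor per sample."""
    reference = [value - offsets[0][0] for value in offsets[0]]
    return all(
        abs((row[j] - row[0]) - reference[j]) <= tol
        for row in offsets
        for j in range(len(row))
    )

# Two samples, three genes; all numbers are invented for illustration.
scale = scale_offsets([1_000_000, 2_000_000], [1.1, 0.9], n_genes=3)

# Hypothetical gene- and sample-specific adjustments on the log scale,
# e.g. a GC-content effect that differs between samples.
adjust = [[0.0, 0.0], [0.2, -0.1], [-0.3, 0.4]]
gene_specific = [[s + a for s, a in zip(s_row, a_row)]
                 for s_row, a_row in zip(scale, adjust)]

print(is_scale_type(scale))          # True
print(is_scale_type(gene_specific))  # False
```

A function that returns one factor per sample cannot represent the second matrix, which is why this kind of correction is passed to the model as an offset matrix rather than through a normalization-factor function.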