Necessity of moderated t-statistic and false discovery correction for a small predefined gene list?

Richard Friedman ★ 2.0k

Dear Bioconductor List,

I am using Limma to analyze differential expression between 2 conditions on an Affy chip. My experimental collaborator asks for the differential expression of 10 predefined genes.

A. Should I correct for false discoveries based upon all of the genes on the chip?
B. If not, should I correct for false discoveries just for the probe IDs for the 10 predefined genes?
C. Should I use the moderated t-statistic or just use an unmoderated t-test for those 10 genes?

Thanks and best wishes,
Rich

------------------------------------------------------------
Richard A. Friedman, PhD
Associate Research Scientist,
Biomedical Informatics Shared Resource
Herbert Irving Comprehensive Cancer Center (HICCC)
Lecturer,
Department of Biomedical Informatics (DBMI)
Educational Coordinator,
Center for Computational Biology and Bioinformatics (C2B2)/
National Center for Multiscale Analysis of Genomic Networks (MAGNet)
Room 824
Irving Cancer Research Center
Columbia University
1130 St. Nicholas Ave
New York, NY 10032
(212)851-4765 (voice)
friedman at cancercenter.columbia.edu
http://cancercenter.columbia.edu/~friedman/

"School is an evil plot to suppress my individuality"
Rose Friedman, age 15

Cancer affy limma • 1.8k views

updated 11.9 years ago by Moshe Olshansky ▴ 260 • written 11.9 years ago by Richard Friedman ★ 2.0k

Moshe Olshansky ▴ 260

Hi Rich,

Whether to use the moderated t-statistic does not depend on whether you are interested in the 10 particular genes or in all differentially expressed ones. That choice affects only your multiple testing adjustment. The simplest way to proceed is to use limma as usual and get the topTable, but then take the UNADJUSTED p-values for your 10 genes of interest and use the p.adjust function to adjust for multiple testing if you wish. In any case you should also look at the (log) fold changes.

Best regards,
Moshe.

--
Moshe Olshansky
Division of Bioinformatics
The Walter & Eliza Hall Institute of Medical Research
1G Royal Parade, Parkville, Vic 3052
e-mail: olshansky at wehi.edu.au
tel: (03) 9345 2849
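To make the suggested last step concrete, here is a minimal Python sketch of the Benjamini-Hochberg adjustment applied only to the genes of interest. It is meant to mirror what R's p.adjust does with method = "BH", but it is an illustration, not limma or R code. The ten raw p-values are invented; in practice they would be the unadjusted p-values that limma reports for the 10 predefined genes after fitting the whole chip.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Walk from the largest p-value down to the smallest.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    for steps_from_top, i in enumerate(order):
        rank = m - steps_from_top  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw (unadjusted) moderated-t p-values for 10 genes of interest.
raw_p = [0.0004, 0.003, 0.012, 0.02, 0.04, 0.11, 0.25, 0.4, 0.62, 0.9]

# The adjustment uses m = 10, the number of predefined genes,
# not the number of probes on the chip.
adjusted = bh_adjust(raw_p)

for p, q in zip(raw_p, adjusted):
    print(f"raw p = {p:<7} BH-adjusted = {q:.4f}")
```

With only 10 tests the multiplier m / rank is at most 10, so the adjusted values stay much closer to the raw ones than they would if the same genes were adjusted alongside every probe on the array.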

11.9 years ago • Moshe Olshansky ▴ 260

Moshe and List,

Thanks for your reply. The method you describe retains the raw p-value based on the moderated t-statistic and adjusts it to give an adjusted p-value (usually a false discovery rate). However, as I understand it, the moderated t-statistic given by Limma, based on all of the genes on the array, pools variance information to moderate the standard deviation, in order to prevent fortuitously low p-values stemming from fortuitously low standard deviations encountered in thousands of multiple tests. I am wondering whether, if the experimentalist asks me to look up just 10 genes, I should use the unmoderated frequentist t-statistic, which will differ from the one in Limma and may imply significance where Limma does not. I guess another way to phrase it is "How many simultaneous tests does one need before one should prefer the moderated (empirical Bayes) statistic to the unmoderated one?" Or should I fit just those 10 genes (~30 Affy probes) with Limma?

Best wishes,
Rich

On Thu, 17 May 2012, Moshe Olshansky wrote:
> Whether to use the moderated t-statistic or not does not depend on whether
> you are interested in the 10 particular genes or in all differentially
> expressed ones. [...]

--
------------------------------------------------------------
Richard A. Friedman, PhD
Associate Research Scientist
Herbert Irving Comprehensive Cancer Center
Biomedical Informatics Shared Resource
Lecturer
Department of Biomedical Informatics
Box 95, Room 130BB or P&S 1-420C
Columbia University Medical Center
630 W. 168th St.
New York, NY 10032
(212)305-6901 (5-6901) (voice)
friedman at cancercenter.columbia.edu
http://cancercenter.columbia.edu/~friedman/

"The last 250 pages of the last Harry Potter book took place in one day because alot happened in that day. All of Ulysses takes place in one day and nothing happened in that day."
-Rose Friedman, age 11

11.9 years ago • Richard Friedman ★ 2.0k

Hi Rich,

I think that Gordon Smyth (the author of limma) has explained on this list what the moderated t-statistic is. The brief explanation is that when there are few samples, the estimate of the variance used in a standard t-test is quite noisy, and because one must account for this noise the standard t-test has low statistical power. The empirical Bayes model used in the moderated t-test allows the variance to be estimated with more confidence and therefore gives better power. So it can be used even if you are interested in just a few genes. It has (almost) nothing to do with the multiple testing adjustment. Well, one may ask whether moderated p-values satisfy the assumptions of multiple testing adjustment procedures (in particular BH), but this is another story. Maybe Gordon will comment on this.

Best regards,
Moshe.
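The variance argument above can be sketched numerically. The Python snippet below is not limma's implementation; it only applies the standard empirical Bayes formula, in which the gene's sample variance is shrunk toward a prior variance, weighted by degrees of freedom. The prior variance, prior degrees of freedom, design (two groups of 3 arrays) and log fold change are all invented for illustration; in limma the prior is estimated from all probes on the array.

```python
import math


def posterior_variance(s2, df, s2_prior, df_prior):
    """Shrink a gene-wise sample variance toward the prior variance."""
    return (df_prior * s2_prior + df * s2) / (df_prior + df)


def t_statistic(effect, variance, unscaled_sd):
    """Effect size divided by its standard error."""
    return effect / (math.sqrt(variance) * unscaled_sd)


# Hypothetical design: two groups of 3 arrays each, so 4 residual df.
df_resid = 4
unscaled_sd = math.sqrt(1 / 3 + 1 / 3)

# Hypothetical prior (assumed values, not estimated from real data).
s2_prior, df_prior = 0.05, 4.0

log_fold_change = 1.0
results = {}
for label, s2 in [("small variance", 0.01), ("large variance", 0.20)]:
    s2_post = posterior_variance(s2, df_resid, s2_prior, df_prior)
    t_ordinary = t_statistic(log_fold_change, s2, unscaled_sd)
    t_moderated = t_statistic(log_fold_change, s2_post, unscaled_sd)
    results[label] = (t_ordinary, t_moderated)
    print(f"{label}: ordinary t = {t_ordinary:.2f} on {df_resid} df, "
          f"moderated t = {t_moderated:.2f} on {df_resid + df_prior:.0f} df")
```

In this sketch the gene with a fortuitously small variance gets a smaller moderated statistic and the gene with a large variance gets a larger one, while both are referred to a t-distribution with more degrees of freedom (residual plus prior). That gain in degrees of freedom applies gene by gene, which is why the argument does not depend on how many genes are ultimately reported.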

11.9 years ago • Moshe Olshansky ▴ 260

Moshe,

Thank you for the clarification on the moderated t-statistic. If I am only interested in 10 genes, is it better to calculate the moderated statistic, and hence the raw p-values, based on all of the genes on the array or just those 10 genes?

Best wishes,
Rich

11.9 years ago • Richard Friedman ★ 2.0k

Hi Richard,

It seems to me that this paper is highly relevant to the question you are trying to answer:

Independent filtering increases detection power for high-throughput experiments
http://www.pnas.org/content/107/21/9546.full

Perhaps you can see where your "filtering scheme" lands in the landscape of filters described there.

HTH,
-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

11.9 years ago • Steve Lianoglou ★ 13k

Steve,

I have reread the paper and believe that the "genes of biological interest filter" about which I am asking is qualitatively different from the numerical filters in the paper. I will follow Moshe and Kasper's advice and use the moderated t-statistic.

Thanks and best wishes,
Rich

ADD REPLY • link 11.9 years ago Richard Friedman ★ 2.0k


If the researcher really cares only about those 10 genes, you should (always) use the moderated t-statistic, and correct for multiple testing using only those 10 genes (with so few tests, this is more or less the same as not correcting).

On Thu, May 17, 2012 at 9:25 AM, Richard Friedman <friedman at cancercenter.columbia.edu> wrote:
> Moshe,
>
> Thank you for the clarification on the moderated t-statistic. If I am only interested in 10 genes, is it better to calculate the moderated statistic, and hence the raw p-values, based on all of the genes on the array or on just those 10 genes?
>
> Best wishes,
> Rich
>
> On May 17, 2012, at 12:35 AM, Moshe Olshansky wrote:
>
>> Hi Rich,
>>
>> I think that Gordon Smyth (the author of limma) has explained on this list what the moderated t-statistic is. The brief explanation is that when there are few samples, the estimate of the variance used in a standard t-test is quite noisy, and because one must account for this noise, the standard t-test has low statistical power. The empirical Bayes model used in the moderated t-test allows the variance to be estimated with more confidence, and therefore gives better power. So it can be used even if you are interested in just a few genes. It has (almost) nothing to do with the multiple testing adjustment. Well, one may ask whether moderated p-values satisfy the assumptions of multiple testing adjustment procedures (in particular BH), but this is another story. Maybe Gordon will comment on this.
>>
>> Best regards,
>> Moshe.
>>
>>> Moshe and List,
>>>
>>> Thanks for your reply. The method you describe retains the raw p-value based on the moderated t-statistic and adjusts it to give an adjusted p-value (usually a false discovery rate). However, as I understand it, the moderated t-statistic given by limma, based on all of the genes on the array, pools variance information to moderate the standard deviation, preventing fortuitously low p-values stemming from fortuitously low standard deviations encountered in thousands of multiple tests. I am wondering whether, if the experimentalist asks me to look up just 10 genes, I should use the unmoderated frequentist t-statistic, which will differ from the one in limma and may imply significance where limma does not. I guess another way to phrase it is "How many simultaneous tests does one need before one should prefer the moderated (empirical Bayes) statistic to the unmoderated one?" Or should I fit just those 10 genes (~30 Affy probes) with limma?
>>>
>>> Best wishes,
>>> Rich
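
The approach described in this answer can be sketched in R roughly as follows. This is only an illustration on simulated data, not anyone's actual analysis: the expression matrix `expr`, the probe names, the group labels and the list `genes_of_interest` are all made up, and the column name `adj.P.Val.10` is a hypothetical label for the 10-gene adjustment. The model is fitted to every probe so that the empirical Bayes step can borrow variance information from the whole array, and only afterwards are the raw p-values of the predefined genes pulled out and adjusted on their own.

```r
library(limma)

set.seed(1)

# Simulated stand-in for a normalised log2 expression matrix
# (1000 probes x 6 arrays); in practice this would come from the Affy data.
n_genes <- 1000
group <- factor(rep(c("A", "B"), each = 3))
expr <- matrix(rnorm(n_genes * 6), nrow = n_genes,
               dimnames = list(paste0("probe", seq_len(n_genes)), NULL))

# Fit the linear model and moderate the variances using ALL probes.
design <- model.matrix(~ group)
fit <- eBayes(lmFit(expr, design))

# Full table in the original row order; adj.P.Val here is the chip-wide BH value.
tab <- topTable(fit, coef = 2, number = Inf, sort.by = "none")

# Hypothetical predefined genes: take their moderated statistics and raw p-values.
genes_of_interest <- paste0("probe", 1:10)
sub <- tab[genes_of_interest, c("logFC", "t", "P.Value", "adj.P.Val")]

# Adjust for multiple testing over these 10 tests only.
sub$adj.P.Val.10 <- p.adjust(sub$P.Value, method = "BH")
print(sub)
```

Comparing the `adj.P.Val` and `adj.P.Val.10` columns shows how much the choice between a chip-wide and a 10-gene correction matters for a given data set, while `logFC` stays available for inspection alongside the p-values.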

ADD REPLY • link 11.9 years ago Kasper Daniel Hansen ★ 6.5k


Hi Rich,

You have already got the answer from Kasper. This is exactly what I am suggesting. The idea is that, after log transformation, the variances of the genes follow some distribution. So the more genes you use, the better you can estimate that distribution, which is why the moderated statistic should be computed from all of the genes on the array rather than from the 10 alone. This is just a model; nobody is claiming that this is what really happens. But it seems to work pretty well in "real life".

Moshe.
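
To illustrate the point about estimating the variance distribution from many genes, here is a small sketch using limma's `squeezeVar` function on simulated gene-wise variances. Everything about the simulation is an assumption chosen for illustration (5000 genes, 4 residual degrees of freedom, a scaled inverse chi-square prior with 10 degrees of freedom and scale 0.05); it is not taken from the data discussed in this thread.

```r
library(limma)

set.seed(2)

n_genes  <- 5000
df_resid <- 4      # residual degrees of freedom per gene (assumed)
d0_true  <- 10     # assumed prior degrees of freedom
s02_true <- 0.05   # assumed prior variance

# True gene-wise variances drawn from a scaled inverse chi-square prior,
# then noisy sample variances on df_resid degrees of freedom.
true_var <- s02_true * d0_true / rchisq(n_genes, df = d0_true)
s2 <- true_var * rchisq(n_genes, df = df_resid) / df_resid

# Estimate the prior and shrink the variances using all genes...
all_genes <- squeezeVar(s2, df = df_resid)

# ...and using only the first 10 genes.
ten_genes <- squeezeVar(s2[1:10], df = df_resid)

# Compare the estimated prior parameters with the simulated truth.
print(c(df.prior.all = all_genes$df.prior, df.prior.ten = ten_genes$df.prior))
print(c(var.prior.all = all_genes$var.prior, var.prior.ten = ten_genes$var.prior))
```

With thousands of genes the estimated prior is typically much closer to the simulated one than the estimate from 10 genes, which can vary widely from one random seed to the next.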

ADD REPLY • link 11.9 years ago Moshe Olshansky ▴ 260


Dear Rich & Moshe,

I am not sure whether continuing this discussion is suitable for this mailing list, but I will comment. Although I do not know what Rich means by the term "differential expression", I am sure that 1) t-statistics and 2) log(fold change) are not enough. Try t.test(c(1,2,3),c(4,5,6)) and t.test(c(10,20,30),c(40,50,60)). They give you the same t-statistic and the same fold change. However, the two can have different biological meanings if both c(1,2,3,4,5,6) and c(10,20,30,40,50,60) have already been normalized to some control signal.

Yours,
tag.
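
The comparison suggested above can be run directly in base R. The sketch below uses the same two toy data sets; the variable names are introduced here for illustration, and "fold change" is taken to mean the ratio of group means on the scale the numbers are given in, which is an assumption about what was intended.

```r
# The two toy data sets: the second is the first multiplied by 10.
x1 <- c(1, 2, 3);    y1 <- c(4, 5, 6)
x2 <- c(10, 20, 30); y2 <- c(40, 50, 60)

# Welch t-statistics (t.test default); rescaling both groups leaves them unchanged.
t1 <- unname(t.test(x1, y1)$statistic)
t2 <- unname(t.test(x2, y2)$statistic)

# Fold change as a ratio of group means: also unchanged by rescaling.
fc1 <- mean(y1) / mean(x1)
fc2 <- mean(y2) / mean(x2)

# The absolute difference in means is what distinguishes the two cases.
diff1 <- mean(y1) - mean(x1)
diff2 <- mean(y2) - mean(x2)

print(c(t1 = t1, t2 = t2))
print(c(fc1 = fc1, fc2 = fc2))
print(c(diff1 = diff1, diff2 = diff2))
```

Note that if the values were already on a log scale, as limma expects, the log fold change would be the difference in means, and the two data sets would then no longer share it.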

ADD REPLY • link 11.9 years ago Y-h. Taguchi ▴ 10
