Question: Testing for no change in RNA-seq data?
0
gravatar for Ryan C. Thompson
6.2 years ago by
The Scripps Research Institute, La Jolla, CA
Ryan C. Thompson7.3k wrote:
Hi all, So, we have several great packages for testing for evidence of differential expression using RNA-seq counts, and I am happily using those. However, I would like to know if there is any method for testing for evidence of equivalent expression. That is, I would like to show with some level of confidence that treatment X has no (or negligible) effect on gene Y. Is there any way to conduct such a test? I know I can test for differential expression and see if the gene list comes up empty, but lack of evidence for differential expression is not the same as evidence for equivalent expression. So, are there any packages for doing equivalence testing on gene expression data? If it exists for limma, then I'm happy to run my data through voom to analyze it if necessary. Thanks, -Ryan Thompson
limma • 964 views
ADD COMMENTlink modified 6.2 years ago by Simon Anders3.5k • written 6.2 years ago by Ryan C. Thompson7.3k
Answer: Testing for no change in RNA-seq data?
0
gravatar for Albyn Jones
6.2 years ago by
Albyn Jones70
Albyn Jones70 wrote:
see "tests of equivalence", a common one being the TOST, aka two one-sided tests. you need to have a substantive definition of equivalence! albyn On 2013-03-20 17:05, Ryan C. Thompson wrote: > Hi all, > > So, we have several great packages for testing for evidence of > differential expression using RNA-seq counts, and I am happily using > those. However, I would like to know if there is any method for > testing for evidence of equivalent expression. That is, I would like > to show with some level of confidence that treatment X has no (or > negligible) effect on gene Y. Is there any way to conduct such a > test? > > I know I can test for differential expression and see if the gene > list comes up empty, but lack of evidence for differential expression > is not the same as evidence for equivalent expression. So, are there > any packages for doing equivalence testing on gene expression data? > If > it exists for limma, then I'm happy to run my data through voom to > analyze it if necessary. > > Thanks, > > -Ryan Thompson > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD COMMENTlink written 6.2 years ago by Albyn Jones70
Are there equivalence tests that do not require the assumption of normality? Or should I transform with limma::voom and pretend it's normal? -Ryan On 03/20/2013 05:15 PM, Albyn Jones wrote: > see "tests of equivalence", a common one being the TOST, > aka two one-sided tests. you need to have a substantive > definition of equivalence! > > albyn > > On 2013-03-20 17:05, Ryan C. Thompson wrote: >> Hi all, >> >> So, we have several great packages for testing for evidence of >> differential expression using RNA-seq counts, and I am happily using >> those. However, I would like to know if there is any method for >> testing for evidence of equivalent expression. That is, I would like >> to show with some level of confidence that treatment X has no (or >> negligible) effect on gene Y. Is there any way to conduct such a test? >> >> I know I can test for differential expression and see if the gene >> list comes up empty, but lack of evidence for differential expression >> is not the same as evidence for equivalent expression. So, are there >> any packages for doing equivalence testing on gene expression data? If >> it exists for limma, then I'm happy to run my data through voom to >> analyze it if necessary. >> >> Thanks, >> >> -Ryan Thompson >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD REPLYlink written 6.2 years ago by Ryan C. Thompson7.3k
Answer: Testing for no change in RNA-seq data?
0
gravatar for Simon Anders
6.2 years ago by
Simon Anders3.5k
Zentrum für Molekularbiologie, Universität Heidelberg
Simon Anders3.5k wrote:
Hi Ryan On 21/03/13 01:05, Ryan C. Thompson wrote: > So, we have several great packages for testing for evidence of > differential expression using RNA-seq counts, and I am happily using > those. However, I would like to know if there is any method for testing > for evidence of equivalent expression. That is, I would like to show > with some level of confidence that treatment X has no (or negligible) > effect on gene Y. Is there any way to conduct such a test? One of the new features of DESeq2 is that it provides a empirical- Bayes style shrinkage estimates of coefficients (i.e., log fold changes) and also estimates standard errors for these coefficients (taken from the the reciprocal curvature of the posterior). Your task is one of the application we had in mind for this. You want to know whether the fold change is zero or nearly zero, and you also want to be ascertained that this estimate of a nearly-zero fold change is a precise one. So, to find genes whose absolute log fold change is _reliably_ smaller than some threshold, take all those genes for which the estimated abs log fold change is below the threshold and the standard error is well below the threshold, too. The reason, why I write "below a threshold" rather than "equal to zero" is that it's biologically unreasonable to assume that a treatment has exactly zero effect on a gene's expression. Gene regulation is such an interconnected network that, at least in my view, every gene will react ever so slightly to every perturbance, but often, this reaction is too small to be measurable. This is why you should define a threshold for "biologically unlikely to be relevant". Let's say, we don't believe that an expression change of less than 15% (0.2 on a log2 scale) is worth bothering. So, if you might take all genes with abs log2 fold change below 0.2, and furthermore require that the standard error of this log2 fold change estimate is below, say, 0.05, than you will get gene with true values well below at least 0.3 or so. In the end, the thresholds won't matter too much but if you want real p values, this is possible, too: Simply divide the log2 fold change estimates by their standard errors to get z scores (this works because the sampling distribution of our fold change estimates is reasonably close to normal), and do two one-sided tests (TOST) as suggested by Ryan above. Actually, as this may come up more often, we should maybe explain the details of using this approach in the vignette. Ask again if you want to see some code. Simon
ADD COMMENTlink written 6.2 years ago by Simon Anders3.5k
I am looking at my fit results from GLM of edgeR for my data: and assume that $coefficients are related to the expression levels. Just wondering how I can get the p values for those coefficients since I remember in limma, each coefficient will be accompanied by the corresponding t value and p value. Thanks a lot for the help. capricy =================== > design=model.matrix(~0+regroup) > y <- estimateGLMCommonDisp(y,design) > y <- estimateGLMTagwiseDisp(y,design) > fit <- glmFit(y,design) > fit An object of class "DGEGLM" $coefficients           regroupFR  regroupFA  regroupFM  regroupFP regroupFW regroupMA GS_14929  -9.543565  -9.267612  -9.310823  -9.167299 -10.62094 -9.260763 GS_09776 -10.304430 -10.985146 -10.644154 -10.640469 -10.59615 -10.798889 GS_18434 -11.327664 -11.786421 -11.643292 -11.902728 -12.50470 -11.654008 GS_08334 -10.789181 -10.271089 -10.511480 -10.375642 -10.39865 -10.836647 GS_09550 -10.564167 -10.152571 -10.410428 -10.098825 -10.36302 -9.722892           regroupMM  regroupMP  regroupMW GS_14929  -8.516256  -8.222225  -9.495394 GS_09776 -10.337434 -10.284703 -10.588544 GS_18434 -11.755905 -11.662276 -11.583950 GS_08334 -10.797529 -10.643409 -10.783104 GS_09550 -10.650495 -10.766462  -9.880359 15457 more rows ... $fitted.values                 MP        FA       FR        FW        MA      FW.1 MM GS_14929 596.53153 205.75696 84.61834 31.844025 169.27178  82.62258 513.34590 GS_09776  75.81638  36.91137 39.53083 32.643715  36.33823  84.69746 83.04882 GS_18434  19.09812  16.54813 14.19827  4.825192  15.43876  12.51946 20.07922 GS_08334  52.95485  75.41272 24.33895 39.775089  34.99088 103.20054 52.41066 GS_09550  46.82022  84.90529 30.48442 41.218519 106.62251 106.94567 60.71721                 FP      MA.1        MW        FM       MM.1      MW.1 MP.1 GS_14929 78.226172 123.52247 193.95899 122.61523 173.571103 194.05252 568.23926 GS_09776 17.921685  26.51705  64.98434  32.30797  28.080239  65.01567 72.22056 GS_18434  5.064548  11.26610  23.99487  11.88420   6.789129  24.00644 18.19234 GS_08334 23.358730  25.53385  53.48880  36.89427  17.720948  53.51460 50.44331 GS_09550 30.811447  77.80550 131.97315  40.81919  20.529537 132.03679 44.59963               FM.1      FP.1      FA.1      FR.1 GS_14929 259.44483 324.86875 256.36216 198.47462 GS_09776  68.36129  74.42772  45.98959  92.72063 GS_18434  25.14609  21.03277  20.61808  33.30242 GS_08334  78.06556  97.00745  93.96021  57.08766 GS_09550  86.37041 127.95815 105.78744  71.50204 15457 more rows ... $counts           MP  FA  FR FW  MA FW.1  MM  FP MA.1  MW  FM MM.1 MW.1 MP.1 FM.1 FP.1 GS_14929 221 284 170 23 267  105 209 106   52 218 158  277  170  926 185  211 GS_09776  75  32  17 28  41   96  86  15   23  65  23   27   65   73 87   85 GS_18434  36  22  19  8  21    6  22   7    7  27  13    6   21    2 23   15 GS_08334  44  77  23 50  41   78  78  14   21  61  29    8   46   59 94  132 GS_09550  82  92  45 54 105   75  95  18   79 153  41    8  111   11 86  178          FA.1 FR.1 GS_14929  159    0 GS_09776   52  143 GS_18434   14   23 GS_08334   92   60 GS_09550   97   39 15457 more rows ... $deviance  GS_14929  GS_09776  GS_18434  GS_08334  GS_09550 19.059229  3.629342  9.509356  4.268291  8.809585 15457 more elements ... $df.residual [1] 9 9 9 9 9 15457 more elements ... $abundance [1]  -9.081235 -10.552328 -11.717230 -10.579804 -10.233947 15457 more elements ... $design   regroupFR regroupFA regroupFM regroupFP regroupFW regroupMA regroupMM 1         0         0         0         0         0         0 0 2         0         1         0         0         0         0 0 3         1         0         0         0         0         0 0 4         0         0         0         0         1         0 0 5         0         0         0         0         0         1 0   regroupMP regroupMW 1         1         0 2         0         0 3         0         0 4         0         0 5         0         0 13 more rows ... $offset          [,1]     [,2]    [,3]     [,4]     [,5]     [,6]     [,7] [,8] [1,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 13.52703 [2,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 13.52703 [3,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 13.52703 [4,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 13.52703 [5,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 13.52703          [,9]    [,10]    [,11]    [,12]   [,13]    [,14]    [,15] [,16] [1,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 14.95085 [2,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 14.95085 [3,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 14.95085 [4,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 14.95085 [5,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 14.95085         [,17]    [,18] [1,] 14.81434 14.83441 [2,] 14.81434 14.83441 [3,] 14.81434 14.83441 [4,] 14.81434 14.83441 [5,] 14.81434 14.83441 15457 more rows ... $dispersion [1] 0.7464955 0.2734189 0.4034924 0.2832934 0.3819788 15457 more elements ... $method [1] "oneway" $samples    group lib.size norm.factors MP    MP  2220863            1 FA    FA  2179157            1 FR    FR  1181036            1 FW    FW  1305802            1 MA    MA  1780507            1 13 more rows ... [[alternative HTML version deleted]]
ADD REPLYlink written 6.2 years ago by capricy gao230
Hi Capricy, Just like limma, edgeR has a very comprehensive user's guide that you can consult. library(edgeR) edgeRUsersGuide() Best, Jim On 3/21/2013 2:41 PM, capricy gao wrote: > I am looking at my fit results from GLM of edgeR for my data: > > and assume that $coefficients are related to the expression levels. Just wondering how I can get the p values for those coefficients since I remember in limma, each coefficient will be accompanied by the corresponding t value and p value. > > > Thanks a lot for the help. > > capricy > > =================== > >> design=model.matrix(~0+regroup) >> y<- estimateGLMCommonDisp(y,design) >> y<- estimateGLMTagwiseDisp(y,design) >> fit<- glmFit(y,design) >> fit > An object of class "DGEGLM" > $coefficients > regroupFR regroupFA regroupFM regroupFP regroupFW regroupMA > GS_14929 -9.543565 -9.267612 -9.310823 -9.167299 -10.62094 -9.260763 > GS_09776 -10.304430 -10.985146 -10.644154 -10.640469 -10.59615 -10.798889 > GS_18434 -11.327664 -11.786421 -11.643292 -11.902728 -12.50470 -11.654008 > GS_08334 -10.789181 -10.271089 -10.511480 -10.375642 -10.39865 -10.836647 > GS_09550 -10.564167 -10.152571 -10.410428 -10.098825 -10.36302 -9.722892 > regroupMM regroupMP regroupMW > GS_14929 -8.516256 -8.222225 -9.495394 > GS_09776 -10.337434 -10.284703 -10.588544 > GS_18434 -11.755905 -11.662276 -11.583950 > GS_08334 -10.797529 -10.643409 -10.783104 > GS_09550 -10.650495 -10.766462 -9.880359 > 15457 more rows ... > > $fitted.values > MP FA FR FW MA FW.1 MM > GS_14929 596.53153 205.75696 84.61834 31.844025 169.27178 82.62258 513.34590 > GS_09776 75.81638 36.91137 39.53083 32.643715 36.33823 84.69746 83.04882 > GS_18434 19.09812 16.54813 14.19827 4.825192 15.43876 12.51946 20.07922 > GS_08334 52.95485 75.41272 24.33895 39.775089 34.99088 103.20054 52.41066 > GS_09550 46.82022 84.90529 30.48442 41.218519 106.62251 106.94567 60.71721 > FP MA.1 MW FM MM.1 MW.1 MP.1 > GS_14929 78.226172 123.52247 193.95899 122.61523 173.571103 194.05252 568.23926 > GS_09776 17.921685 26.51705 64.98434 32.30797 28.080239 65.01567 72.22056 > GS_18434 5.064548 11.26610 23.99487 11.88420 6.789129 24.00644 18.19234 > GS_08334 23.358730 25.53385 53.48880 36.89427 17.720948 53.51460 50.44331 > GS_09550 30.811447 77.80550 131.97315 40.81919 20.529537 132.03679 44.59963 > FM.1 FP.1 FA.1 FR.1 > GS_14929 259.44483 324.86875 256.36216 198.47462 > GS_09776 68.36129 74.42772 45.98959 92.72063 > GS_18434 25.14609 21.03277 20.61808 33.30242 > GS_08334 78.06556 97.00745 93.96021 57.08766 > GS_09550 86.37041 127.95815 105.78744 71.50204 > 15457 more rows ... > > $counts > MP FA FR FW MA FW.1 MM FP MA.1 MW FM MM.1 MW.1 MP.1 FM.1 FP.1 > GS_14929 221 284 170 23 267 105 209 106 52 218 158 277 170 926 185 211 > GS_09776 75 32 17 28 41 96 86 15 23 65 23 27 65 73 87 85 > GS_18434 36 22 19 8 21 6 22 7 7 27 13 6 21 2 23 15 > GS_08334 44 77 23 50 41 78 78 14 21 61 29 8 46 59 94 132 > GS_09550 82 92 45 54 105 75 95 18 79 153 41 8 111 11 86 178 > FA.1 FR.1 > GS_14929 159 0 > GS_09776 52 143 > GS_18434 14 23 > GS_08334 92 60 > GS_09550 97 39 > 15457 more rows ... > > $deviance > GS_14929 GS_09776 GS_18434 GS_08334 GS_09550 > 19.059229 3.629342 9.509356 4.268291 8.809585 > 15457 more elements ... > > $df.residual > [1] 9 9 9 9 9 > 15457 more elements ... > > $abundance > [1] -9.081235 -10.552328 -11.717230 -10.579804 -10.233947 > 15457 more elements ... > > $design > regroupFR regroupFA regroupFM regroupFP regroupFW regroupMA regroupMM > 1 0 0 0 0 0 0 0 > 2 0 1 0 0 0 0 0 > 3 1 0 0 0 0 0 0 > 4 0 0 0 0 1 0 0 > 5 0 0 0 0 0 1 0 > regroupMP regroupMW > 1 1 0 > 2 0 0 > 3 0 0 > 4 0 0 > 5 0 0 > 13 more rows ... > > $offset > [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] > [1,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 13.52703 > [2,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 13.52703 > [3,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 13.52703 > [4,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 13.52703 > [5,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 13.52703 > [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] > [1,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 14.95085 > [2,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 14.95085 > [3,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 14.95085 > [4,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 14.95085 > [5,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 14.95085 > [,17] [,18] > [1,] 14.81434 14.83441 > [2,] 14.81434 14.83441 > [3,] 14.81434 14.83441 > [4,] 14.81434 14.83441 > [5,] 14.81434 14.83441 > 15457 more rows ... > > $dispersion > [1] 0.7464955 0.2734189 0.4034924 0.2832934 0.3819788 > 15457 more elements ... > > $method > [1] "oneway" > > $samples > group lib.size norm.factors > MP MP 2220863 1 > FA FA 2179157 1 > FR FR 1181036 1 > FW FW 1305802 1 > MA MA 1780507 1 > 13 more rows ... > [[alternative HTML version deleted]] > > > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099
ADD REPLYlink written 6.2 years ago by James W. MacDonald50k
I think I have carefully read both the userguide and reference manual, but couldn't figure it out. Could anybody help me out what arguments should be included? Thanks a lot. In limma fit results: there is something like this: =============== $p.value             Grp1      Grp2vs1 Gene 1 0.8604469 6.019156e-05 Gene 2 0.2174605 1.673262e-05 Gene 3 0.3571957 5.758422e-02 Gene 4 0.6789641 1.758690e-01 Gene 5 0.4589329 8.223679e-01 95 more rows ... ============== But in edgeR I couldn't find it as I posted previously.... ________________________________ From: James W. MacDonald <jmacdon@uw.edu> Cc: "bioconductor@r-project.org" <bioconductor@r-project.org> Sent: Thursday, March 21, 2013 1:45 PM Subject: Re: [BioC] EdgeR: coefficients for GLM for multiple groups Hi Capricy, Just like limma, edgeR has a very comprehensive user's guide that you can consult. library(edgeR) edgeRUsersGuide() Best, Jim On 3/21/2013 2:41 PM, capricy gao wrote: > I am looking at my fit results from GLM of edgeR for my data: > > and assume that $coefficients are related to the expression levels. Just wondering how I can get the p values for those coefficients since I remember in limma, each coefficient will be accompanied by the corresponding t value and p value. > > > Thanks a lot for the help. > > capricy > > =================== > >> design=model.matrix(~0+regroup) >> y<- estimateGLMCommonDisp(y,design) >> y<- estimateGLMTagwiseDisp(y,design) >> fit<- glmFit(y,design) >> fit > An object of class "DGEGLM" > $coefficients >            regroupFR  regroupFA  regroupFM  regroupFP regroupFW regroupMA > GS_14929  -9.543565  -9.267612  -9.310823  -9.167299 -10.62094 -9.260763 > GS_09776 -10.304430 -10.985146 -10.644154 -10.640469 -10.59615 -10.798889 > GS_18434 -11.327664 -11.786421 -11.643292 -11.902728 -12.50470 -11.654008 > GS_08334 -10.789181 -10.271089 -10.511480 -10.375642 -10.39865 -10.836647 > GS_09550 -10.564167 -10.152571 -10.410428 -10.098825 -10.36302 -9.722892 >            regroupMM  regroupMP  regroupMW > GS_14929  -8.516256  -8.222225  -9.495394 > GS_09776 -10.337434 -10.284703 -10.588544 > GS_18434 -11.755905 -11.662276 -11.583950 > GS_08334 -10.797529 -10.643409 -10.783104 > GS_09550 -10.650495 -10.766462  -9.880359 > 15457 more rows ... > > $fitted.values >                  MP        FA      FR        FW        MA      FW.1 MM > GS_14929 596.53153 205.75696 84.61834 31.844025 169.27178  82.62258 513.34590 > GS_09776  75.81638  36.91137 39.53083 32.643715  36.33823  84.69746 83.04882 > GS_18434  19.09812  16.54813 14.19827  4.825192  15.43876  12.51946 20.07922 > GS_08334  52.95485  75.41272 24.33895 39.775089  34.99088 103.20054 52.41066 > GS_09550  46.82022  84.90529 30.48442 41.218519 106.62251 106.94567 60.71721 >                  FP      MA.1        MW        FM      MM.1 MW.1      MP.1 > GS_14929 78.226172 123.52247 193.95899 122.61523 173.571103 194.05252 568.23926 > GS_09776 17.921685  26.51705  64.98434  32.30797  28.080239 65.01567  72.22056 > GS_18434  5.064548  11.26610  23.99487  11.88420  6.789129 24.00644  18.19234 > GS_08334 23.358730  25.53385  53.48880  36.89427  17.720948 53.51460  50.44331 > GS_09550 30.811447  77.80550 131.97315  40.81919  20.529537 132.03679  44.59963 >                FM.1      FP.1      FA.1      FR.1 > GS_14929 259.44483 324.86875 256.36216 198.47462 > GS_09776  68.36129  74.42772  45.98959  92.72063 > GS_18434  25.14609  21.03277  20.61808  33.30242 > GS_08334  78.06556  97.00745  93.96021  57.08766 > GS_09550  86.37041 127.95815 105.78744  71.50204 > 15457 more rows ... > > $counts >            MP  FA  FR FW  MA FW.1  MM  FP MA.1  MW  FM MM.1 MW.1 MP.1 FM.1 FP.1 > GS_14929 221 284 170 23 267  105 209 106  52 218 158  277  170  926 185  211 > GS_09776  75  32  17 28  41  96  86  15  23  65  23  27  65  73 87  85 > GS_18434  36  22  19  8  21    6  22  7    7  27  13    6  21    2 23  15 > GS_08334  44  77  23 50  41  78  78  14  21  61  29    8  46  59 94  132 > GS_09550  82  92  45 54 105  75  95  18  79 153  41    8  111  11 86  178 >          FA.1 FR.1 > GS_14929  159    0 > GS_09776  52  143 > GS_18434  14  23 > GS_08334  92  60 > GS_09550  97  39 > 15457 more rows ... > > $deviance >  GS_14929  GS_09776  GS_18434  GS_08334  GS_09550 > 19.059229  3.629342  9.509356  4.268291  8.809585 > 15457 more elements ... > > $df.residual > [1] 9 9 9 9 9 > 15457 more elements ... > > $abundance > [1]  -9.081235 -10.552328 -11.717230 -10.579804 -10.233947 > 15457 more elements ... > > $design >    regroupFR regroupFA regroupFM regroupFP regroupFW regroupMA regroupMM > 1        0        0        0        0        0        0 0 > 2        0        1        0        0        0        0 0 > 3        1        0        0        0        0        0 0 > 4        0        0        0        0        1        0 0 > 5        0        0        0        0        0        1 0 >    regroupMP regroupMW > 1        1        0 > 2        0        0 > 3        0        0 > 4        0        0 > 5        0        0 > 13 more rows ... > > $offset >          [,1]    [,2]    [,3]    [,4]    [,5]    [,6]    [,7] [,8] > [1,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 13.52703 > [2,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 13.52703 > [3,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 13.52703 > [4,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 13.52703 > [5,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 13.52703 >          [,9]    [,10]    [,11]    [,12]  [,13]    [,14]    [,15] [,16] > [1,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 14.95085 > [2,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 14.95085 > [3,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 14.95085 > [4,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 14.95085 > [5,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 14.95085 >          [,17]    [,18] > [1,] 14.81434 14.83441 > [2,] 14.81434 14.83441 > [3,] 14.81434 14.83441 > [4,] 14.81434 14.83441 > [5,] 14.81434 14.83441 > 15457 more rows ... > > $dispersion > [1] 0.7464955 0.2734189 0.4034924 0.2832934 0.3819788 > 15457 more elements ... > > $method > [1] "oneway" > > $samples >    group lib.size norm.factors > MP    MP  2220863            1 > FA    FA  2179157            1 > FR    FR  1181036            1 > FW    FW  1305802            1 > MA    MA  1780507            1 > 13 more rows ... >     [[alternative HTML version deleted]] > > > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099 [[alternative HTML version deleted]]
ADD REPLYlink written 6.2 years ago by capricy gao230
Hi Capricy, I'm not convinced you have read the user's guide carefully. There is, for example, a section called 'Quick start' that is only seven pages in, that shows the canonical glm analysis steps. Then there are two more worked examples starting on page 52 that show pretty much all the steps required to do a glm analysis. However, in the code you show below you haven't followed the examples, nor have you used the accessor function that is intended to produce useful output. Best, Jim On 3/21/2013 3:57 PM, capricy gao wrote: > I think I have carefully read both the userguide and reference manual, > but couldn't figure it out. Could anybody help me out what arguments > should be included? Thanks a lot. > > In limma fit results: there is something like this: > =============== > $p.value > Grp1 Grp2vs1 > Gene 1 0.8604469 6.019156e-05 > Gene 2 0.2174605 1.673262e-05 > Gene 3 0.3571957 5.758422e-02 > Gene 4 0.6789641 1.758690e-01 > Gene 5 0.4589329 8.223679e-01 > 95 more rows ... > ============== > > But in edgeR I couldn't find it as I posted previously.... > > > -------------------------------------------------------------------- ---- > *From:* James W. MacDonald <jmacdon at="" uw.edu=""> > *To:* capricy gao <capricyg at="" yahoo.com=""> > *Cc:* "bioconductor at r-project.org" <bioconductor at="" r-project.org=""> > *Sent:* Thursday, March 21, 2013 1:45 PM > *Subject:* Re: [BioC] EdgeR: coefficients for GLM for multiple groups > > Hi Capricy, > > Just like limma, edgeR has a very comprehensive user's guide that you > can consult. > > library(edgeR) > edgeRUsersGuide() > > Best, > > Jim > > > > On 3/21/2013 2:41 PM, capricy gao wrote: > > I am looking at my fit results from GLM of edgeR for my data: > > > > and assume that $coefficients are related to the expression levels. > Just wondering how I can get the p values for those coefficients since > I remember in limma, each coefficient will be accompanied by the > corresponding t value and p value. > > > > > > Thanks a lot for the help. > > > > capricy > > > > =================== > > > >> design=model.matrix(~0+regroup) > >> y<- estimateGLMCommonDisp(y,design) > >> y<- estimateGLMTagwiseDisp(y,design) > >> fit<- glmFit(y,design) > >> fit > > An object of class "DGEGLM" > > $coefficients > > regroupFR regroupFA regroupFM regroupFP regroupFW > regroupMA > > GS_14929 -9.543565 -9.267612 -9.310823 -9.167299 -10.62094 > -9.260763 > > GS_09776 -10.304430 -10.985146 -10.644154 -10.640469 -10.59615 > -10.798889 > > GS_18434 -11.327664 -11.786421 -11.643292 -11.902728 -12.50470 > -11.654008 > > GS_08334 -10.789181 -10.271089 -10.511480 -10.375642 -10.39865 > -10.836647 > > GS_09550 -10.564167 -10.152571 -10.410428 -10.098825 -10.36302 > -9.722892 > > regroupMM regroupMP regroupMW > > GS_14929 -8.516256 -8.222225 -9.495394 > > GS_09776 -10.337434 -10.284703 -10.588544 > > GS_18434 -11.755905 -11.662276 -11.583950 > > GS_08334 -10.797529 -10.643409 -10.783104 > > GS_09550 -10.650495 -10.766462 -9.880359 > > 15457 more rows ... > > > > $fitted.values > > MP FA FR FW MA FW.1 > MM > > GS_14929 596.53153 205.75696 84.61834 31.844025 169.27178 82.62258 > 513.34590 > > GS_09776 75.81638 36.91137 39.53083 32.643715 36.33823 84.69746 > 83.04882 > > GS_18434 19.09812 16.54813 14.19827 4.825192 15.43876 12.51946 > 20.07922 > > GS_08334 52.95485 75.41272 24.33895 39.775089 34.99088 103.20054 > 52.41066 > > GS_09550 46.82022 84.90529 30.48442 41.218519 106.62251 106.94567 > 60.71721 > > FP MA.1 MW FM MM.1 > MW.1 MP.1 > > GS_14929 78.226172 123.52247 193.95899 122.61523 173.571103 > 194.05252 568.23926 > > GS_09776 17.921685 26.51705 64.98434 32.30797 28.080239 > 65.01567 72.22056 > > GS_18434 5.064548 11.26610 23.99487 11.88420 6.789129 > 24.00644 18.19234 > > GS_08334 23.358730 25.53385 53.48880 36.89427 17.720948 > 53.51460 50.44331 > > GS_09550 30.811447 77.80550 131.97315 40.81919 20.529537 > 132.03679 44.59963 > > FM.1 FP.1 FA.1 FR.1 > > GS_14929 259.44483 324.86875 256.36216 198.47462 > > GS_09776 68.36129 74.42772 45.98959 92.72063 > > GS_18434 25.14609 21.03277 20.61808 33.30242 > > GS_08334 78.06556 97.00745 93.96021 57.08766 > > GS_09550 86.37041 127.95815 105.78744 71.50204 > > 15457 more rows ... > > > > $counts > > MP FA FR FW MA FW.1 MM FP MA.1 MW FM MM.1 MW.1 > MP.1 FM.1 FP.1 > > GS_14929 221 284 170 23 267 105 209 106 52 218 158 277 170 926 > 185 211 > > GS_09776 75 32 17 28 41 96 86 15 23 65 23 27 65 73 87 85 > > GS_18434 36 22 19 8 21 6 22 7 7 27 13 6 21 2 > 23 15 > > GS_08334 44 77 23 50 41 78 78 14 21 61 29 8 46 59 > 94 132 > > GS_09550 82 92 45 54 105 75 95 18 79 153 41 8 111 11 > 86 178 > > FA.1 FR.1 > > GS_14929 159 0 > > GS_09776 52 143 > > GS_18434 14 23 > > GS_08334 92 60 > > GS_09550 97 39 > > 15457 more rows ... > > > > $deviance > > GS_14929 GS_09776 GS_18434 GS_08334 GS_09550 > > 19.059229 3.629342 9.509356 4.268291 8.809585 > > 15457 more elements ... > > > > $df.residual > > [1] 9 9 9 9 9 > > 15457 more elements ... > > > > $abundance > > [1] -9.081235 -10.552328 -11.717230 -10.579804 -10.233947 > > 15457 more elements ... > > > > $design > > regroupFR regroupFA regroupFM regroupFP regroupFW regroupMA regroupMM > > 1 0 0 0 0 0 0 0 > > 2 0 1 0 0 0 0 0 > > 3 1 0 0 0 0 0 0 > > 4 0 0 0 0 1 0 0 > > 5 0 0 0 0 0 1 0 > > regroupMP regroupMW > > 1 1 0 > > 2 0 0 > > 3 0 0 > > 4 0 0 > > 5 0 0 > > 13 more rows ... > > > > $offset > > [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] > > [1,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 > 13.52703 > > [2,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 > 13.52703 > > [3,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 > 13.52703 > > [4,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 > 13.52703 > > [5,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 > 13.52703 > > [,9] [,10] [,11] [,12] [,13] [,14] [,15] > [,16] > > [1,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 > 14.95085 > > [2,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 > 14.95085 > > [3,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 > 14.95085 > > [4,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 > 14.95085 > > [5,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 > 14.95085 > > [,17] [,18] > > [1,] 14.81434 14.83441 > > [2,] 14.81434 14.83441 > > [3,] 14.81434 14.83441 > > [4,] 14.81434 14.83441 > > [5,] 14.81434 14.83441 > > 15457 more rows ... > > > > $dispersion > > [1] 0.7464955 0.2734189 0.4034924 0.2832934 0.3819788 > > 15457 more elements ... > > > > $method > > [1] "oneway" > > > > $samples > > group lib.size norm.factors > > MP MP 2220863 1 > > FA FA 2179157 1 > > FR FR 1181036 1 > > FW FW 1305802 1 > > MA MA 1780507 1 > > 13 more rows ... > > [[alternative HTML version deleted]] > > > > > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor at r-project.org <mailto:bioconductor at="" r-project.org=""> > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > > -- > James W. MacDonald, M.S. > Biostatistician > University of Washington > Environmental and Occupational Health Sciences > 4225 Roosevelt Way NE, # 100 > Seattle WA 98105-6099 > > > -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099
ADD REPLYlink written 6.2 years ago by James W. MacDonald50k
I wonder if there are different versions of userguide/reference manual. I still couldn't find what I could do to get useful output, like the pvalue for all the coefficients. The only pvalue I could see is in lrt output... ________________________________ From: James W. MacDonald <jmacdon@uw.edu> Cc: "bioconductor@r-project.org" <bioconductor@r-project.org> Sent: Thursday, March 21, 2013 3:27 PM Subject: Re: [BioC] EdgeR: coefficients for GLM for multiple groups Hi Capricy, I'm not convinced you have read the user's guide carefully. There is, for example, a section called 'Quick start' that is only seven pages in, that shows the canonical glm analysis steps. Then there are two more worked examples starting on page 52 that show pretty much all the steps required to do a glm analysis. However, in the code you show below you haven't followed the examples, nor have you used the accessor function that is intended to produce useful output. Best, Jim On 3/21/2013 3:57 PM, capricy gao wrote: > I think I have carefully read both the userguide and reference manual, but couldn't figure it out. Could anybody help me out what arguments should be included? Thanks a lot. > > In limma fit results: there is something like this: > =============== > $p.value >            Grp1      Grp2vs1 > Gene 1 0.8604469 6.019156e-05 > Gene 2 0.2174605 1.673262e-05 > Gene 3 0.3571957 5.758422e-02 > Gene 4 0.6789641 1.758690e-01 > Gene 5 0.4589329 8.223679e-01 > 95 more rows ... > ============== > > But in edgeR I couldn't find it as I posted previously.... > > > -------------------------------------------------------------------- ---- > *From:* James W. MacDonald <jmacdon@uw.edu> > *Cc:* "bioconductor@r-project.org" <bioconductor@r-project.org> > *Sent:* Thursday, March 21, 2013 1:45 PM > *Subject:* Re: [BioC] EdgeR: coefficients for GLM for multiple groups > > Hi Capricy, > > Just like limma, edgeR has a very comprehensive user's guide that you > can consult. > > library(edgeR) > edgeRUsersGuide() > > Best, > > Jim > > > > On 3/21/2013 2:41 PM, capricy gao wrote: > > I am looking at my fit results from GLM of edgeR for my data: > > > > and assume that $coefficients are related to the expression levels. Just wondering how I can get the p values for those coefficients since I remember in limma, each coefficient will be accompanied by the corresponding t value and p value. > > > > > > Thanks a lot for the help. > > > > capricy > > > > =================== > > > >> design=model.matrix(~0+regroup) > >> y<- estimateGLMCommonDisp(y,design) > >> y<- estimateGLMTagwiseDisp(y,design) > >> fit<- glmFit(y,design) > >> fit > > An object of class "DGEGLM" > > $coefficients > >            regroupFR  regroupFA  regroupFM  regroupFP regroupFW regroupMA > > GS_14929  -9.543565  -9.267612  -9.310823  -9.167299 -10.62094 -9.260763 > > GS_09776 -10.304430 -10.985146 -10.644154 -10.640469 -10.59615 -10.798889 > > GS_18434 -11.327664 -11.786421 -11.643292 -11.902728 -12.50470 -11.654008 > > GS_08334 -10.789181 -10.271089 -10.511480 -10.375642 -10.39865 -10.836647 > > GS_09550 -10.564167 -10.152571 -10.410428 -10.098825 -10.36302 -9.722892 > >            regroupMM  regroupMP  regroupMW > > GS_14929  -8.516256  -8.222225  -9.495394 > > GS_09776 -10.337434 -10.284703 -10.588544 > > GS_18434 -11.755905 -11.662276 -11.583950 > > GS_08334 -10.797529 -10.643409 -10.783104 > > GS_09550 -10.650495 -10.766462  -9.880359 > > 15457 more rows ... > > > > $fitted.values > >                  MP        FA      FR        FW        MA FW.1        MM > > GS_14929 596.53153 205.75696 84.61834 31.844025 169.27178 82.62258 513.34590 > > GS_09776  75.81638  36.91137 39.53083 32.643715  36.33823 84.69746  83.04882 > > GS_18434  19.09812  16.54813 14.19827  4.825192  15.43876 12.51946  20.07922 > > GS_08334  52.95485  75.41272 24.33895 39.775089  34.99088 103.20054  52.41066 > > GS_09550  46.82022  84.90529 30.48442 41.218519 106.62251 106.94567  60.71721 > >                  FP      MA.1        MW        FM      MM.1 MW.1      MP.1 > > GS_14929 78.226172 123.52247 193.95899 122.61523 173.571103 194.05252 568.23926 > > GS_09776 17.921685  26.51705  64.98434  32.30797  28.080239 65.01567  72.22056 > > GS_18434  5.064548  11.26610  23.99487  11.88420  6.789129 24.00644  18.19234 > > GS_08334 23.358730  25.53385  53.48880  36.89427  17.720948 53.51460  50.44331 > > GS_09550 30.811447  77.80550 131.97315  40.81919  20.529537 132.03679  44.59963 > >                FM.1      FP.1      FA.1      FR.1 > > GS_14929 259.44483 324.86875 256.36216 198.47462 > > GS_09776  68.36129  74.42772  45.98959  92.72063 > > GS_18434  25.14609  21.03277  20.61808  33.30242 > > GS_08334  78.06556  97.00745  93.96021  57.08766 > > GS_09550  86.37041 127.95815 105.78744  71.50204 > > 15457 more rows ... > > > > $counts > >            MP  FA  FR FW  MA FW.1  MM  FP MA.1  MW  FM MM.1 MW.1 MP.1 FM.1 FP.1 > > GS_14929 221 284 170 23 267  105 209 106  52 218 158  277  170 926  185  211 > > GS_09776  75  32  17 28  41  96  86  15  23  65  23  27  65  73 87  85 > > GS_18434  36  22  19  8  21    6  22  7    7  27  13    6  21    2 23  15 > > GS_08334  44  77  23 50  41  78  78  14  21  61  29    8  46  59 94  132 > > GS_09550  82  92  45 54 105  75  95  18  79 153  41    8  111  11 86  178 > >          FA.1 FR.1 > > GS_14929  159    0 > > GS_09776  52  143 > > GS_18434  14  23 > > GS_08334  92  60 > > GS_09550  97  39 > > 15457 more rows ... > > > > $deviance > >  GS_14929  GS_09776  GS_18434  GS_08334  GS_09550 > > 19.059229  3.629342  9.509356  4.268291  8.809585 > > 15457 more elements ... > > > > $df.residual > > [1] 9 9 9 9 9 > > 15457 more elements ... > > > > $abundance > > [1]  -9.081235 -10.552328 -11.717230 -10.579804 -10.233947 > > 15457 more elements ... > > > > $design > >    regroupFR regroupFA regroupFM regroupFP regroupFW regroupMA regroupMM > > 1        0        0        0        0        0        0        0 > > 2        0        1        0        0        0        0        0 > > 3        1        0        0        0        0        0        0 > > 4        0        0        0        0        1        0        0 > > 5        0        0        0        0        0        1        0 > >    regroupMP regroupMW > > 1        1        0 > > 2        0        0 > > 3        0        0 > > 4        0        0 > > 5        0        0 > > 13 more rows ... > > > > $offset > >          [,1]    [,2]    [,3]    [,4]    [,5]    [,6]    [,7] [,8] > > [1,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 13.52703 > > [2,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 13.52703 > > [3,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 13.52703 > > [4,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 13.52703 > > [5,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 13.52703 > >          [,9]    [,10]    [,11]    [,12]  [,13]    [,14]    [,15] [,16] > > [1,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 14.95085 > > [2,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 14.95085 > > [3,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 14.95085 > > [4,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 14.95085 > > [5,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 14.95085 > >          [,17]    [,18] > > [1,] 14.81434 14.83441 > > [2,] 14.81434 14.83441 > > [3,] 14.81434 14.83441 > > [4,] 14.81434 14.83441 > > [5,] 14.81434 14.83441 > > 15457 more rows ... > > > > $dispersion > > [1] 0.7464955 0.2734189 0.4034924 0.2832934 0.3819788 > > 15457 more elements ... > > > > $method > > [1] "oneway" > > > > $samples > >    group lib.size norm.factors > > MP    MP  2220863            1 > > FA    FA  2179157            1 > > FR    FR  1181036            1 > > FW    FW  1305802            1 > > MA    MA  1780507            1 > > 13 more rows ... > >    [[alternative HTML version deleted]] > > > > > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor@r-project.org <mailto:bioconductor@r-project.org> > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > > -- James W. MacDonald, M.S. > Biostatistician > University of Washington > Environmental and Occupational Health Sciences > 4225 Roosevelt Way NE, # 100 > Seattle WA 98105-6099 > > > -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099 [[alternative HTML version deleted]]
ADD REPLYlink written 6.2 years ago by capricy gao230
I think you are fundamentally misunderstanding what you are doing. In the context of a glm, the way you test for the significance of a coefficient is by doing a likelihood ratio test, comparing a model that contains the coefficient to a reduced model that does not. So there won't be any p-values without making a comparison. In other words, the glmFit step sets up the model, and the glmLRT step does the test for whatever coefficient you care about. By default it tests all coefficients. From ?glmLRT: glmLRT conducts likelihood ratio tests for one or more coefficients in the linear model. If coef is used, the null hypothesis is that all the coefficients indicated by coef are equal to zero. If contrast is non-null, then the null hypothesis is that the specified contrast of the coefficients is equal to zero. For example, a contrast of c(0,1,-1), assuming there are three coefficients, would test the hypothesis that the second and third coefficients are equal. Best, Jim On 3/21/2013 4:55 PM, capricy gao wrote: > I wonder if there are different versions of userguide/reference manual. > > I still couldn't find what I could do to get useful output, like the > pvalue for all the coefficients. > > The only pvalue I could see is in lrt output... > > > -------------------------------------------------------------------- ---- > *From:* James W. MacDonald <jmacdon at="" uw.edu=""> > *To:* capricy gao <capricyg at="" yahoo.com=""> > *Cc:* "bioconductor at r-project.org" <bioconductor at="" r-project.org=""> > *Sent:* Thursday, March 21, 2013 3:27 PM > *Subject:* Re: [BioC] EdgeR: coefficients for GLM for multiple groups > > Hi Capricy, > > I'm not convinced you have read the user's guide carefully. There is, > for example, a section called 'Quick start' that is only seven pages > in, that shows the canonical glm analysis steps. Then there are two > more worked examples starting on page 52 that show pretty much all the > steps required to do a glm analysis. > > However, in the code you show below you haven't followed the examples, > nor have you used the accessor function that is intended to produce > useful output. > > Best, > > Jim > > > > On 3/21/2013 3:57 PM, capricy gao wrote: > > I think I have carefully read both the userguide and reference > manual, but couldn't figure it out. Could anybody help me out what > arguments should be included? Thanks a lot. > > > > In limma fit results: there is something like this: > > =============== > > $p.value > > Grp1 Grp2vs1 > > Gene 1 0.8604469 6.019156e-05 > > Gene 2 0.2174605 1.673262e-05 > > Gene 3 0.3571957 5.758422e-02 > > Gene 4 0.6789641 1.758690e-01 > > Gene 5 0.4589329 8.223679e-01 > > 95 more rows ... > > ============== > > > > But in edgeR I couldn't find it as I posted previously.... > > > > > > ------------------------------------------------------------------ ------ > > *From:* James W. MacDonald <jmacdon at="" uw.edu="" <mailto:jmacdon="" at="" uw.edu="">> > > *To:* capricy gao <capricyg at="" yahoo.com="" <mailto:capricyg="" at="" yahoo.com="">> > > *Cc:* "bioconductor at r-project.org > <mailto:bioconductor at="" r-project.org="">" <bioconductor at="" r-project.org=""> <mailto:bioconductor at="" r-project.org="">> > > *Sent:* Thursday, March 21, 2013 1:45 PM > > *Subject:* Re: [BioC] EdgeR: coefficients for GLM for multiple groups > > > > Hi Capricy, > > > > Just like limma, edgeR has a very comprehensive user's guide that you > > can consult. > > > > library(edgeR) > > edgeRUsersGuide() > > > > Best, > > > > Jim > > > > > > > > On 3/21/2013 2:41 PM, capricy gao wrote: > > > I am looking at my fit results from GLM of edgeR for my data: > > > > > > and assume that $coefficients are related to the expression > levels. Just wondering how I can get the p values for those > coefficients since I remember in limma, each coefficient will be > accompanied by the corresponding t value and p value. > > > > > > > > > Thanks a lot for the help. > > > > > > capricy > > > > > > =================== > > > > > >> design=model.matrix(~0+regroup) > > >> y<- estimateGLMCommonDisp(y,design) > > >> y<- estimateGLMTagwiseDisp(y,design) > > >> fit<- glmFit(y,design) > > >> fit > > > An object of class "DGEGLM" > > > $coefficients > > > regroupFR regroupFA regroupFM regroupFP regroupFW > regroupMA > > > GS_14929 -9.543565 -9.267612 -9.310823 -9.167299 -10.62094 > -9.260763 > > > GS_09776 -10.304430 -10.985146 -10.644154 -10.640469 -10.59615 > -10.798889 > > > GS_18434 -11.327664 -11.786421 -11.643292 -11.902728 -12.50470 > -11.654008 > > > GS_08334 -10.789181 -10.271089 -10.511480 -10.375642 -10.39865 > -10.836647 > > > GS_09550 -10.564167 -10.152571 -10.410428 -10.098825 -10.36302 > -9.722892 > > > regroupMM regroupMP regroupMW > > > GS_14929 -8.516256 -8.222225 -9.495394 > > > GS_09776 -10.337434 -10.284703 -10.588544 > > > GS_18434 -11.755905 -11.662276 -11.583950 > > > GS_08334 -10.797529 -10.643409 -10.783104 > > > GS_09550 -10.650495 -10.766462 -9.880359 > > > 15457 more rows ... > > > > > > $fitted.values > > > MP FA FR FW MA > FW.1 MM > > > GS_14929 596.53153 205.75696 84.61834 31.844025 169.27178 > 82.62258 513.34590 > > > GS_09776 75.81638 36.91137 39.53083 32.643715 36.33823 > 84.69746 83.04882 > > > GS_18434 19.09812 16.54813 14.19827 4.825192 15.43876 > 12.51946 20.07922 > > > GS_08334 52.95485 75.41272 24.33895 39.775089 34.99088 > 103.20054 52.41066 > > > GS_09550 46.82022 84.90529 30.48442 41.218519 106.62251 > 106.94567 60.71721 > > > FP MA.1 MW FM MM.1 > MW.1 MP.1 > > > GS_14929 78.226172 123.52247 193.95899 122.61523 173.571103 > 194.05252 568.23926 > > > GS_09776 17.921685 26.51705 64.98434 32.30797 28.080239 > 65.01567 72.22056 > > > GS_18434 5.064548 11.26610 23.99487 11.88420 6.789129 > 24.00644 18.19234 > > > GS_08334 23.358730 25.53385 53.48880 36.89427 17.720948 > 53.51460 50.44331 > > > GS_09550 30.811447 77.80550 131.97315 40.81919 20.529537 > 132.03679 44.59963 > > > FM.1 FP.1 FA.1 FR.1 > > > GS_14929 259.44483 324.86875 256.36216 198.47462 > > > GS_09776 68.36129 74.42772 45.98959 92.72063 > > > GS_18434 25.14609 21.03277 20.61808 33.30242 > > > GS_08334 78.06556 97.00745 93.96021 57.08766 > > > GS_09550 86.37041 127.95815 105.78744 71.50204 > > > 15457 more rows ... > > > > > > $counts > > > MP FA FR FW MA FW.1 MM FP MA.1 MW FM MM.1 MW.1 > MP.1 FM.1 FP.1 > > > GS_14929 221 284 170 23 267 105 209 106 52 218 158 277 170 > 926 185 211 > > > GS_09776 75 32 17 28 41 96 86 15 23 65 23 27 65 73 > 87 85 > > > GS_18434 36 22 19 8 21 6 22 7 7 27 13 6 21 > 2 23 15 > > > GS_08334 44 77 23 50 41 78 78 14 21 61 29 8 46 59 > 94 132 > > > GS_09550 82 92 45 54 105 75 95 18 79 153 41 8 111 11 > 86 178 > > > FA.1 FR.1 > > > GS_14929 159 0 > > > GS_09776 52 143 > > > GS_18434 14 23 > > > GS_08334 92 60 > > > GS_09550 97 39 > > > 15457 more rows ... > > > > > > $deviance > > > GS_14929 GS_09776 GS_18434 GS_08334 GS_09550 > > > 19.059229 3.629342 9.509356 4.268291 8.809585 > > > 15457 more elements ... > > > > > > $df.residual > > > [1] 9 9 9 9 9 > > > 15457 more elements ... > > > > > > $abundance > > > [1] -9.081235 -10.552328 -11.717230 -10.579804 -10.233947 > > > 15457 more elements ... > > > > > > $design > > > regroupFR regroupFA regroupFM regroupFP regroupFW regroupMA > regroupMM > > > 1 0 0 0 0 0 0 0 > > > 2 0 1 0 0 0 0 0 > > > 3 1 0 0 0 0 0 0 > > > 4 0 0 0 0 1 0 0 > > > 5 0 0 0 0 0 1 0 > > > regroupMP regroupMW > > > 1 1 0 > > > 2 0 0 > > > 3 0 0 > > > 4 0 0 > > > 5 0 0 > > > 13 more rows ... > > > > > > $offset > > > [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] > > > [1,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 > 13.52703 > > > [2,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 > 13.52703 > > > [3,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 > 13.52703 > > > [4,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 > 13.52703 > > > [5,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 > 13.52703 > > > [,9] [,10] [,11] [,12] [,13] [,14] [,15] > [,16] > > > [1,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 > 14.95085 > > > [2,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 > 14.95085 > > > [3,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 > 14.95085 > > > [4,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 > 14.95085 > > > [5,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 > 14.95085 > > > [,17] [,18] > > > [1,] 14.81434 14.83441 > > > [2,] 14.81434 14.83441 > > > [3,] 14.81434 14.83441 > > > [4,] 14.81434 14.83441 > > > [5,] 14.81434 14.83441 > > > 15457 more rows ... > > > > > > $dispersion > > > [1] 0.7464955 0.2734189 0.4034924 0.2832934 0.3819788 > > > 15457 more elements ... > > > > > > $method > > > [1] "oneway" > > > > > > $samples > > > group lib.size norm.factors > > > MP MP 2220863 1 > > > FA FA 2179157 1 > > > FR FR 1181036 1 > > > FW FW 1305802 1 > > > MA MA 1780507 1 > > > 13 more rows ... > > > [[alternative HTML version deleted]] > > > > > > > > > > > > _______________________________________________ > > > Bioconductor mailing list > > > Bioconductor at r-project.org <mailto:bioconductor at="" r-project.org=""> > <mailto:bioconductor at="" r-project.org="" <mailto:bioconductor="" at="" r-project.org="">> > > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > > > > -- James W. MacDonald, M.S. > > Biostatistician > > University of Washington > > Environmental and Occupational Health Sciences > > 4225 Roosevelt Way NE, # 100 > > Seattle WA 98105-6099 > > > > > > > > -- James W. MacDonald, M.S. > Biostatistician > University of Washington > Environmental and Occupational Health Sciences > 4225 Roosevelt Way NE, # 100 > Seattle WA 98105-6099 > > > -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099
ADD REPLYlink written 6.2 years ago by James W. MacDonald50k
I am actually trying to get more information about the results from edgeR. I understand that glmLRT is doing the test, and so there is a pvalue for it. My question is if glmFit is a fitting step like other GLM fitting in R, do these fitted coefficients also have a pvalue? My expectation is that they should and I used limma example since that limFit has the corresponding information... Maybe I am wrong... ________________________________ From: James W. MacDonald <jmacdon@uw.edu> Cc: "bioconductor@r-project.org" <bioconductor@r-project.org> Sent: Thursday, March 21, 2013 4:17 PM Subject: Re: [BioC] EdgeR: coefficients for GLM for multiple groups I think you are fundamentally misunderstanding what you are doing. In the context of a glm, the way you test for the significance of a coefficient is by doing a likelihood ratio test, comparing a model that contains the coefficient to a reduced model that does not. So there won't be any p-values without making a comparison. In other words, the glmFit step sets up the model, and the glmLRT step does the test for whatever coefficient you care about. By default it tests all coefficients. From ?glmLRT:     glmLRT conducts likelihood ratio tests for one or more     coefficients in the linear model. If coef is used, the null     hypothesis is that all the coefficients indicated by coef are     equal to zero. If contrast is non-null, then the null hypothesis     is that the specified contrast of the coefficients is equal to     zero. For example, a contrast of c(0,1,-1), assuming there are     three coefficients, would test the hypothesis that the second and     third coefficients are equal. Best, Jim On 3/21/2013 4:55 PM, capricy gao wrote: > I wonder if there are different versions of userguide/reference manual. > > I still couldn't find what I could do to get useful output, like the pvalue for all the coefficients. > > The only pvalue I could see is in lrt output... > > > -------------------------------------------------------------------- ---- > *From:* James W. MacDonald <jmacdon@uw.edu> > *Cc:* "bioconductor@r-project.org" <bioconductor@r-project.org> > *Sent:* Thursday, March 21, 2013 3:27 PM > *Subject:* Re: [BioC] EdgeR: coefficients for GLM for multiple groups > > Hi Capricy, > > I'm not convinced you have read the user's guide carefully. There is, for example, a section called 'Quick start' that is only seven pages in, that shows the canonical glm analysis steps. Then there are two more worked examples starting on page 52 that show pretty much all the steps required to do a glm analysis. > > However, in the code you show below you haven't followed the examples, nor have you used the accessor function that is intended to produce useful output. > > Best, > > Jim > > > > On 3/21/2013 3:57 PM, capricy gao wrote: > > I think I have carefully read both the userguide and reference manual, but couldn't figure it out. Could anybody help me out what arguments should be included? Thanks a lot. > > > > In limma fit results: there is something like this: > > =============== > > $p.value > >            Grp1      Grp2vs1 > > Gene 1 0.8604469 6.019156e-05 > > Gene 2 0.2174605 1.673262e-05 > > Gene 3 0.3571957 5.758422e-02 > > Gene 4 0.6789641 1.758690e-01 > > Gene 5 0.4589329 8.223679e-01 > > 95 more rows ... > > ============== > > > > But in edgeR I couldn't find it as I posted previously.... > > > > > > ------------------------------------------------------------------ ------ > > *From:* James W. MacDonald <jmacdon@uw.edu <mailto:jmacdon@uw.edu="">> > > *Cc:* "bioconductor@r-project.org <mailto:bioconductor@r-project.org>" <bioconductor@r-project.org <mailto:bioconductor@r-project.org="">> > > *Sent:* Thursday, March 21, 2013 1:45 PM > > *Subject:* Re: [BioC] EdgeR: coefficients for GLM for multiple groups > > > > Hi Capricy, > > > > Just like limma, edgeR has a very comprehensive user's guide that you > > can consult. > > > > library(edgeR) > > edgeRUsersGuide() > > > > Best, > > > > Jim > > > > > > > > On 3/21/2013 2:41 PM, capricy gao wrote: > > > I am looking at my fit results from GLM of edgeR for my data: > > > > > > and assume that $coefficients are related to the expression levels. Just wondering how I can get the p values for those coefficients since I remember in limma, each coefficient will be accompanied by the corresponding t value and p value. > > > > > > > > > Thanks a lot for the help. > > > > > > capricy > > > > > > =================== > > > > > >> design=model.matrix(~0+regroup) > > >> y<- estimateGLMCommonDisp(y,design) > > >> y<- estimateGLMTagwiseDisp(y,design) > > >> fit<- glmFit(y,design) > > >> fit > > > An object of class "DGEGLM" > > > $coefficients > > >            regroupFR  regroupFA  regroupFM  regroupFP regroupFW regroupMA > > > GS_14929  -9.543565  -9.267612  -9.310823  -9.167299 -10.62094 -9.260763 > > > GS_09776 -10.304430 -10.985146 -10.644154 -10.640469 -10.59615 -10.798889 > > > GS_18434 -11.327664 -11.786421 -11.643292 -11.902728 -12.50470 -11.654008 > > > GS_08334 -10.789181 -10.271089 -10.511480 -10.375642 -10.39865 -10.836647 > > > GS_09550 -10.564167 -10.152571 -10.410428 -10.098825 -10.36302 -9.722892 > > >            regroupMM  regroupMP  regroupMW > > > GS_14929  -8.516256  -8.222225  -9.495394 > > > GS_09776 -10.337434 -10.284703 -10.588544 > > > GS_18434 -11.755905 -11.662276 -11.583950 > > > GS_08334 -10.797529 -10.643409 -10.783104 > > > GS_09550 -10.650495 -10.766462  -9.880359 > > > 15457 more rows ... > > > > > > $fitted.values > > >                  MP        FA      FR        FW        MA FW.1        MM > > > GS_14929 596.53153 205.75696 84.61834 31.844025 169.27178 82.62258 513.34590 > > > GS_09776  75.81638  36.91137 39.53083 32.643715  36.33823 84.69746  83.04882 > > > GS_18434  19.09812  16.54813 14.19827  4.825192  15.43876 12.51946  20.07922 > > > GS_08334  52.95485  75.41272 24.33895 39.775089  34.99088 103.20054  52.41066 > > > GS_09550  46.82022  84.90529 30.48442 41.218519 106.62251 106.94567  60.71721 > > >                  FP      MA.1        MW        FM      MM.1 MW.1      MP.1 > > > GS_14929 78.226172 123.52247 193.95899 122.61523 173.571103 194.05252 568.23926 > > > GS_09776 17.921685  26.51705  64.98434  32.30797  28.080239 65.01567  72.22056 > > > GS_18434  5.064548  11.26610  23.99487  11.88420  6.789129 24.00644  18.19234 > > > GS_08334 23.358730  25.53385  53.48880  36.89427  17.720948 53.51460  50.44331 > > > GS_09550 30.811447  77.80550 131.97315  40.81919  20.529537 132.03679  44.59963 > > >                FM.1      FP.1      FA.1      FR.1 > > > GS_14929 259.44483 324.86875 256.36216 198.47462 > > > GS_09776  68.36129  74.42772  45.98959  92.72063 > > > GS_18434  25.14609  21.03277  20.61808  33.30242 > > > GS_08334  78.06556  97.00745  93.96021  57.08766 > > > GS_09550  86.37041 127.95815 105.78744  71.50204 > > > 15457 more rows ... > > > > > > $counts > > >            MP  FA  FR FW  MA FW.1  MM  FP MA.1  MW  FM MM.1 MW.1 MP.1 FM.1 FP.1 > > > GS_14929 221 284 170 23 267  105 209 106  52 218 158  277  170 926  185  211 > > > GS_09776  75  32  17 28  41  96  86  15  23  65  23  27  65  73 87  85 > > > GS_18434  36  22  19  8  21    6  22  7    7  27  13    6  21 2  23  15 > > > GS_08334  44  77  23 50  41  78  78  14  21  61  29    8  46  59 94  132 > > > GS_09550  82  92  45 54 105  75  95  18  79 153  41    8  111 11  86  178 > > >          FA.1 FR.1 > > > GS_14929  159    0 > > > GS_09776  52  143 > > > GS_18434  14  23 > > > GS_08334  92  60 > > > GS_09550  97  39 > > > 15457 more rows ... > > > > > > $deviance > > >  GS_14929  GS_09776  GS_18434  GS_08334  GS_09550 > > > 19.059229  3.629342  9.509356  4.268291  8.809585 > > > 15457 more elements ... > > > > > > $df.residual > > > [1] 9 9 9 9 9 > > > 15457 more elements ... > > > > > > $abundance > > > [1]  -9.081235 -10.552328 -11.717230 -10.579804 -10.233947 > > > 15457 more elements ... > > > > > > $design > > >    regroupFR regroupFA regroupFM regroupFP regroupFW regroupMA regroupMM > > > 1        0        0        0        0        0        0        0 > > > 2        0        1        0        0        0        0        0 > > > 3        1        0        0        0        0        0        0 > > > 4        0        0        0        0        1        0        0 > > > 5        0        0        0        0        0        1        0 > > >    regroupMP regroupMW > > > 1        1        0 > > > 2        0        0 > > > 3        0        0 > > > 4        0        0 > > > 5        0        0 > > > 13 more rows ... > > > > > > $offset > > >          [,1]    [,2]    [,3]    [,4]    [,5]    [,6]    [,7] [,8] > > > [1,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 13.52703 > > > [2,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 13.52703 > > > [3,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 13.52703 > > > [4,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 13.52703 > > > [5,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 13.52703 > > >          [,9]    [,10]    [,11]    [,12]  [,13]    [,14] [,15]    [,16] > > > [1,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 14.95085 > > > [2,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 14.95085 > > > [3,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 14.95085 > > > [4,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 14.95085 > > > [5,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 14.95085 > > >          [,17]    [,18] > > > [1,] 14.81434 14.83441 > > > [2,] 14.81434 14.83441 > > > [3,] 14.81434 14.83441 > > > [4,] 14.81434 14.83441 > > > [5,] 14.81434 14.83441 > > > 15457 more rows ... > > > > > > $dispersion > > > [1] 0.7464955 0.2734189 0.4034924 0.2832934 0.3819788 > > > 15457 more elements ... > > > > > > $method > > > [1] "oneway" > > > > > > $samples > > >    group lib.size norm.factors > > > MP    MP  2220863            1 > > > FA    FA  2179157            1 > > > FR    FR  1181036            1 > > > FW    FW  1305802            1 > > > MA    MA  1780507            1 > > > 13 more rows ... > > >    [[alternative HTML version deleted]] > > > > > > > > > > > > _______________________________________________ > > > Bioconductor mailing list > > > Bioconductor@r-project.org <mailto:bioconductor@r-project.org> <mailto:bioconductor@r-project.org <mailto:bioconductor@r-project.org="">> > > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > > > > -- James W. MacDonald, M.S. > > Biostatistician > > University of Washington > > Environmental and Occupational Health Sciences > > 4225 Roosevelt Way NE, # 100 > > Seattle WA 98105-6099 > > > > > > > > -- James W. MacDonald, M.S. > Biostatistician > University of Washington > Environmental and Occupational Health Sciences > 4225 Roosevelt Way NE, # 100 > Seattle WA 98105-6099 > > > -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099 [[alternative HTML version deleted]]
ADD REPLYlink written 6.2 years ago by capricy gao230
Hi capricy, as far as I now, limma output and edgeR do behave differentially in what you stated, that limma also calculates a moderated F-value similarly as in standard linear models. But i assume (not 100% shure about it) that when using a GLM, conducting a F test is not possible because your residuals are not normally distributed. But in "A comparison of methods for differential expression analysis of RNA-seq data"(BMC bioinformatics) the authors state (and I think I found a talk in the net where one of the edgeR authors was stating something similar) that using limma for rna seq data (described in the latest limma user manual) performs very well on RNA Seq Data. So maybe you are fine with limma anyways. Best regards, Moritz 2013/3/22 capricy gao <capricyg@yahoo.com> > I am actually trying to get more information about the results from edgeR. > I understand that glmLRT is doing the test, and so there is a pvalue for > it. My question is if glmFit is a fitting step like other GLM fitting in R, > do these fitted coefficients also have a pvalue? My expectation is that > they should and I used limma example since that limFit has the > corresponding information... > > Maybe I am wrong... > > > > > > > ________________________________ > From: James W. MacDonald <jmacdon@uw.edu> > > Cc: "bioconductor@r-project.org" <bioconductor@r-project.org> > Sent: Thursday, March 21, 2013 4:17 PM > Subject: Re: [BioC] EdgeR: coefficients for GLM for multiple groups > > I think you are fundamentally misunderstanding what you are doing. In the > context of a glm, the way you test for the significance of a coefficient is > by doing a likelihood ratio test, comparing a model that contains the > coefficient to a reduced model that does not. So there won't be any > p-values without making a comparison. > > In other words, the glmFit step sets up the model, and the glmLRT step > does the test for whatever coefficient you care about. By default it tests > all coefficients. From ?glmLRT: > > glmLRT conducts likelihood ratio tests for one or more > coefficients in the linear model. If coef is used, the null > hypothesis is that all the coefficients indicated by coef are > equal to zero. If contrast is non-null, then the null hypothesis > is that the specified contrast of the coefficients is equal to > zero. For example, a contrast of c(0,1,-1), assuming there are > three coefficients, would test the hypothesis that the second and > third coefficients are equal. > > > Best, > > Jim > > > > On 3/21/2013 4:55 PM, capricy gao wrote: > > I wonder if there are different versions of userguide/reference manual. > > > > I still couldn't find what I could do to get useful output, like the > pvalue for all the coefficients. > > > > The only pvalue I could see is in lrt output... > > > > > > ------------------------------------------------------------------ ------ > > *From:* James W. MacDonald <jmacdon@uw.edu> > > > *Cc:* "bioconductor@r-project.org" <bioconductor@r-project.org> > > *Sent:* Thursday, March 21, 2013 3:27 PM > > *Subject:* Re: [BioC] EdgeR: coefficients for GLM for multiple groups > > > > Hi Capricy, > > > > I'm not convinced you have read the user's guide carefully. There is, > for example, a section called 'Quick start' that is only seven pages in, > that shows the canonical glm analysis steps. Then there are two more worked > examples starting on page 52 that show pretty much all the steps required > to do a glm analysis. > > > > However, in the code you show below you haven't followed the examples, > nor have you used the accessor function that is intended to produce useful > output. > > > > Best, > > > > Jim > > > > > > > > On 3/21/2013 3:57 PM, capricy gao wrote: > > > I think I have carefully read both the userguide and reference manual, > but couldn't figure it out. Could anybody help me out what arguments should > be included? Thanks a lot. > > > > > > In limma fit results: there is something like this: > > > =============== > > > $p.value > > > Grp1 Grp2vs1 > > > Gene 1 0.8604469 6.019156e-05 > > > Gene 2 0.2174605 1.673262e-05 > > > Gene 3 0.3571957 5.758422e-02 > > > Gene 4 0.6789641 1.758690e-01 > > > Gene 5 0.4589329 8.223679e-01 > > > 95 more rows ... > > > ============== > > > > > > But in edgeR I couldn't find it as I posted previously.... > > > > > > > > > > -------------------------------------------------------------------- ---- > > > *From:* James W. MacDonald <jmacdon@uw.edu <mailto:jmacdon@uw.edu="">> > > > > *Cc:* "bioconductor@r-project.org <mailto:bioconductor@r-project.org>" > <bioconductor@r-project.org <mailto:bioconductor@r-project.org="">> > > > *Sent:* Thursday, March 21, 2013 1:45 PM > > > *Subject:* Re: [BioC] EdgeR: coefficients for GLM for multiple groups > > > > > > Hi Capricy, > > > > > > Just like limma, edgeR has a very comprehensive user's guide that you > > > can consult. > > > > > > library(edgeR) > > > edgeRUsersGuide() > > > > > > Best, > > > > > > Jim > > > > > > > > > > > > On 3/21/2013 2:41 PM, capricy gao wrote: > > > > I am looking at my fit results from GLM of edgeR for my data: > > > > > > > > and assume that $coefficients are related to the expression levels. > Just wondering how I can get the p values for those coefficients since I > remember in limma, each coefficient will be accompanied by the > corresponding t value and p value. > > > > > > > > > > > > Thanks a lot for the help. > > > > > > > > capricy > > > > > > > > =================== > > > > > > > >> design=model.matrix(~0+regroup) > > > >> y<- estimateGLMCommonDisp(y,design) > > > >> y<- estimateGLMTagwiseDisp(y,design) > > > >> fit<- glmFit(y,design) > > > >> fit > > > > An object of class "DGEGLM" > > > > $coefficients > > > > regroupFR regroupFA regroupFM regroupFP regroupFW > regroupMA > > > > GS_14929 -9.543565 -9.267612 -9.310823 -9.167299 -10.62094 > -9.260763 > > > > GS_09776 -10.304430 -10.985146 -10.644154 -10.640469 -10.59615 > -10.798889 > > > > GS_18434 -11.327664 -11.786421 -11.643292 -11.902728 -12.50470 > -11.654008 > > > > GS_08334 -10.789181 -10.271089 -10.511480 -10.375642 -10.39865 > -10.836647 > > > > GS_09550 -10.564167 -10.152571 -10.410428 -10.098825 -10.36302 > -9.722892 > > > > regroupMM regroupMP regroupMW > > > > GS_14929 -8.516256 -8.222225 -9.495394 > > > > GS_09776 -10.337434 -10.284703 -10.588544 > > > > GS_18434 -11.755905 -11.662276 -11.583950 > > > > GS_08334 -10.797529 -10.643409 -10.783104 > > > > GS_09550 -10.650495 -10.766462 -9.880359 > > > > 15457 more rows ... > > > > > > > > $fitted.values > > > > MP FA FR FW MA FW.1 > MM > > > > GS_14929 596.53153 205.75696 84.61834 31.844025 169.27178 82.62258 > 513.34590 > > > > GS_09776 75.81638 36.91137 39.53083 32.643715 36.33823 84.69746 > 83.04882 > > > > GS_18434 19.09812 16.54813 14.19827 4.825192 15.43876 12.51946 > 20.07922 > > > > GS_08334 52.95485 75.41272 24.33895 39.775089 34.99088 103.20054 > 52.41066 > > > > GS_09550 46.82022 84.90529 30.48442 41.218519 106.62251 106.94567 > 60.71721 > > > > FP MA.1 MW FM MM.1 > MW.1 MP.1 > > > > GS_14929 78.226172 123.52247 193.95899 122.61523 173.571103 > 194.05252 568.23926 > > > > GS_09776 17.921685 26.51705 64.98434 32.30797 28.080239 > 65.01567 72.22056 > > > > GS_18434 5.064548 11.26610 23.99487 11.88420 6.789129 > 24.00644 18.19234 > > > > GS_08334 23.358730 25.53385 53.48880 36.89427 17.720948 > 53.51460 50.44331 > > > > GS_09550 30.811447 77.80550 131.97315 40.81919 20.529537 > 132.03679 44.59963 > > > > FM.1 FP.1 FA.1 FR.1 > > > > GS_14929 259.44483 324.86875 256.36216 198.47462 > > > > GS_09776 68.36129 74.42772 45.98959 92.72063 > > > > GS_18434 25.14609 21.03277 20.61808 33.30242 > > > > GS_08334 78.06556 97.00745 93.96021 57.08766 > > > > GS_09550 86.37041 127.95815 105.78744 71.50204 > > > > 15457 more rows ... > > > > > > > > $counts > > > > MP FA FR FW MA FW.1 MM FP MA.1 MW FM MM.1 MW.1 > MP.1 FM.1 FP.1 > > > > GS_14929 221 284 170 23 267 105 209 106 52 218 158 277 170 926 > 185 211 > > > > GS_09776 75 32 17 28 41 96 86 15 23 65 23 27 65 73 87 > 85 > > > > GS_18434 36 22 19 8 21 6 22 7 7 27 13 6 21 2 > 23 15 > > > > GS_08334 44 77 23 50 41 78 78 14 21 61 29 8 46 59 > 94 132 > > > > GS_09550 82 92 45 54 105 75 95 18 79 153 41 8 111 11 > 86 178 > > > > FA.1 FR.1 > > > > GS_14929 159 0 > > > > GS_09776 52 143 > > > > GS_18434 14 23 > > > > GS_08334 92 60 > > > > GS_09550 97 39 > > > > 15457 more rows ... > > > > > > > > $deviance > > > > GS_14929 GS_09776 GS_18434 GS_08334 GS_09550 > > > > 19.059229 3.629342 9.509356 4.268291 8.809585 > > > > 15457 more elements ... > > > > > > > > $df.residual > > > > [1] 9 9 9 9 9 > > > > 15457 more elements ... > > > > > > > > $abundance > > > > [1] -9.081235 -10.552328 -11.717230 -10.579804 -10.233947 > > > > 15457 more elements ... > > > > > > > > $design > > > > regroupFR regroupFA regroupFM regroupFP regroupFW regroupMA > regroupMM > > > > 1 0 0 0 0 0 0 0 > > > > 2 0 1 0 0 0 0 0 > > > > 3 1 0 0 0 0 0 0 > > > > 4 0 0 0 0 1 0 0 > > > > 5 0 0 0 0 0 1 0 > > > > regroupMP regroupMW > > > > 1 1 0 > > > > 2 0 0 > > > > 3 0 0 > > > > 4 0 0 > > > > 5 0 0 > > > > 13 more rows ... > > > > > > > > $offset > > > > [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] > > > > [1,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 > 13.52703 > > > > [2,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 > 13.52703 > > > > [3,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 > 13.52703 > > > > [4,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 > 13.52703 > > > > [5,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 > 13.52703 > > > > [,9] [,10] [,11] [,12] [,13] [,14] [,15] > [,16] > > > > [1,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 > 14.95085 > > > > [2,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 > 14.95085 > > > > [3,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 > 14.95085 > > > > [4,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 > 14.95085 > > > > [5,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 > 14.95085 > > > > [,17] [,18] > > > > [1,] 14.81434 14.83441 > > > > [2,] 14.81434 14.83441 > > > > [3,] 14.81434 14.83441 > > > > [4,] 14.81434 14.83441 > > > > [5,] 14.81434 14.83441 > > > > 15457 more rows ... > > > > > > > > $dispersion > > > > [1] 0.7464955 0.2734189 0.4034924 0.2832934 0.3819788 > > > > 15457 more elements ... > > > > > > > > $method > > > > [1] "oneway" > > > > > > > > $samples > > > > group lib.size norm.factors > > > > MP MP 2220863 1 > > > > FA FA 2179157 1 > > > > FR FR 1181036 1 > > > > FW FW 1305802 1 > > > > MA MA 1780507 1 > > > > 13 more rows ... > > > > [[alternative HTML version deleted]] > > > > > > > > > > > > > > > > _______________________________________________ > > > > Bioconductor mailing list > > > > Bioconductor@r-project.org <mailto:bioconductor@r-project.org> > <mailto:bioconductor@r-project.org <mailto:bioconductor@r-project.org="">> > > > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > > > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > > > > > > -- James W. MacDonald, M.S. > > > Biostatistician > > > University of Washington > > > Environmental and Occupational Health Sciences > > > 4225 Roosevelt Way NE, # 100 > > > Seattle WA 98105-6099 > > > > > > > > > > > > > -- James W. MacDonald, M.S. > > Biostatistician > > University of Washington > > Environmental and Occupational Health Sciences > > 4225 Roosevelt Way NE, # 100 > > Seattle WA 98105-6099 > > > > > > > > -- James W. MacDonald, M.S. > Biostatistician > University of Washington > Environmental and Occupational Health Sciences > 4225 Roosevelt Way NE, # 100 > Seattle WA 98105-6099 > [[alternative HTML version deleted]] > > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > -- *Moritz Heß PhD Candidate * *Research associate Forest Research Institute of Baden Württemberg (FVA) Wonnhalde 4 79100 Freiburg (Germany) phone +49 761 4018 301* [[alternative HTML version deleted]]
ADD REPLYlink written 6.2 years ago by Moritz Hess60
Hi, Moritz, Thank you very much for clarification and the article you recommended. I originally raised this question actually because I was trying to do gene cluster analysis based on expression levels. The edgeR user guide said: in the following design, design=model.matrix(~0+regroup), coefficient will define the expression level of each group. I figured that if I really want to use the coefficient under such circumstances, I should filter all the coefficients with low confidence. So I was looking for the way to output the p values for those coefficients. Well, it looks like now I just take all as they are... ________________________________ From: Moritz Hess <ssehztirom@googlemail.com> Cc: James W. MacDonald <jmacdon@uw.edu>; "bioconductor@r-project.org" <bioconductor@r-project.org> Sent: Monday, March 25, 2013 8:35 AM Subject: Re: [BioC] EdgeR: coefficients for GLM for multiple groups Hi capricy, as far as I now, limma output and edgeR do behave differentially in what you stated, that limma also calculates a moderated F-value similarly as in standard linear models. But i assume (not 100% shure about it) that when using a GLM, conducting a F test is not possible because your residuals are not normally distributed. But in "A comparison of methods for differential expression analysis of RNA-seq data"(BMC bioinformatics) the authors state (and I think I found a talk in the net where one of the edgeR authors was stating something similar) that using limma for rna seq data (described in the latest limma user manual) performs very well on RNA Seq Data. So maybe you are fine with limma anyways. Best regards, Moritz I am actually trying to get more information about the results from edgeR. I understand that glmLRT is doing the test, and so there is a pvalue for it. My question is if glmFit is a fitting step like other GLM fitting in R, do these fitted coefficients also have a pvalue? My expectation is that they should and I used limma example since that limFit has the corresponding information... > >Maybe I am wrong... > > > > > > > >________________________________ > From: James W. MacDonald <jmacdon@uw.edu> > >Cc: "bioconductor@r-project.org" <bioconductor@r-project.org> >Sent: Thursday, March 21, 2013 4:17 PM >Subject: Re: [BioC] EdgeR: coefficients for GLM for multiple groups > > >I think you are fundamentally misunderstanding what you are doing. In the context of a glm, the way you test for the significance of a coefficient is by doing a likelihood ratio test, comparing a model that contains the coefficient to a reduced model that does not. So there won't be any p-values without making a comparison. > >In other words, the glmFit step sets up the model, and the glmLRT step does the test for whatever coefficient you care about. By default it tests all coefficients. From ?glmLRT: > >    glmLRT conducts likelihood ratio tests for one or more >     coefficients in the linear model. If coef is used, the null >     hypothesis is that all the coefficients indicated by coef are >     equal to zero. If contrast is non-null, then the null hypothesis >     is that the specified contrast of the coefficients is equal to >     zero. For example, a contrast of c(0,1,-1), assuming there are >     three coefficients, would test the hypothesis that the second and >     third coefficients are equal. > > >Best, > >Jim > > > >On 3/21/2013 4:55 PM, capricy gao wrote: >> I wonder if there are different versions of userguide/reference manual. >> >> I still couldn't find what I could do to get useful output, like the pvalue for all the coefficients. >> >> The only pvalue I could see is in lrt output... >> >> >> ------------------------------------------------------------------- ----- >> *From:* James W. MacDonald <jmacdon@uw.edu> > > >> *Cc:* "bioconductor@r-project.org" <bioconductor@r-project.org> >> *Sent:* Thursday, March 21, 2013 3:27 PM >> *Subject:* Re: [BioC] EdgeR: coefficients for GLM for multiple groups >> >> Hi Capricy, >> >> I'm not convinced you have read the user's guide carefully. There is, for example, a section called 'Quick start' that is only seven pages in, that shows the canonical glm analysis steps. Then there are two more worked examples starting on page 52 that show pretty much all the steps required to do a glm analysis. >> >> However, in the code you show below you haven't followed the examples, nor have you used the accessor function that is intended to produce useful output. >> >> Best, >> >> Jim >> >> >> >> On 3/21/2013 3:57 PM, capricy gao wrote: >> > I think I have carefully read both the userguide and reference manual, but couldn't figure it out. Could anybody help me out what arguments should be included? Thanks a lot. >> > >> > In limma fit results: there is something like this: >> > =============== >> > $p.value >> >            Grp1      Grp2vs1 >> > Gene 1 0.8604469 6.019156e-05 >> > Gene 2 0.2174605 1.673262e-05 >> > Gene 3 0.3571957 5.758422e-02 >> > Gene 4 0.6789641 1.758690e-01 >> > Gene 5 0.4589329 8.223679e-01 >> > 95 more rows ... >> > ============== >> > >> > But in edgeR I couldn't find it as I posted previously.... >> > >> > >> > ----------------------------------------------------------------- ------- >> > *From:* James W. MacDonald <jmacdon@uw.edu <mailto:jmacdon@uw.edu="">> > > >> > *Cc:* "bioconductor@r-project.org <mailto:bioconductor@r-project.org>" <bioconductor@r-project.org <mailto:bioconductor@r-project.org="">> >> > *Sent:* Thursday, March 21, 2013 1:45 PM >> > *Subject:* Re: [BioC] EdgeR: coefficients for GLM for multiple groups >> > >> > Hi Capricy, >> > >> > Just like limma, edgeR has a very comprehensive user's guide that you >> > can consult. >> > >> > library(edgeR) >> > edgeRUsersGuide() >> > >> > Best, >> > >> > Jim >> > >> > >> > >> > On 3/21/2013 2:41 PM, capricy gao wrote: >> > > I am looking at my fit results from GLM of edgeR for my data: >> > > >> > > and assume that $coefficients are related to the expression levels. Just wondering how I can get the p values for those coefficients since I remember in limma, each coefficient will be accompanied by the corresponding t value and p value. >> > > >> > > >> > > Thanks a lot for the help. >> > > >> > > capricy >> > > >> > > =================== >> > > >> > >> design=model.matrix(~0+regroup) >> > >> y<- estimateGLMCommonDisp(y,design) >> > >> y<- estimateGLMTagwiseDisp(y,design) >> > >> fit<- glmFit(y,design) >> > >> fit >> > > An object of class "DGEGLM" >> > > $coefficients >> > >            regroupFR  regroupFA  regroupFM  regroupFP regroupFW regroupMA >> > > GS_14929  -9.543565  -9.267612  -9.310823  -9.167299 -10.62094 -9.260763 >> > > GS_09776 -10.304430 -10.985146 -10.644154 -10.640469 -10.59615 -10.798889 >> > > GS_18434 -11.327664 -11.786421 -11.643292 -11.902728 -12.50470 -11.654008 >> > > GS_08334 -10.789181 -10.271089 -10.511480 -10.375642 -10.39865 -10.836647 >> > > GS_09550 -10.564167 -10.152571 -10.410428 -10.098825 -10.36302 -9.722892 >> > >            regroupMM  regroupMP  regroupMW >> > > GS_14929  -8.516256  -8.222225  -9.495394 >> > > GS_09776 -10.337434 -10.284703 -10.588544 >> > > GS_18434 -11.755905 -11.662276 -11.583950 >> > > GS_08334 -10.797529 -10.643409 -10.783104 >> > > GS_09550 -10.650495 -10.766462  -9.880359 >> > > 15457 more rows ... >> > > >> > > $fitted.values >> > >                  MP        FA      FR        FW        MA FW.1        MM >> > > GS_14929 596.53153 205.75696 84.61834 31.844025 169.27178 82.62258 513.34590 >> > > GS_09776  75.81638  36.91137 39.53083 32.643715  36.33823 84.69746  83.04882 >> > > GS_18434  19.09812  16.54813 14.19827  4.825192  15.43876 12.51946  20.07922 >> > > GS_08334  52.95485  75.41272 24.33895 39.775089  34.99088 103.20054  52.41066 >> > > GS_09550  46.82022  84.90529 30.48442 41.218519 106.62251 106.94567  60.71721 >> > >                  FP      MA.1        MW        FM      MM.1 MW.1      MP.1 >> > > GS_14929 78.226172 123.52247 193.95899 122.61523 173.571103 194.05252 568.23926 >> > > GS_09776 17.921685  26.51705  64.98434  32.30797  28.080239 65.01567  72.22056 >> > > GS_18434  5.064548  11.26610  23.99487  11.88420  6.789129 24.00644  18.19234 >> > > GS_08334 23.358730  25.53385  53.48880  36.89427  17.720948 53.51460  50.44331 >> > > GS_09550 30.811447  77.80550 131.97315  40.81919  20.529537 132.03679  44.59963 >> > >                FM.1      FP.1      FA.1      FR.1 >> > > GS_14929 259.44483 324.86875 256.36216 198.47462 >> > > GS_09776  68.36129  74.42772  45.98959  92.72063 >> > > GS_18434  25.14609  21.03277  20.61808  33.30242 >> > > GS_08334  78.06556  97.00745  93.96021  57.08766 >> > > GS_09550  86.37041 127.95815 105.78744  71.50204 >> > > 15457 more rows ... >> > > >> > > $counts >> > >            MP  FA  FR FW  MA FW.1  MM  FP MA.1  MW  FM MM.1 MW.1 MP.1 FM.1 FP.1 >> > > GS_14929 221 284 170 23 267  105 209 106  52 218 158  277  170 926  185  211 >> > > GS_09776  75  32  17 28  41  96  86  15  23  65  23  27  65  73 87  85 >> > > GS_18434  36  22  19  8  21    6  22  7    7  27  13    6  21 2  23  15 >> > > GS_08334  44  77  23 50  41  78  78  14  21  61  29    8  46 59  94  132 >> > > GS_09550  82  92  45 54 105  75  95  18  79 153  41    8  111 11  86  178 >> > >          FA.1 FR.1 >> > > GS_14929  159    0 >> > > GS_09776  52  143 >> > > GS_18434  14  23 >> > > GS_08334  92  60 >> > > GS_09550  97  39 >> > > 15457 more rows ... >> > > >> > > $deviance >> > >  GS_14929  GS_09776  GS_18434  GS_08334  GS_09550 >> > > 19.059229  3.629342  9.509356  4.268291  8.809585 >> > > 15457 more elements ... >> > > >> > > $df.residual >> > > [1] 9 9 9 9 9 >> > > 15457 more elements ... >> > > >> > > $abundance >> > > [1]  -9.081235 -10.552328 -11.717230 -10.579804 -10.233947 >> > > 15457 more elements ... >> > > >> > > $design >> > >    regroupFR regroupFA regroupFM regroupFP regroupFW regroupMA regroupMM >> > > 1        0        0        0        0        0        0 0 >> > > 2        0        1        0        0        0        0 0 >> > > 3        1        0        0        0        0        0 0 >> > > 4        0        0        0        0        1        0 0 >> > > 5        0        0        0        0        0        1 0 >> > >    regroupMP regroupMW >> > > 1        1        0 >> > > 2        0        0 >> > > 3        0        0 >> > > 4        0        0 >> > > 5        0        0 >> > > 13 more rows ... >> > > >> > > $offset >> > >          [,1]    [,2]    [,3]    [,4]    [,5]    [,6]    [,7] [,8] >> > > [1,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 13.52703 >> > > [2,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 13.52703 >> > > [3,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 13.52703 >> > > [4,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 13.52703 >> > > [5,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 13.52703 >> > >          [,9]    [,10]    [,11]    [,12]  [,13]    [,14] [,15]    [,16] >> > > [1,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 14.95085 >> > > [2,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 14.95085 >> > > [3,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 14.95085 >> > > [4,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 14.95085 >> > > [5,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 14.95085 >> > >          [,17]    [,18] >> > > [1,] 14.81434 14.83441 >> > > [2,] 14.81434 14.83441 >> > > [3,] 14.81434 14.83441 >> > > [4,] 14.81434 14.83441 >> > > [5,] 14.81434 14.83441 >> > > 15457 more rows ... >> > > >> > > $dispersion >> > > [1] 0.7464955 0.2734189 0.4034924 0.2832934 0.3819788 >> > > 15457 more elements ... >> > > >> > > $method >> > > [1] "oneway" >> > > >> > > $samples >> > >    group lib.size norm.factors >> > > MP    MP  2220863            1 >> > > FA    FA  2179157            1 >> > > FR    FR  1181036            1 >> > > FW    FW  1305802            1 >> > > MA    MA  1780507            1 >> > > 13 more rows ... >> > >    [[alternative HTML version deleted]] >> > > >> > > >> > > >> > > _______________________________________________ >> > > Bioconductor mailing list >> > > Bioconductor@r-project.org <mailto:bioconductor@r-project.org> <mailto:bioconductor@r-project.org <mailto:bioconductor@r-project.org="">> >> > > https://stat.ethz.ch/mailman/listinfo/bioconductor >> > > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >> > >> > -- James W. MacDonald, M.S. >> > Biostatistician >> > University of Washington >> > Environmental and Occupational Health Sciences >> > 4225 Roosevelt Way NE, # 100 >> > Seattle WA 98105-6099 >> > >> > >> > >> >> -- James W. MacDonald, M.S. >> Biostatistician >> University of Washington >> Environmental and Occupational Health Sciences >> 4225 Roosevelt Way NE, # 100 >> Seattle WA 98105-6099 >> >> >> > >-- James W. MacDonald, M.S. >Biostatistician >University of Washington >Environmental and Occupational Health Sciences >4225 Roosevelt Way NE, # 100 >Seattle WA 98105-6099 >        [[alternative HTML version deleted]] > > >_______________________________________________ >Bioconductor mailing list >Bioconductor@r-project.org >https://stat.ethz.ch/mailman/listinfo/bioconductor >Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > -- Moritz Heß PhD Candidate Research associate Forest Research Institute of Baden Württemberg (FVA) Wonnhalde 4 79100 Freiburg (Germany) phone +49 761 4018 301 [[alternative HTML version deleted]]
ADD REPLYlink written 6.2 years ago by capricy gao230
Hi Simon, Thanks for the suggestions. I'll see about implementing them soon and let you know if I have any problems. -Ryan On Mar 21, 2013 6:44 AM, "Simon Anders" <anders@embl.de <mailto:anders@embl.de="">> wrote: Hi Ryan On 21/03/13 01:05, Ryan C. Thompson wrote: So, we have several great packages for testing for evidence of differential expression using RNA-seq counts, and I am happily using those. However, I would like to know if there is any method for testing for evidence of equivalent expression. That is, I would like to show with some level of confidence that treatment X has no (or negligible) effect on gene Y. Is there any way to conduct such a test? One of the new features of DESeq2 is that it provides a empirical-Bayes style shrinkage estimates of coefficients (i.e., log fold changes) and also estimates standard errors for these coefficients (taken from the the reciprocal curvature of the posterior). Your task is one of the application we had in mind for this. You want to know whether the fold change is zero or nearly zero, and you also want to be ascertained that this estimate of a nearly- zero fold change is a precise one. So, to find genes whose absolute log fold change is _reliably_ smaller than some threshold, take all those genes for which the estimated abs log fold change is below the threshold and the standard error is well below the threshold, too. The reason, why I write "below a threshold" rather than "equal to zero" is that it's biologically unreasonable to assume that a treatment has exactly zero effect on a gene's expression. Gene regulation is such an interconnected network that, at least in my view, every gene will react ever so slightly to every perturbance, but often, this reaction is too small to be measurable. This is why you should define a threshold for "biologically unlikely to be relevant". Let's say, we don't believe that an expression change of less than 15% (0.2 on a log2 scale) is worth bothering. So, if you might take all genes with abs log2 fold change below 0.2, and furthermore require that the standard error of this log2 fold change estimate is below, say, 0.05, than you will get gene with true values well below at least 0.3 or so. In the end, the thresholds won't matter too much but if you want real p values, this is possible, too: Simply divide the log2 fold change estimates by their standard errors to get z scores (this works because the sampling distribution of our fold change estimates is reasonably close to normal), and do two one-sided tests (TOST) as suggested by Ryan above. Actually, as this may come up more often, we should maybe explain the details of using this approach in the vignette. Ask again if you want to see some code. Simon _______________________________________________ Bioconductor mailing list Bioconductor@r-project.org <mailto:bioconductor@r-project.org> https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor [[alternative HTML version deleted]]
ADD REPLYlink written 6.2 years ago by Ryan C. Thompson7.3k
Ok, after poking around a bit, I'm not sure how to do this. Could you provide an example of computing p-values for the case where H0 = abs(coefficient) > epsilon H1 = abs(coefficient) <= epsilon I got as far as computing the z-scores, but I'm not sure how to go from there to p-values for a test of equivalence. Thanks, -Ryan On 03/21/2013 06:42 AM, Simon Anders wrote: > Hi Ryan > > On 21/03/13 01:05, Ryan C. Thompson wrote: >> So, we have several great packages for testing for evidence of >> differential expression using RNA-seq counts, and I am happily using >> those. However, I would like to know if there is any method for testing >> for evidence of equivalent expression. That is, I would like to show >> with some level of confidence that treatment X has no (or negligible) >> effect on gene Y. Is there any way to conduct such a test? > > One of the new features of DESeq2 is that it provides a > empirical-Bayes style shrinkage estimates of coefficients (i.e., log > fold changes) and also estimates standard errors for these > coefficients (taken from the the reciprocal curvature of the > posterior). Your task is one of the application we had in mind for this. > > You want to know whether the fold change is zero or nearly zero, and > you also want to be ascertained that this estimate of a nearly-zero > fold change is a precise one. So, to find genes whose absolute log > fold change is _reliably_ smaller than some threshold, take all those > genes for which the estimated abs log fold change is below the > threshold and the standard error is well below the threshold, too. > > The reason, why I write "below a threshold" rather than "equal to > zero" is that it's biologically unreasonable to assume that a > treatment has exactly zero effect on a gene's expression. Gene > regulation is such an interconnected network that, at least in my > view, every gene will react ever so slightly to every perturbance, but > often, this reaction is too small to be measurable. This is why you > should define a threshold for "biologically unlikely to be relevant". > Let's say, we don't believe that an expression change of less than 15% > (0.2 on a log2 scale) is worth bothering. So, if you might take all > genes with abs log2 fold change below 0.2, and furthermore require > that the standard error of this log2 fold change estimate is below, > say, 0.05, than you will get gene with true values well below at least > 0.3 or so. > > In the end, the thresholds won't matter too much but if you want real > p values, this is possible, too: Simply divide the log2 fold change > estimates by their standard errors to get z scores (this works because > the sampling distribution of our fold change estimates is reasonably > close to normal), and do two one-sided tests (TOST) as suggested by > Ryan above. > > Actually, as this may come up more often, we should maybe explain the > details of using this approach in the vignette. Ask again if you want > to see some code. > > Simon > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD REPLYlink written 6.2 years ago by Ryan C. Thompson7.3k
Hi Ryan On 22/03/13 00:11, Ryan C. Thompson wrote: > I got as far as computing the z-scores, but I'm not sure how to go from > there to p-values for a test of equivalence. I guess I made it sound more trivial than it is. The "two one-sided tests" (TOST) scheme already mentioned in this thread is originally due to Schuirmann (Journal of Pharmacokinetics and Pharmacodynamics, 1987) -- or rather, the acronym is due to him, the idea is probably older, and he just pointed out that this what one should do in situations such as yours. To recap, you want to test the null hypothesis that your log fold change beta is larger than some threshold theta, i.e., you want to find genes for which you can reject this null hypothesis and hence say with some certainty that |beta| < theta. The TOST scheme suggests, as its name implies to perform two pone sided tests, one with the null hypothesis H0_A: beta > theta, the other with H0_B: beta < -theta, and then use the larger of the two p values. I demonstrate this below with some code. # Let's use the example data from the pasilla package library( DESeq2 ) library( pasilla ) data( "pasillaGenes" ) # Create a DESeq2 data object from the pasilla data dse <- DESeqSummarizedExperimentFromMatrix( counts(pasillaGenes), pData(pasillaGenes)[,c("condition","type")], ~ type + condition ) # Perform a standard DESeq2 analysis dse <- DESeq(dse) # The log2 fold changes are found here beta <- results(dse)$log2FoldChange # Just to make the following clearer, I should point out that the # "results" accessor is just a short-cut for this access to the rowData: all( beta == mcols(rowData(dse))$conditionuntreated, na.rm=TRUE ) # (returns TRUE) # The log fold change estimates all come with standard error information # which we find in the rowData (maybe we should copy this to the # 'results', too) betaSE <- mcols(rowData(dse))$SE_conditionuntreated # Internally, the Wald test is implemented as a simple two-sided # z test of beta/betaSE. Two demonstrate this, we to the test # manually and compare pvalDE <- 2 * pnorm( abs( beta ), sd = betaSE, lower.tail=FALSE ) all( abs( pvalDE - results(dse)$pvalue ) < 1e-15, na.rm=TRUE ) # (returns TRUE) # This was the test for DE, of course, i.e., small pvalDE means that # the gene's expression change (the true value of beta) is not zero # What we want is the opposite, namely find gene, for which abs(beta) # is smaller than some threshold, theta theta <- .3 # So, we do our two one-sided tests. For a one-sided z test, we # simply use tail probabilities from the normal distribution. # First, the test of H0_A: true_beta > thr pA <- pnorm( beta, thr, betaSE, lower.tail=TRUE ) # Next, the test of H0_B: true_beta < -thr pB <- pnorm( beta, -thr, betaSE, lower.tail=FALSE ) # The overall p value is the maximum, because we want to reject H0_A # and H0_B simultaneously pvalTOST <- pmax( pA, pB ) # Let's adjust our two p values with BH: sigDE <- p.adjust( pvalDE, "BH" ) < .1 sigSmall <- p.adjust( pvalTOST, "BH" ) < .1 # And make an MA plot, with sigDE in red and sigSmall in green plot( rowMeans( counts(dse,normalized=TRUE) ), beta, log="x", pch=20, cex=.2, col = 1 + sigDE + 2*sigSmall ) # Plot is attached. I think, we will include this code in one of the coming releases of DESeq2. Simon -------------- next part -------------- A non-text attachment was scrubbed... Name: Rplot001.png Type: image/png Size: 46596 bytes Desc: not available URL: <https: stat.ethz.ch="" pipermail="" bioconductor="" attachments="" 20130322="" ed095934="" attachment.png="">
ADD REPLYlink written 6.2 years ago by Simon Anders3.5k
Thanks, Simon, that makes it very clear. On 03/22/2013 05:08 AM, Simon Anders wrote: > Hi Ryan > > On 22/03/13 00:11, Ryan C. Thompson wrote: >> I got as far as computing the z-scores, but I'm not sure how to go from >> there to p-values for a test of equivalence. > > I guess I made it sound more trivial than it is. The "two one-sided > tests" (TOST) scheme already mentioned in this thread is originally > due to Schuirmann (Journal of Pharmacokinetics and Pharmacodynamics, > 1987) -- or rather, the acronym is due to him, the idea is probably > older, and he just pointed out that this what one should do in > situations such as yours. > > To recap, you want to test the null hypothesis that your log fold > change beta is larger than some threshold theta, i.e., you want to > find genes for which you can reject this null hypothesis and hence say > with some certainty that |beta| < theta. The TOST scheme suggests, as > its name implies to perform two pone sided tests, one with the null > hypothesis H0_A: beta > theta, the other with H0_B: beta < -theta, and > then use the larger of the two p values. > > I demonstrate this below with some code. > > # Let's use the example data from the pasilla package > library( DESeq2 ) > library( pasilla ) > data( "pasillaGenes" ) > > # Create a DESeq2 data object from the pasilla data > dse <- DESeqSummarizedExperimentFromMatrix( > counts(pasillaGenes), pData(pasillaGenes)[,c("condition","type")], > ~ type + condition ) > > # Perform a standard DESeq2 analysis > dse <- DESeq(dse) > > # The log2 fold changes are found here > beta <- results(dse)$log2FoldChange > > # Just to make the following clearer, I should point out that the > # "results" accessor is just a short-cut for this access to the rowData: > all( beta == mcols(rowData(dse))$conditionuntreated, na.rm=TRUE ) > # (returns TRUE) > > # The log fold change estimates all come with standard error information > # which we find in the rowData (maybe we should copy this to the > # 'results', too) > betaSE <- mcols(rowData(dse))$SE_conditionuntreated > > # Internally, the Wald test is implemented as a simple two-sided > # z test of beta/betaSE. Two demonstrate this, we to the test > # manually and compare > pvalDE <- 2 * pnorm( abs( beta ), sd = betaSE, lower.tail=FALSE ) > all( abs( pvalDE - results(dse)$pvalue ) < 1e-15, na.rm=TRUE ) > # (returns TRUE) > > # This was the test for DE, of course, i.e., small pvalDE means that > # the gene's expression change (the true value of beta) is not zero > > # What we want is the opposite, namely find gene, for which abs(beta) > # is smaller than some threshold, theta > theta <- .3 > > # So, we do our two one-sided tests. For a one-sided z test, we > # simply use tail probabilities from the normal distribution. > > # First, the test of H0_A: true_beta > thr > pA <- pnorm( beta, thr, betaSE, lower.tail=TRUE ) > > # Next, the test of H0_B: true_beta < -thr > pB <- pnorm( beta, -thr, betaSE, lower.tail=FALSE ) > > # The overall p value is the maximum, because we want to reject H0_A > # and H0_B simultaneously > pvalTOST <- pmax( pA, pB ) > > > # Let's adjust our two p values with BH: > sigDE <- p.adjust( pvalDE, "BH" ) < .1 > sigSmall <- p.adjust( pvalTOST, "BH" ) < .1 > > # And make an MA plot, with sigDE in red and sigSmall in green > plot( > rowMeans( counts(dse,normalized=TRUE) ), beta, > log="x", pch=20, cex=.2, > col = 1 + sigDE + 2*sigSmall ) > # Plot is attached. > > I think, we will include this code in one of the coming releases of > DESeq2. > > > Simon >
ADD REPLYlink written 6.2 years ago by Ryan C. Thompson7.3k
Out of curiosity, could a TOST also be implemented using likelihood ratio tests (or other tests) by using a pair of reduced models where the desired coefficient is set to +/-theta instead of zero? On 03/22/2013 05:08 AM, Simon Anders wrote: > Hi Ryan > > On 22/03/13 00:11, Ryan C. Thompson wrote: >> I got as far as computing the z-scores, but I'm not sure how to go from >> there to p-values for a test of equivalence. > > I guess I made it sound more trivial than it is. The "two one-sided > tests" (TOST) scheme already mentioned in this thread is originally > due to Schuirmann (Journal of Pharmacokinetics and Pharmacodynamics, > 1987) -- or rather, the acronym is due to him, the idea is probably > older, and he just pointed out that this what one should do in > situations such as yours. > > To recap, you want to test the null hypothesis that your log fold > change beta is larger than some threshold theta, i.e., you want to > find genes for which you can reject this null hypothesis and hence say > with some certainty that |beta| < theta. The TOST scheme suggests, as > its name implies to perform two pone sided tests, one with the null > hypothesis H0_A: beta > theta, the other with H0_B: beta < -theta, and > then use the larger of the two p values. > > I demonstrate this below with some code. > > # Let's use the example data from the pasilla package > library( DESeq2 ) > library( pasilla ) > data( "pasillaGenes" ) > > # Create a DESeq2 data object from the pasilla data > dse <- DESeqSummarizedExperimentFromMatrix( > counts(pasillaGenes), pData(pasillaGenes)[,c("condition","type")], > ~ type + condition ) > > # Perform a standard DESeq2 analysis > dse <- DESeq(dse) > > # The log2 fold changes are found here > beta <- results(dse)$log2FoldChange > > # Just to make the following clearer, I should point out that the > # "results" accessor is just a short-cut for this access to the rowData: > all( beta == mcols(rowData(dse))$conditionuntreated, na.rm=TRUE ) > # (returns TRUE) > > # The log fold change estimates all come with standard error information > # which we find in the rowData (maybe we should copy this to the > # 'results', too) > betaSE <- mcols(rowData(dse))$SE_conditionuntreated > > # Internally, the Wald test is implemented as a simple two-sided > # z test of beta/betaSE. Two demonstrate this, we to the test > # manually and compare > pvalDE <- 2 * pnorm( abs( beta ), sd = betaSE, lower.tail=FALSE ) > all( abs( pvalDE - results(dse)$pvalue ) < 1e-15, na.rm=TRUE ) > # (returns TRUE) > > # This was the test for DE, of course, i.e., small pvalDE means that > # the gene's expression change (the true value of beta) is not zero > > # What we want is the opposite, namely find gene, for which abs(beta) > # is smaller than some threshold, theta > theta <- .3 > > # So, we do our two one-sided tests. For a one-sided z test, we > # simply use tail probabilities from the normal distribution. > > # First, the test of H0_A: true_beta > thr > pA <- pnorm( beta, thr, betaSE, lower.tail=TRUE ) > > # Next, the test of H0_B: true_beta < -thr > pB <- pnorm( beta, -thr, betaSE, lower.tail=FALSE ) > > # The overall p value is the maximum, because we want to reject H0_A > # and H0_B simultaneously > pvalTOST <- pmax( pA, pB ) > > > # Let's adjust our two p values with BH: > sigDE <- p.adjust( pvalDE, "BH" ) < .1 > sigSmall <- p.adjust( pvalTOST, "BH" ) < .1 > > # And make an MA plot, with sigDE in red and sigSmall in green > plot( > rowMeans( counts(dse,normalized=TRUE) ), beta, > log="x", pch=20, cex=.2, > col = 1 + sigDE + 2*sigSmall ) > # Plot is attached. > > I think, we will include this code in one of the coming releases of > DESeq2. > > > Simon >
ADD REPLYlink written 6.2 years ago by Ryan C. Thompson7.3k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 317 users visited in the last hour