Testing for no change in RNA-seq data?

0

Entering edit mode

Ryan C. Thompson ★ 7.9k

@ryan-c-thompson-5618

Last seen 8 months ago

Scripps Research, La Jolla, CA

Hi all, So, we have several great packages for testing for evidence of differential expression using RNA-seq counts, and I am happily using those. However, I would like to know if there is any method for testing for evidence of equivalent expression. That is, I would like to show with some level of confidence that treatment X has no (or negligible) effect on gene Y. Is there any way to conduct such a test? I know I can test for differential expression and see if the gene list comes up empty, but lack of evidence for differential expression is not the same as evidence for equivalent expression. So, are there any packages for doing equivalence testing on gene expression data? If it exists for limma, then I'm happy to run my data through voom to analyze it if necessary. Thanks, -Ryan Thompson

limma limma • 2.3k views

ADD COMMENT • link updated 11.1 years ago by Simon Anders ★ 3.7k • written 11.1 years ago by Ryan C. Thompson ★ 7.9k

0

Entering edit mode

Albyn Jones ▴ 70

@albyn-jones-3850

Last seen 9.6 years ago

see "tests of equivalence", a common one being the TOST, aka two one-sided tests. you need to have a substantive definition of equivalence! albyn On 2013-03-20 17:05, Ryan C. Thompson wrote: > Hi all, > > So, we have several great packages for testing for evidence of > differential expression using RNA-seq counts, and I am happily using > those. However, I would like to know if there is any method for > testing for evidence of equivalent expression. That is, I would like > to show with some level of confidence that treatment X has no (or > negligible) effect on gene Y. Is there any way to conduct such a > test? > > I know I can test for differential expression and see if the gene > list comes up empty, but lack of evidence for differential expression > is not the same as evidence for equivalent expression. So, are there > any packages for doing equivalence testing on gene expression data? > If > it exists for limma, then I'm happy to run my data through voom to > analyze it if necessary. > > Thanks, > > -Ryan Thompson > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD COMMENT • link 11.1 years ago Albyn Jones ▴ 70

0

Entering edit mode

Are there equivalence tests that do not require the assumption of normality? Or should I transform with limma::voom and pretend it's normal? -Ryan On 03/20/2013 05:15 PM, Albyn Jones wrote: > see "tests of equivalence", a common one being the TOST, > aka two one-sided tests. you need to have a substantive > definition of equivalence! > > albyn > > On 2013-03-20 17:05, Ryan C. Thompson wrote: >> Hi all, >> >> So, we have several great packages for testing for evidence of >> differential expression using RNA-seq counts, and I am happily using >> those. However, I would like to know if there is any method for >> testing for evidence of equivalent expression. That is, I would like >> to show with some level of confidence that treatment X has no (or >> negligible) effect on gene Y. Is there any way to conduct such a test? >> >> I know I can test for differential expression and see if the gene >> list comes up empty, but lack of evidence for differential expression >> is not the same as evidence for equivalent expression. So, are there >> any packages for doing equivalence testing on gene expression data? If >> it exists for limma, then I'm happy to run my data through voom to >> analyze it if necessary. >> >> Thanks, >> >> -Ryan Thompson >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD REPLY • link 11.1 years ago Ryan C. Thompson ★ 7.9k

0

Entering edit mode

Simon Anders ★ 3.7k

@simon-anders-3855

Last seen 3.7 years ago

Zentrum für Molekularbiologie, Universi…

Hi Ryan On 21/03/13 01:05, Ryan C. Thompson wrote: > So, we have several great packages for testing for evidence of > differential expression using RNA-seq counts, and I am happily using > those. However, I would like to know if there is any method for testing > for evidence of equivalent expression. That is, I would like to show > with some level of confidence that treatment X has no (or negligible) > effect on gene Y. Is there any way to conduct such a test? One of the new features of DESeq2 is that it provides a empirical- Bayes style shrinkage estimates of coefficients (i.e., log fold changes) and also estimates standard errors for these coefficients (taken from the the reciprocal curvature of the posterior). Your task is one of the application we had in mind for this. You want to know whether the fold change is zero or nearly zero, and you also want to be ascertained that this estimate of a nearly-zero fold change is a precise one. So, to find genes whose absolute log fold change is _reliably_ smaller than some threshold, take all those genes for which the estimated abs log fold change is below the threshold and the standard error is well below the threshold, too. The reason, why I write "below a threshold" rather than "equal to zero" is that it's biologically unreasonable to assume that a treatment has exactly zero effect on a gene's expression. Gene regulation is such an interconnected network that, at least in my view, every gene will react ever so slightly to every perturbance, but often, this reaction is too small to be measurable. This is why you should define a threshold for "biologically unlikely to be relevant". Let's say, we don't believe that an expression change of less than 15% (0.2 on a log2 scale) is worth bothering. So, if you might take all genes with abs log2 fold change below 0.2, and furthermore require that the standard error of this log2 fold change estimate is below, say, 0.05, than you will get gene with true values well below at least 0.3 or so. In the end, the thresholds won't matter too much but if you want real p values, this is possible, too: Simply divide the log2 fold change estimates by their standard errors to get z scores (this works because the sampling distribution of our fold change estimates is reasonably close to normal), and do two one-sided tests (TOST) as suggested by Ryan above. Actually, as this may come up more often, we should maybe explain the details of using this approach in the vignette. Ask again if you want to see some code. Simon

ADD COMMENT • link 11.1 years ago Simon Anders ★ 3.7k

0

Entering edit mode

I am looking at my fit results from GLM of edgeR for my data: and assume that $coefficients are related to the expression levels. Just wondering how I can get the p values for those coefficients since I remember in limma, each coefficient will be accompanied by the corresponding t value and p value. Thanks a lot for the help. capricy =================== > design=model.matrix(~0+regroup) > y <- estimateGLMCommonDisp(y,design) > y <- estimateGLMTagwiseDisp(y,design) > fit <- glmFit(y,design) > fit An object of class "DGEGLM" $coefficients regroupFR regroupFA regroupFM regroupFP regroupFW regroupMA GS_14929 -9.543565 -9.267612 -9.310823 -9.167299 -10.62094 -9.260763 GS_09776 -10.304430 -10.985146 -10.644154 -10.640469 -10.59615 -10.798889 GS_18434 -11.327664 -11.786421 -11.643292 -11.902728 -12.50470 -11.654008 GS_08334 -10.789181 -10.271089 -10.511480 -10.375642 -10.39865 -10.836647 GS_09550 -10.564167 -10.152571 -10.410428 -10.098825 -10.36302 -9.722892 regroupMM regroupMP regroupMW GS_14929 -8.516256 -8.222225 -9.495394 GS_09776 -10.337434 -10.284703 -10.588544 GS_18434 -11.755905 -11.662276 -11.583950 GS_08334 -10.797529 -10.643409 -10.783104 GS_09550 -10.650495 -10.766462 -9.880359 15457 more rows ... $fitted.values MP FA FR FW MA FW.1 MM GS_14929 596.53153 205.75696 84.61834 31.844025 169.27178 82.62258 513.34590 GS_09776 75.81638 36.91137 39.53083 32.643715 36.33823 84.69746 83.04882 GS_18434 19.09812 16.54813 14.19827 4.825192 15.43876 12.51946 20.07922 GS_08334 52.95485 75.41272 24.33895 39.775089 34.99088 103.20054 52.41066 GS_09550 46.82022 84.90529 30.48442 41.218519 106.62251 106.94567 60.71721 FP MA.1 MW FM MM.1 MW.1 MP.1 GS_14929 78.226172 123.52247 193.95899 122.61523 173.571103 194.05252 568.23926 GS_09776 17.921685 26.51705 64.98434 32.30797 28.080239 65.01567 72.22056 GS_18434 5.064548 11.26610 23.99487 11.88420 6.789129 24.00644 18.19234 GS_08334 23.358730 25.53385 53.48880 36.89427 17.720948 53.51460 50.44331 GS_09550 30.811447 77.80550 131.97315 40.81919 20.529537 132.03679 44.59963 FM.1 FP.1 FA.1 FR.1 GS_14929 259.44483 324.86875 256.36216 198.47462 GS_09776 68.36129 74.42772 45.98959 92.72063 GS_18434 25.14609 21.03277 20.61808 33.30242 GS_08334 78.06556 97.00745 93.96021 57.08766 GS_09550 86.37041 127.95815 105.78744 71.50204 15457 more rows ... $counts MP FA FR FW MA FW.1 MM FP MA.1 MW FM MM.1 MW.1 MP.1 FM.1 FP.1 GS_14929 221 284 170 23 267 105 209 106 52 218 158 277 170 926 185 211 GS_09776 75 32 17 28 41 96 86 15 23 65 23 27 65 73 87 85 GS_18434 36 22 19 8 21 6 22 7 7 27 13 6 21 2 23 15 GS_08334 44 77 23 50 41 78 78 14 21 61 29 8 46 59 94 132 GS_09550 82 92 45 54 105 75 95 18 79 153 41 8 111 11 86 178 FA.1 FR.1 GS_14929 159 0 GS_09776 52 143 GS_18434 14 23 GS_08334 92 60 GS_09550 97 39 15457 more rows ... $deviance GS_14929 GS_09776 GS_18434 GS_08334 GS_09550 19.059229 3.629342 9.509356 4.268291 8.809585 15457 more elements ... $df.residual [1] 9 9 9 9 9 15457 more elements ... $abundance [1] -9.081235 -10.552328 -11.717230 -10.579804 -10.233947 15457 more elements ... $design regroupFR regroupFA regroupFM regroupFP regroupFW regroupMA regroupMM 1 0 0 0 0 0 0 0 2 0 1 0 0 0 0 0 3 1 0 0 0 0 0 0 4 0 0 0 0 1 0 0 5 0 0 0 0 0 1 0 regroupMP regroupMW 1 1 0 2 0 0 3 0 0 4 0 0 5 0 0 13 more rows ... $offset [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [1,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 13.52703 [2,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 13.52703 [3,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 13.52703 [4,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 13.52703 [5,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 13.52703 [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [1,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 14.95085 [2,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 14.95085 [3,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 14.95085 [4,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 14.95085 [5,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 14.95085 [,17] [,18] [1,] 14.81434 14.83441 [2,] 14.81434 14.83441 [3,] 14.81434 14.83441 [4,] 14.81434 14.83441 [5,] 14.81434 14.83441 15457 more rows ... $dispersion [1] 0.7464955 0.2734189 0.4034924 0.2832934 0.3819788 15457 more elements ... $method [1] "oneway" $samples group lib.size norm.factors MP MP 2220863 1 FA FA 2179157 1 FR FR 1181036 1 FW FW 1305802 1 MA MA 1780507 1 13 more rows ... [[alternative HTML version deleted]]

ADD REPLY • link 11.1 years ago capricy gao ▴ 230

0

Entering edit mode

Hi Capricy, Just like limma, edgeR has a very comprehensive user's guide that you can consult. library(edgeR) edgeRUsersGuide() Best, Jim On 3/21/2013 2:41 PM, capricy gao wrote: > I am looking at my fit results from GLM of edgeR for my data: > > and assume that $coefficients are related to the expression levels. Just wondering how I can get the p values for those coefficients since I remember in limma, each coefficient will be accompanied by the corresponding t value and p value. > > > Thanks a lot for the help. > > capricy > > =================== > >> design=model.matrix(~0+regroup) >> y<- estimateGLMCommonDisp(y,design) >> y<- estimateGLMTagwiseDisp(y,design) >> fit<- glmFit(y,design) >> fit > An object of class "DGEGLM" > $coefficients > regroupFR regroupFA regroupFM regroupFP regroupFW regroupMA > GS_14929 -9.543565 -9.267612 -9.310823 -9.167299 -10.62094 -9.260763 > GS_09776 -10.304430 -10.985146 -10.644154 -10.640469 -10.59615 -10.798889 > GS_18434 -11.327664 -11.786421 -11.643292 -11.902728 -12.50470 -11.654008 > GS_08334 -10.789181 -10.271089 -10.511480 -10.375642 -10.39865 -10.836647 > GS_09550 -10.564167 -10.152571 -10.410428 -10.098825 -10.36302 -9.722892 > regroupMM regroupMP regroupMW > GS_14929 -8.516256 -8.222225 -9.495394 > GS_09776 -10.337434 -10.284703 -10.588544 > GS_18434 -11.755905 -11.662276 -11.583950 > GS_08334 -10.797529 -10.643409 -10.783104 > GS_09550 -10.650495 -10.766462 -9.880359 > 15457 more rows ... > > $fitted.values > MP FA FR FW MA FW.1 MM > GS_14929 596.53153 205.75696 84.61834 31.844025 169.27178 82.62258 513.34590 > GS_09776 75.81638 36.91137 39.53083 32.643715 36.33823 84.69746 83.04882 > GS_18434 19.09812 16.54813 14.19827 4.825192 15.43876 12.51946 20.07922 > GS_08334 52.95485 75.41272 24.33895 39.775089 34.99088 103.20054 52.41066 > GS_09550 46.82022 84.90529 30.48442 41.218519 106.62251 106.94567 60.71721 > FP MA.1 MW FM MM.1 MW.1 MP.1 > GS_14929 78.226172 123.52247 193.95899 122.61523 173.571103 194.05252 568.23926 > GS_09776 17.921685 26.51705 64.98434 32.30797 28.080239 65.01567 72.22056 > GS_18434 5.064548 11.26610 23.99487 11.88420 6.789129 24.00644 18.19234 > GS_08334 23.358730 25.53385 53.48880 36.89427 17.720948 53.51460 50.44331 > GS_09550 30.811447 77.80550 131.97315 40.81919 20.529537 132.03679 44.59963 > FM.1 FP.1 FA.1 FR.1 > GS_14929 259.44483 324.86875 256.36216 198.47462 > GS_09776 68.36129 74.42772 45.98959 92.72063 > GS_18434 25.14609 21.03277 20.61808 33.30242 > GS_08334 78.06556 97.00745 93.96021 57.08766 > GS_09550 86.37041 127.95815 105.78744 71.50204 > 15457 more rows ... > > $counts > MP FA FR FW MA FW.1 MM FP MA.1 MW FM MM.1 MW.1 MP.1 FM.1 FP.1 > GS_14929 221 284 170 23 267 105 209 106 52 218 158 277 170 926 185 211 > GS_09776 75 32 17 28 41 96 86 15 23 65 23 27 65 73 87 85 > GS_18434 36 22 19 8 21 6 22 7 7 27 13 6 21 2 23 15 > GS_08334 44 77 23 50 41 78 78 14 21 61 29 8 46 59 94 132 > GS_09550 82 92 45 54 105 75 95 18 79 153 41 8 111 11 86 178 > FA.1 FR.1 > GS_14929 159 0 > GS_09776 52 143 > GS_18434 14 23 > GS_08334 92 60 > GS_09550 97 39 > 15457 more rows ... > > $deviance > GS_14929 GS_09776 GS_18434 GS_08334 GS_09550 > 19.059229 3.629342 9.509356 4.268291 8.809585 > 15457 more elements ... > > $df.residual > [1] 9 9 9 9 9 > 15457 more elements ... > > $abundance > [1] -9.081235 -10.552328 -11.717230 -10.579804 -10.233947 > 15457 more elements ... > > $design > regroupFR regroupFA regroupFM regroupFP regroupFW regroupMA regroupMM > 1 0 0 0 0 0 0 0 > 2 0 1 0 0 0 0 0 > 3 1 0 0 0 0 0 0 > 4 0 0 0 0 1 0 0 > 5 0 0 0 0 0 1 0 > regroupMP regroupMW > 1 1 0 > 2 0 0 > 3 0 0 > 4 0 0 > 5 0 0 > 13 more rows ... > > $offset > [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] > [1,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 13.52703 > [2,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 13.52703 > [3,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 13.52703 > [4,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 13.52703 > [5,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 13.52703 > [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] > [1,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 14.95085 > [2,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 14.95085 > [3,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 14.95085 > [4,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 14.95085 > [5,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 14.95085 > [,17] [,18] > [1,] 14.81434 14.83441 > [2,] 14.81434 14.83441 > [3,] 14.81434 14.83441 > [4,] 14.81434 14.83441 > [5,] 14.81434 14.83441 > 15457 more rows ... > > $dispersion > [1] 0.7464955 0.2734189 0.4034924 0.2832934 0.3819788 > 15457 more elements ... > > $method > [1] "oneway" > > $samples > group lib.size norm.factors > MP MP 2220863 1 > FA FA 2179157 1 > FR FR 1181036 1 > FW FW 1305802 1 > MA MA 1780507 1 > 13 more rows ... > [[alternative HTML version deleted]] > > > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099

ADD REPLY • link 11.1 years ago James W. MacDonald 65k

0

Entering edit mode

I think I have carefully read both the userguide and reference manual, but couldn't figure it out. Could anybody help me out what arguments should be included? Thanks a lot. In limma fit results: there is something like this: =============== $p.value Grp1 Grp2vs1 Gene 1 0.8604469 6.019156e-05 Gene 2 0.2174605 1.673262e-05 Gene 3 0.3571957 5.758422e-02 Gene 4 0.6789641 1.758690e-01 Gene 5 0.4589329 8.223679e-01 95 more rows ... ============== But in edgeR I couldn't find it as I posted previously.... ________________________________ From: James W. MacDonald <jmacdon@uw.edu> Cc: "bioconductor@r-project.org" <bioconductor@r-project.org> Sent: Thursday, March 21, 2013 1:45 PM Subject: Re: [BioC] EdgeR: coefficients for GLM for multiple groups Hi Capricy, Just like limma, edgeR has a very comprehensive user's guide that you can consult. library(edgeR) edgeRUsersGuide() Best, Jim On 3/21/2013 2:41 PM, capricy gao wrote: > I am looking at my fit results from GLM of edgeR for my data: > > and assume that $coefficients are related to the expression levels. Just wondering how I can get the p values for those coefficients since I remember in limma, each coefficient will be accompanied by the corresponding t value and p value. > > > Thanks a lot for the help. > > capricy > > =================== > >> design=model.matrix(~0+regroup) >> y<- estimateGLMCommonDisp(y,design) >> y<- estimateGLMTagwiseDisp(y,design) >> fit<- glmFit(y,design) >> fit > An object of class "DGEGLM" > $coefficients > regroupFR regroupFA regroupFM regroupFP regroupFW regroupMA > GS_14929 -9.543565 -9.267612 -9.310823 -9.167299 -10.62094 -9.260763 > GS_09776 -10.304430 -10.985146 -10.644154 -10.640469 -10.59615 -10.798889 > GS_18434 -11.327664 -11.786421 -11.643292 -11.902728 -12.50470 -11.654008 > GS_08334 -10.789181 -10.271089 -10.511480 -10.375642 -10.39865 -10.836647 > GS_09550 -10.564167 -10.152571 -10.410428 -10.098825 -10.36302 -9.722892 > regroupMM regroupMP regroupMW > GS_14929 -8.516256 -8.222225 -9.495394 > GS_09776 -10.337434 -10.284703 -10.588544 > GS_18434 -11.755905 -11.662276 -11.583950 > GS_08334 -10.797529 -10.643409 -10.783104 > GS_09550 -10.650495 -10.766462 -9.880359 > 15457 more rows ... > > $fitted.values > MP FA FR FW MA FW.1 MM > GS_14929 596.53153 205.75696 84.61834 31.844025 169.27178 82.62258 513.34590 > GS_09776 75.81638 36.91137 39.53083 32.643715 36.33823 84.69746 83.04882 > GS_18434 19.09812 16.54813 14.19827 4.825192 15.43876 12.51946 20.07922 > GS_08334 52.95485 75.41272 24.33895 39.775089 34.99088 103.20054 52.41066 > GS_09550 46.82022 84.90529 30.48442 41.218519 106.62251 106.94567 60.71721 > FP MA.1 MW FM MM.1 MW.1 MP.1 > GS_14929 78.226172 123.52247 193.95899 122.61523 173.571103 194.05252 568.23926 > GS_09776 17.921685 26.51705 64.98434 32.30797 28.080239 65.01567 72.22056 > GS_18434 5.064548 11.26610 23.99487 11.88420 6.789129 24.00644 18.19234 > GS_08334 23.358730 25.53385 53.48880 36.89427 17.720948 53.51460 50.44331 > GS_09550 30.811447 77.80550 131.97315 40.81919 20.529537 132.03679 44.59963 > FM.1 FP.1 FA.1 FR.1 > GS_14929 259.44483 324.86875 256.36216 198.47462 > GS_09776 68.36129 74.42772 45.98959 92.72063 > GS_18434 25.14609 21.03277 20.61808 33.30242 > GS_08334 78.06556 97.00745 93.96021 57.08766 > GS_09550 86.37041 127.95815 105.78744 71.50204 > 15457 more rows ... > > $counts > MP FA FR FW MA FW.1 MM FP MA.1 MW FM MM.1 MW.1 MP.1 FM.1 FP.1 > GS_14929 221 284 170 23 267 105 209 106 52 218 158 277 170 926 185 211 > GS_09776 75 32 17 28 41 96 86 15 23 65 23 27 65 73 87 85 > GS_18434 36 22 19 8 21 6 22 7 7 27 13 6 21 2 23 15 > GS_08334 44 77 23 50 41 78 78 14 21 61 29 8 46 59 94 132 > GS_09550 82 92 45 54 105 75 95 18 79 153 41 8 111 11 86 178 > FA.1 FR.1 > GS_14929 159 0 > GS_09776 52 143 > GS_18434 14 23 > GS_08334 92 60 > GS_09550 97 39 > 15457 more rows ... > > $deviance > GS_14929 GS_09776 GS_18434 GS_08334 GS_09550 > 19.059229 3.629342 9.509356 4.268291 8.809585 > 15457 more elements ... > > $df.residual > [1] 9 9 9 9 9 > 15457 more elements ... > > $abundance > [1] -9.081235 -10.552328 -11.717230 -10.579804 -10.233947 > 15457 more elements ... > > $design > regroupFR regroupFA regroupFM regroupFP regroupFW regroupMA regroupMM > 1 0 0 0 0 0 0 0 > 2 0 1 0 0 0 0 0 > 3 1 0 0 0 0 0 0 > 4 0 0 0 0 1 0 0 > 5 0 0 0 0 0 1 0 > regroupMP regroupMW > 1 1 0 > 2 0 0 > 3 0 0 > 4 0 0 > 5 0 0 > 13 more rows ... > > $offset > [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] > [1,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 13.52703 > [2,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 13.52703 > [3,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 13.52703 > [4,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 13.52703 > [5,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 13.52703 > [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] > [1,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 14.95085 > [2,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 14.95085 > [3,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 14.95085 > [4,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 14.95085 > [5,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 14.95085 > [,17] [,18] > [1,] 14.81434 14.83441 > [2,] 14.81434 14.83441 > [3,] 14.81434 14.83441 > [4,] 14.81434 14.83441 > [5,] 14.81434 14.83441 > 15457 more rows ... > > $dispersion > [1] 0.7464955 0.2734189 0.4034924 0.2832934 0.3819788 > 15457 more elements ... > > $method > [1] "oneway" > > $samples > group lib.size norm.factors > MP MP 2220863 1 > FA FA 2179157 1 > FR FR 1181036 1 > FW FW 1305802 1 > MA MA 1780507 1 > 13 more rows ... > [[alternative HTML version deleted]] > > > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099 [[alternative HTML version deleted]]

ADD REPLY • link 11.1 years ago capricy gao ▴ 230

0

Entering edit mode

Hi Capricy, I'm not convinced you have read the user's guide carefully. There is, for example, a section called 'Quick start' that is only seven pages in, that shows the canonical glm analysis steps. Then there are two more worked examples starting on page 52 that show pretty much all the steps required to do a glm analysis. However, in the code you show below you haven't followed the examples, nor have you used the accessor function that is intended to produce useful output. Best, Jim On 3/21/2013 3:57 PM, capricy gao wrote: > I think I have carefully read both the userguide and reference manual, > but couldn't figure it out. Could anybody help me out what arguments > should be included? Thanks a lot. > > In limma fit results: there is something like this: > =============== > $p.value > Grp1 Grp2vs1 > Gene 1 0.8604469 6.019156e-05 > Gene 2 0.2174605 1.673262e-05 > Gene 3 0.3571957 5.758422e-02 > Gene 4 0.6789641 1.758690e-01 > Gene 5 0.4589329 8.223679e-01 > 95 more rows ... > ============== > > But in edgeR I couldn't find it as I posted previously.... > > > -------------------------------------------------------------------- ---- > *From:* James W. MacDonald <jmacdon at="" uw.edu=""> > *To:* capricy gao <capricyg at="" yahoo.com=""> > *Cc:* "bioconductor at r-project.org" <bioconductor at="" r-project.org=""> > *Sent:* Thursday, March 21, 2013 1:45 PM > *Subject:* Re: [BioC] EdgeR: coefficients for GLM for multiple groups > > Hi Capricy, > > Just like limma, edgeR has a very comprehensive user's guide that you > can consult. > > library(edgeR) > edgeRUsersGuide() > > Best, > > Jim > > > > On 3/21/2013 2:41 PM, capricy gao wrote: > > I am looking at my fit results from GLM of edgeR for my data: > > > > and assume that $coefficients are related to the expression levels. > Just wondering how I can get the p values for those coefficients since > I remember in limma, each coefficient will be accompanied by the > corresponding t value and p value. > > > > > > Thanks a lot for the help. > > > > capricy > > > > =================== > > > >> design=model.matrix(~0+regroup) > >> y<- estimateGLMCommonDisp(y,design) > >> y<- estimateGLMTagwiseDisp(y,design) > >> fit<- glmFit(y,design) > >> fit > > An object of class "DGEGLM" > > $coefficients > > regroupFR regroupFA regroupFM regroupFP regroupFW > regroupMA > > GS_14929 -9.543565 -9.267612 -9.310823 -9.167299 -10.62094 > -9.260763 > > GS_09776 -10.304430 -10.985146 -10.644154 -10.640469 -10.59615 > -10.798889 > > GS_18434 -11.327664 -11.786421 -11.643292 -11.902728 -12.50470 > -11.654008 > > GS_08334 -10.789181 -10.271089 -10.511480 -10.375642 -10.39865 > -10.836647 > > GS_09550 -10.564167 -10.152571 -10.410428 -10.098825 -10.36302 > -9.722892 > > regroupMM regroupMP regroupMW > > GS_14929 -8.516256 -8.222225 -9.495394 > > GS_09776 -10.337434 -10.284703 -10.588544 > > GS_18434 -11.755905 -11.662276 -11.583950 > > GS_08334 -10.797529 -10.643409 -10.783104 > > GS_09550 -10.650495 -10.766462 -9.880359 > > 15457 more rows ... > > > > $fitted.values > > MP FA FR FW MA FW.1 > MM > > GS_14929 596.53153 205.75696 84.61834 31.844025 169.27178 82.62258 > 513.34590 > > GS_09776 75.81638 36.91137 39.53083 32.643715 36.33823 84.69746 > 83.04882 > > GS_18434 19.09812 16.54813 14.19827 4.825192 15.43876 12.51946 > 20.07922 > > GS_08334 52.95485 75.41272 24.33895 39.775089 34.99088 103.20054 > 52.41066 > > GS_09550 46.82022 84.90529 30.48442 41.218519 106.62251 106.94567 > 60.71721 > > FP MA.1 MW FM MM.1 > MW.1 MP.1 > > GS_14929 78.226172 123.52247 193.95899 122.61523 173.571103 > 194.05252 568.23926 > > GS_09776 17.921685 26.51705 64.98434 32.30797 28.080239 > 65.01567 72.22056 > > GS_18434 5.064548 11.26610 23.99487 11.88420 6.789129 > 24.00644 18.19234 > > GS_08334 23.358730 25.53385 53.48880 36.89427 17.720948 > 53.51460 50.44331 > > GS_09550 30.811447 77.80550 131.97315 40.81919 20.529537 > 132.03679 44.59963 > > FM.1 FP.1 FA.1 FR.1 > > GS_14929 259.44483 324.86875 256.36216 198.47462 > > GS_09776 68.36129 74.42772 45.98959 92.72063 > > GS_18434 25.14609 21.03277 20.61808 33.30242 > > GS_08334 78.06556 97.00745 93.96021 57.08766 > > GS_09550 86.37041 127.95815 105.78744 71.50204 > > 15457 more rows ... > > > > $counts > > MP FA FR FW MA FW.1 MM FP MA.1 MW FM MM.1 MW.1 > MP.1 FM.1 FP.1 > > GS_14929 221 284 170 23 267 105 209 106 52 218 158 277 170 926 > 185 211 > > GS_09776 75 32 17 28 41 96 86 15 23 65 23 27 65 73 87 85 > > GS_18434 36 22 19 8 21 6 22 7 7 27 13 6 21 2 > 23 15 > > GS_08334 44 77 23 50 41 78 78 14 21 61 29 8 46 59 > 94 132 > > GS_09550 82 92 45 54 105 75 95 18 79 153 41 8 111 11 > 86 178 > > FA.1 FR.1 > > GS_14929 159 0 > > GS_09776 52 143 > > GS_18434 14 23 > > GS_08334 92 60 > > GS_09550 97 39 > > 15457 more rows ... > > > > $deviance > > GS_14929 GS_09776 GS_18434 GS_08334 GS_09550 > > 19.059229 3.629342 9.509356 4.268291 8.809585 > > 15457 more elements ... > > > > $df.residual > > [1] 9 9 9 9 9 > > 15457 more elements ... > > > > $abundance > > [1] -9.081235 -10.552328 -11.717230 -10.579804 -10.233947 > > 15457 more elements ... > > > > $design > > regroupFR regroupFA regroupFM regroupFP regroupFW regroupMA regroupMM > > 1 0 0 0 0 0 0 0 > > 2 0 1 0 0 0 0 0 > > 3 1 0 0 0 0 0 0 > > 4 0 0 0 0 1 0 0 > > 5 0 0 0 0 0 1 0 > > regroupMP regroupMW > > 1 1 0 > > 2 0 0 > > 3 0 0 > > 4 0 0 > > 5 0 0 > > 13 more rows ... > > > > $offset > > [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] > > [1,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 > 13.52703 > > [2,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 > 13.52703 > > [3,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 > 13.52703 > > [4,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 > 13.52703 > > [5,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 > 13.52703 > > [,9] [,10] [,11] [,12] [,13] [,14] [,15] > [,16] > > [1,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 > 14.95085 > > [2,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 > 14.95085 > > [3,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 > 14.95085 > > [4,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 > 14.95085 > > [5,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 > 14.95085 > > [,17] [,18] > > [1,] 14.81434 14.83441 > > [2,] 14.81434 14.83441 > > [3,] 14.81434 14.83441 > > [4,] 14.81434 14.83441 > > [5,] 14.81434 14.83441 > > 15457 more rows ... > > > > $dispersion > > [1] 0.7464955 0.2734189 0.4034924 0.2832934 0.3819788 > > 15457 more elements ... > > > > $method > > [1] "oneway" > > > > $samples > > group lib.size norm.factors > > MP MP 2220863 1 > > FA FA 2179157 1 > > FR FR 1181036 1 > > FW FW 1305802 1 > > MA MA 1780507 1 > > 13 more rows ... > > [[alternative HTML version deleted]] > > > > > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor at r-project.org <mailto:bioconductor at="" r-project.org=""> > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > > -- > James W. MacDonald, M.S. > Biostatistician > University of Washington > Environmental and Occupational Health Sciences > 4225 Roosevelt Way NE, # 100 > Seattle WA 98105-6099 > > > -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099

ADD REPLY • link 11.1 years ago James W. MacDonald 65k

0

Entering edit mode

I wonder if there are different versions of userguide/reference manual. I still couldn't find what I could do to get useful output, like the pvalue for all the coefficients. The only pvalue I could see is in lrt output... ________________________________ From: James W. MacDonald <jmacdon@uw.edu> Cc: "bioconductor@r-project.org" <bioconductor@r-project.org> Sent: Thursday, March 21, 2013 3:27 PM Subject: Re: [BioC] EdgeR: coefficients for GLM for multiple groups Hi Capricy, I'm not convinced you have read the user's guide carefully. There is, for example, a section called 'Quick start' that is only seven pages in, that shows the canonical glm analysis steps. Then there are two more worked examples starting on page 52 that show pretty much all the steps required to do a glm analysis. However, in the code you show below you haven't followed the examples, nor have you used the accessor function that is intended to produce useful output. Best, Jim On 3/21/2013 3:57 PM, capricy gao wrote: > I think I have carefully read both the userguide and reference manual, but couldn't figure it out. Could anybody help me out what arguments should be included? Thanks a lot. > > In limma fit results: there is something like this: > =============== > $p.value > Grp1 Grp2vs1 > Gene 1 0.8604469 6.019156e-05 > Gene 2 0.2174605 1.673262e-05 > Gene 3 0.3571957 5.758422e-02 > Gene 4 0.6789641 1.758690e-01 > Gene 5 0.4589329 8.223679e-01 > 95 more rows ... > ============== > > But in edgeR I couldn't find it as I posted previously.... > > > -------------------------------------------------------------------- ---- > *From:* James W. MacDonald <jmacdon@uw.edu> > *Cc:* "bioconductor@r-project.org" <bioconductor@r-project.org> > *Sent:* Thursday, March 21, 2013 1:45 PM > *Subject:* Re: [BioC] EdgeR: coefficients for GLM for multiple groups > > Hi Capricy, > > Just like limma, edgeR has a very comprehensive user's guide that you > can consult. > > library(edgeR) > edgeRUsersGuide() > > Best, > > Jim > > > > On 3/21/2013 2:41 PM, capricy gao wrote: > > I am looking at my fit results from GLM of edgeR for my data: > > > > and assume that $coefficients are related to the expression levels. Just wondering how I can get the p values for those coefficients since I remember in limma, each coefficient will be accompanied by the corresponding t value and p value. > > > > > > Thanks a lot for the help. > > > > capricy > > > > =================== > > > >> design=model.matrix(~0+regroup) > >> y<- estimateGLMCommonDisp(y,design) > >> y<- estimateGLMTagwiseDisp(y,design) > >> fit<- glmFit(y,design) > >> fit > > An object of class "DGEGLM" > > $coefficients > > regroupFR regroupFA regroupFM regroupFP regroupFW regroupMA > > GS_14929 -9.543565 -9.267612 -9.310823 -9.167299 -10.62094 -9.260763 > > GS_09776 -10.304430 -10.985146 -10.644154 -10.640469 -10.59615 -10.798889 > > GS_18434 -11.327664 -11.786421 -11.643292 -11.902728 -12.50470 -11.654008 > > GS_08334 -10.789181 -10.271089 -10.511480 -10.375642 -10.39865 -10.836647 > > GS_09550 -10.564167 -10.152571 -10.410428 -10.098825 -10.36302 -9.722892 > > regroupMM regroupMP regroupMW > > GS_14929 -8.516256 -8.222225 -9.495394 > > GS_09776 -10.337434 -10.284703 -10.588544 > > GS_18434 -11.755905 -11.662276 -11.583950 > > GS_08334 -10.797529 -10.643409 -10.783104 > > GS_09550 -10.650495 -10.766462 -9.880359 > > 15457 more rows ... > > > > $fitted.values > > MP FA FR FW MA FW.1 MM > > GS_14929 596.53153 205.75696 84.61834 31.844025 169.27178 82.62258 513.34590 > > GS_09776 75.81638 36.91137 39.53083 32.643715 36.33823 84.69746 83.04882 > > GS_18434 19.09812 16.54813 14.19827 4.825192 15.43876 12.51946 20.07922 > > GS_08334 52.95485 75.41272 24.33895 39.775089 34.99088 103.20054 52.41066 > > GS_09550 46.82022 84.90529 30.48442 41.218519 106.62251 106.94567 60.71721 > > FP MA.1 MW FM MM.1 MW.1 MP.1 > > GS_14929 78.226172 123.52247 193.95899 122.61523 173.571103 194.05252 568.23926 > > GS_09776 17.921685 26.51705 64.98434 32.30797 28.080239 65.01567 72.22056 > > GS_18434 5.064548 11.26610 23.99487 11.88420 6.789129 24.00644 18.19234 > > GS_08334 23.358730 25.53385 53.48880 36.89427 17.720948 53.51460 50.44331 > > GS_09550 30.811447 77.80550 131.97315 40.81919 20.529537 132.03679 44.59963 > > FM.1 FP.1 FA.1 FR.1 > > GS_14929 259.44483 324.86875 256.36216 198.47462 > > GS_09776 68.36129 74.42772 45.98959 92.72063 > > GS_18434 25.14609 21.03277 20.61808 33.30242 > > GS_08334 78.06556 97.00745 93.96021 57.08766 > > GS_09550 86.37041 127.95815 105.78744 71.50204 > > 15457 more rows ... > > > > $counts > > MP FA FR FW MA FW.1 MM FP MA.1 MW FM MM.1 MW.1 MP.1 FM.1 FP.1 > > GS_14929 221 284 170 23 267 105 209 106 52 218 158 277 170 926 185 211 > > GS_09776 75 32 17 28 41 96 86 15 23 65 23 27 65 73 87 85 > > GS_18434 36 22 19 8 21 6 22 7 7 27 13 6 21 2 23 15 > > GS_08334 44 77 23 50 41 78 78 14 21 61 29 8 46 59 94 132 > > GS_09550 82 92 45 54 105 75 95 18 79 153 41 8 111 11 86 178 > > FA.1 FR.1 > > GS_14929 159 0 > > GS_09776 52 143 > > GS_18434 14 23 > > GS_08334 92 60 > > GS_09550 97 39 > > 15457 more rows ... > > > > $deviance > > GS_14929 GS_09776 GS_18434 GS_08334 GS_09550 > > 19.059229 3.629342 9.509356 4.268291 8.809585 > > 15457 more elements ... > > > > $df.residual > > [1] 9 9 9 9 9 > > 15457 more elements ... > > > > $abundance > > [1] -9.081235 -10.552328 -11.717230 -10.579804 -10.233947 > > 15457 more elements ... > > > > $design > > regroupFR regroupFA regroupFM regroupFP regroupFW regroupMA regroupMM > > 1 0 0 0 0 0 0 0 > > 2 0 1 0 0 0 0 0 > > 3 1 0 0 0 0 0 0 > > 4 0 0 0 0 1 0 0 > > 5 0 0 0 0 0 1 0 > > regroupMP regroupMW > > 1 1 0 > > 2 0 0 > > 3 0 0 > > 4 0 0 > > 5 0 0 > > 13 more rows ... > > > > $offset > > [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] > > [1,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 13.52703 > > [2,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 13.52703 > > [3,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 13.52703 > > [4,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 13.52703 > > [5,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 13.52703 > > [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] > > [1,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 14.95085 > > [2,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 14.95085 > > [3,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 14.95085 > > [4,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 14.95085 > > [5,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 14.95085 > > [,17] [,18] > > [1,] 14.81434 14.83441 > > [2,] 14.81434 14.83441 > > [3,] 14.81434 14.83441 > > [4,] 14.81434 14.83441 > > [5,] 14.81434 14.83441 > > 15457 more rows ... > > > > $dispersion > > [1] 0.7464955 0.2734189 0.4034924 0.2832934 0.3819788 > > 15457 more elements ... > > > > $method > > [1] "oneway" > > > > $samples > > group lib.size norm.factors > > MP MP 2220863 1 > > FA FA 2179157 1 > > FR FR 1181036 1 > > FW FW 1305802 1 > > MA MA 1780507 1 > > 13 more rows ... > > [[alternative HTML version deleted]] > > > > > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor@r-project.org <mailto:bioconductor@r-project.org> > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > > -- James W. MacDonald, M.S. > Biostatistician > University of Washington > Environmental and Occupational Health Sciences > 4225 Roosevelt Way NE, # 100 > Seattle WA 98105-6099 > > > -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099 [[alternative HTML version deleted]]

ADD REPLY • link 11.1 years ago capricy gao ▴ 230

0

Entering edit mode

I think you are fundamentally misunderstanding what you are doing. In the context of a glm, the way you test for the significance of a coefficient is by doing a likelihood ratio test, comparing a model that contains the coefficient to a reduced model that does not. So there won't be any p-values without making a comparison. In other words, the glmFit step sets up the model, and the glmLRT step does the test for whatever coefficient you care about. By default it tests all coefficients. From ?glmLRT: glmLRT conducts likelihood ratio tests for one or more coefficients in the linear model. If coef is used, the null hypothesis is that all the coefficients indicated by coef are equal to zero. If contrast is non-null, then the null hypothesis is that the specified contrast of the coefficients is equal to zero. For example, a contrast of c(0,1,-1), assuming there are three coefficients, would test the hypothesis that the second and third coefficients are equal. Best, Jim On 3/21/2013 4:55 PM, capricy gao wrote: > I wonder if there are different versions of userguide/reference manual. > > I still couldn't find what I could do to get useful output, like the > pvalue for all the coefficients. > > The only pvalue I could see is in lrt output... > > > -------------------------------------------------------------------- ---- > *From:* James W. MacDonald <jmacdon at="" uw.edu=""> > *To:* capricy gao <capricyg at="" yahoo.com=""> > *Cc:* "bioconductor at r-project.org" <bioconductor at="" r-project.org=""> > *Sent:* Thursday, March 21, 2013 3:27 PM > *Subject:* Re: [BioC] EdgeR: coefficients for GLM for multiple groups > > Hi Capricy, > > I'm not convinced you have read the user's guide carefully. There is, > for example, a section called 'Quick start' that is only seven pages > in, that shows the canonical glm analysis steps. Then there are two > more worked examples starting on page 52 that show pretty much all the > steps required to do a glm analysis. > > However, in the code you show below you haven't followed the examples, > nor have you used the accessor function that is intended to produce > useful output. > > Best, > > Jim > > > > On 3/21/2013 3:57 PM, capricy gao wrote: > > I think I have carefully read both the userguide and reference > manual, but couldn't figure it out. Could anybody help me out what > arguments should be included? Thanks a lot. > > > > In limma fit results: there is something like this: > > =============== > > $p.value > > Grp1 Grp2vs1 > > Gene 1 0.8604469 6.019156e-05 > > Gene 2 0.2174605 1.673262e-05 > > Gene 3 0.3571957 5.758422e-02 > > Gene 4 0.6789641 1.758690e-01 > > Gene 5 0.4589329 8.223679e-01 > > 95 more rows ... > > ============== > > > > But in edgeR I couldn't find it as I posted previously.... > > > > > > ------------------------------------------------------------------ ------ > > *From:* James W. MacDonald <jmacdon at="" uw.edu="" <mailto:jmacdon="" at="" uw.edu="">> > > *To:* capricy gao <capricyg at="" yahoo.com="" <mailto:capricyg="" at="" yahoo.com="">> > > *Cc:* "bioconductor at r-project.org > <mailto:bioconductor at="" r-project.org="">" <bioconductor at="" r-project.org=""> <mailto:bioconductor at="" r-project.org="">> > > *Sent:* Thursday, March 21, 2013 1:45 PM > > *Subject:* Re: [BioC] EdgeR: coefficients for GLM for multiple groups > > > > Hi Capricy, > > > > Just like limma, edgeR has a very comprehensive user's guide that you > > can consult. > > > > library(edgeR) > > edgeRUsersGuide() > > > > Best, > > > > Jim > > > > > > > > On 3/21/2013 2:41 PM, capricy gao wrote: > > > I am looking at my fit results from GLM of edgeR for my data: > > > > > > and assume that $coefficients are related to the expression > levels. Just wondering how I can get the p values for those > coefficients since I remember in limma, each coefficient will be > accompanied by the corresponding t value and p value. > > > > > > > > > Thanks a lot for the help. > > > > > > capricy > > > > > > =================== > > > > > >> design=model.matrix(~0+regroup) > > >> y<- estimateGLMCommonDisp(y,design) > > >> y<- estimateGLMTagwiseDisp(y,design) > > >> fit<- glmFit(y,design) > > >> fit > > > An object of class "DGEGLM" > > > $coefficients > > > regroupFR regroupFA regroupFM regroupFP regroupFW > regroupMA > > > GS_14929 -9.543565 -9.267612 -9.310823 -9.167299 -10.62094 > -9.260763 > > > GS_09776 -10.304430 -10.985146 -10.644154 -10.640469 -10.59615 > -10.798889 > > > GS_18434 -11.327664 -11.786421 -11.643292 -11.902728 -12.50470 > -11.654008 > > > GS_08334 -10.789181 -10.271089 -10.511480 -10.375642 -10.39865 > -10.836647 > > > GS_09550 -10.564167 -10.152571 -10.410428 -10.098825 -10.36302 > -9.722892 > > > regroupMM regroupMP regroupMW > > > GS_14929 -8.516256 -8.222225 -9.495394 > > > GS_09776 -10.337434 -10.284703 -10.588544 > > > GS_18434 -11.755905 -11.662276 -11.583950 > > > GS_08334 -10.797529 -10.643409 -10.783104 > > > GS_09550 -10.650495 -10.766462 -9.880359 > > > 15457 more rows ... > > > > > > $fitted.values > > > MP FA FR FW MA > FW.1 MM > > > GS_14929 596.53153 205.75696 84.61834 31.844025 169.27178 > 82.62258 513.34590 > > > GS_09776 75.81638 36.91137 39.53083 32.643715 36.33823 > 84.69746 83.04882 > > > GS_18434 19.09812 16.54813 14.19827 4.825192 15.43876 > 12.51946 20.07922 > > > GS_08334 52.95485 75.41272 24.33895 39.775089 34.99088 > 103.20054 52.41066 > > > GS_09550 46.82022 84.90529 30.48442 41.218519 106.62251 > 106.94567 60.71721 > > > FP MA.1 MW FM MM.1 > MW.1 MP.1 > > > GS_14929 78.226172 123.52247 193.95899 122.61523 173.571103 > 194.05252 568.23926 > > > GS_09776 17.921685 26.51705 64.98434 32.30797 28.080239 > 65.01567 72.22056 > > > GS_18434 5.064548 11.26610 23.99487 11.88420 6.789129 > 24.00644 18.19234 > > > GS_08334 23.358730 25.53385 53.48880 36.89427 17.720948 > 53.51460 50.44331 > > > GS_09550 30.811447 77.80550 131.97315 40.81919 20.529537 > 132.03679 44.59963 > > > FM.1 FP.1 FA.1 FR.1 > > > GS_14929 259.44483 324.86875 256.36216 198.47462 > > > GS_09776 68.36129 74.42772 45.98959 92.72063 > > > GS_18434 25.14609 21.03277 20.61808 33.30242 > > > GS_08334 78.06556 97.00745 93.96021 57.08766 > > > GS_09550 86.37041 127.95815 105.78744 71.50204 > > > 15457 more rows ... > > > > > > $counts > > > MP FA FR FW MA FW.1 MM FP MA.1 MW FM MM.1 MW.1 > MP.1 FM.1 FP.1 > > > GS_14929 221 284 170 23 267 105 209 106 52 218 158 277 170 > 926 185 211 > > > GS_09776 75 32 17 28 41 96 86 15 23 65 23 27 65 73 > 87 85 > > > GS_18434 36 22 19 8 21 6 22 7 7 27 13 6 21 > 2 23 15 > > > GS_08334 44 77 23 50 41 78 78 14 21 61 29 8 46 59 > 94 132 > > > GS_09550 82 92 45 54 105 75 95 18 79 153 41 8 111 11 > 86 178 > > > FA.1 FR.1 > > > GS_14929 159 0 > > > GS_09776 52 143 > > > GS_18434 14 23 > > > GS_08334 92 60 > > > GS_09550 97 39 > > > 15457 more rows ... > > > > > > $deviance > > > GS_14929 GS_09776 GS_18434 GS_08334 GS_09550 > > > 19.059229 3.629342 9.509356 4.268291 8.809585 > > > 15457 more elements ... > > > > > > $df.residual > > > [1] 9 9 9 9 9 > > > 15457 more elements ... > > > > > > $abundance > > > [1] -9.081235 -10.552328 -11.717230 -10.579804 -10.233947 > > > 15457 more elements ... > > > > > > $design > > > regroupFR regroupFA regroupFM regroupFP regroupFW regroupMA > regroupMM > > > 1 0 0 0 0 0 0 0 > > > 2 0 1 0 0 0 0 0 > > > 3 1 0 0 0 0 0 0 > > > 4 0 0 0 0 1 0 0 > > > 5 0 0 0 0 0 1 0 > > > regroupMP regroupMW > > > 1 1 0 > > > 2 0 0 > > > 3 0 0 > > > 4 0 0 > > > 5 0 0 > > > 13 more rows ... > > > > > > $offset > > > [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] > > > [1,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 > 13.52703 > > > [2,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 > 13.52703 > > > [3,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 > 13.52703 > > > [4,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 > 13.52703 > > > [5,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 > 13.52703 > > > [,9] [,10] [,11] [,12] [,13] [,14] [,15] > [,16] > > > [1,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 > 14.95085 > > > [2,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 > 14.95085 > > > [3,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 > 14.95085 > > > [4,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 > 14.95085 > > > [5,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 > 14.95085 > > > [,17] [,18] > > > [1,] 14.81434 14.83441 > > > [2,] 14.81434 14.83441 > > > [3,] 14.81434 14.83441 > > > [4,] 14.81434 14.83441 > > > [5,] 14.81434 14.83441 > > > 15457 more rows ... > > > > > > $dispersion > > > [1] 0.7464955 0.2734189 0.4034924 0.2832934 0.3819788 > > > 15457 more elements ... > > > > > > $method > > > [1] "oneway" > > > > > > $samples > > > group lib.size norm.factors > > > MP MP 2220863 1 > > > FA FA 2179157 1 > > > FR FR 1181036 1 > > > FW FW 1305802 1 > > > MA MA 1780507 1 > > > 13 more rows ... > > > [[alternative HTML version deleted]] > > > > > > > > > > > > _______________________________________________ > > > Bioconductor mailing list > > > Bioconductor at r-project.org <mailto:bioconductor at="" r-project.org=""> > <mailto:bioconductor at="" r-project.org="" <mailto:bioconductor="" at="" r-project.org="">> > > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > > > > -- James W. MacDonald, M.S. > > Biostatistician > > University of Washington > > Environmental and Occupational Health Sciences > > 4225 Roosevelt Way NE, # 100 > > Seattle WA 98105-6099 > > > > > > > > -- James W. MacDonald, M.S. > Biostatistician > University of Washington > Environmental and Occupational Health Sciences > 4225 Roosevelt Way NE, # 100 > Seattle WA 98105-6099 > > > -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099

ADD REPLY • link 11.1 years ago James W. MacDonald 65k

0

Entering edit mode

I am actually trying to get more information about the results from edgeR. I understand that glmLRT is doing the test, and so there is a pvalue for it. My question is if glmFit is a fitting step like other GLM fitting in R, do these fitted coefficients also have a pvalue? My expectation is that they should and I used limma example since that limFit has the corresponding information... Maybe I am wrong... ________________________________ From: James W. MacDonald <jmacdon@uw.edu> Cc: "bioconductor@r-project.org" <bioconductor@r-project.org> Sent: Thursday, March 21, 2013 4:17 PM Subject: Re: [BioC] EdgeR: coefficients for GLM for multiple groups I think you are fundamentally misunderstanding what you are doing. In the context of a glm, the way you test for the significance of a coefficient is by doing a likelihood ratio test, comparing a model that contains the coefficient to a reduced model that does not. So there won't be any p-values without making a comparison. In other words, the glmFit step sets up the model, and the glmLRT step does the test for whatever coefficient you care about. By default it tests all coefficients. From ?glmLRT: glmLRT conducts likelihood ratio tests for one or more coefficients in the linear model. If coef is used, the null hypothesis is that all the coefficients indicated by coef are equal to zero. If contrast is non-null, then the null hypothesis is that the specified contrast of the coefficients is equal to zero. For example, a contrast of c(0,1,-1), assuming there are three coefficients, would test the hypothesis that the second and third coefficients are equal. Best, Jim On 3/21/2013 4:55 PM, capricy gao wrote: > I wonder if there are different versions of userguide/reference manual. > > I still couldn't find what I could do to get useful output, like the pvalue for all the coefficients. > > The only pvalue I could see is in lrt output... > > > -------------------------------------------------------------------- ---- > *From:* James W. MacDonald <jmacdon@uw.edu> > *Cc:* "bioconductor@r-project.org" <bioconductor@r-project.org> > *Sent:* Thursday, March 21, 2013 3:27 PM > *Subject:* Re: [BioC] EdgeR: coefficients for GLM for multiple groups > > Hi Capricy, > > I'm not convinced you have read the user's guide carefully. There is, for example, a section called 'Quick start' that is only seven pages in, that shows the canonical glm analysis steps. Then there are two more worked examples starting on page 52 that show pretty much all the steps required to do a glm analysis. > > However, in the code you show below you haven't followed the examples, nor have you used the accessor function that is intended to produce useful output. > > Best, > > Jim > > > > On 3/21/2013 3:57 PM, capricy gao wrote: > > I think I have carefully read both the userguide and reference manual, but couldn't figure it out. Could anybody help me out what arguments should be included? Thanks a lot. > > > > In limma fit results: there is something like this: > > =============== > > $p.value > > Grp1 Grp2vs1 > > Gene 1 0.8604469 6.019156e-05 > > Gene 2 0.2174605 1.673262e-05 > > Gene 3 0.3571957 5.758422e-02 > > Gene 4 0.6789641 1.758690e-01 > > Gene 5 0.4589329 8.223679e-01 > > 95 more rows ... > > ============== > > > > But in edgeR I couldn't find it as I posted previously.... > > > > > > ------------------------------------------------------------------ ------ > > *From:* James W. MacDonald <jmacdon@uw.edu <mailto:jmacdon@uw.edu="">> > > *Cc:* "bioconductor@r-project.org <mailto:bioconductor@r-project.org>" <bioconductor@r-project.org <mailto:bioconductor@r-project.org="">> > > *Sent:* Thursday, March 21, 2013 1:45 PM > > *Subject:* Re: [BioC] EdgeR: coefficients for GLM for multiple groups > > > > Hi Capricy, > > > > Just like limma, edgeR has a very comprehensive user's guide that you > > can consult. > > > > library(edgeR) > > edgeRUsersGuide() > > > > Best, > > > > Jim > > > > > > > > On 3/21/2013 2:41 PM, capricy gao wrote: > > > I am looking at my fit results from GLM of edgeR for my data: > > > > > > and assume that $coefficients are related to the expression levels. Just wondering how I can get the p values for those coefficients since I remember in limma, each coefficient will be accompanied by the corresponding t value and p value. > > > > > > > > > Thanks a lot for the help. > > > > > > capricy > > > > > > =================== > > > > > >> design=model.matrix(~0+regroup) > > >> y<- estimateGLMCommonDisp(y,design) > > >> y<- estimateGLMTagwiseDisp(y,design) > > >> fit<- glmFit(y,design) > > >> fit > > > An object of class "DGEGLM" > > > $coefficients > > > regroupFR regroupFA regroupFM regroupFP regroupFW regroupMA > > > GS_14929 -9.543565 -9.267612 -9.310823 -9.167299 -10.62094 -9.260763 > > > GS_09776 -10.304430 -10.985146 -10.644154 -10.640469 -10.59615 -10.798889 > > > GS_18434 -11.327664 -11.786421 -11.643292 -11.902728 -12.50470 -11.654008 > > > GS_08334 -10.789181 -10.271089 -10.511480 -10.375642 -10.39865 -10.836647 > > > GS_09550 -10.564167 -10.152571 -10.410428 -10.098825 -10.36302 -9.722892 > > > regroupMM regroupMP regroupMW > > > GS_14929 -8.516256 -8.222225 -9.495394 > > > GS_09776 -10.337434 -10.284703 -10.588544 > > > GS_18434 -11.755905 -11.662276 -11.583950 > > > GS_08334 -10.797529 -10.643409 -10.783104 > > > GS_09550 -10.650495 -10.766462 -9.880359 > > > 15457 more rows ... > > > > > > $fitted.values > > > MP FA FR FW MA FW.1 MM > > > GS_14929 596.53153 205.75696 84.61834 31.844025 169.27178 82.62258 513.34590 > > > GS_09776 75.81638 36.91137 39.53083 32.643715 36.33823 84.69746 83.04882 > > > GS_18434 19.09812 16.54813 14.19827 4.825192 15.43876 12.51946 20.07922 > > > GS_08334 52.95485 75.41272 24.33895 39.775089 34.99088 103.20054 52.41066 > > > GS_09550 46.82022 84.90529 30.48442 41.218519 106.62251 106.94567 60.71721 > > > FP MA.1 MW FM MM.1 MW.1 MP.1 > > > GS_14929 78.226172 123.52247 193.95899 122.61523 173.571103 194.05252 568.23926 > > > GS_09776 17.921685 26.51705 64.98434 32.30797 28.080239 65.01567 72.22056 > > > GS_18434 5.064548 11.26610 23.99487 11.88420 6.789129 24.00644 18.19234 > > > GS_08334 23.358730 25.53385 53.48880 36.89427 17.720948 53.51460 50.44331 > > > GS_09550 30.811447 77.80550 131.97315 40.81919 20.529537 132.03679 44.59963 > > > FM.1 FP.1 FA.1 FR.1 > > > GS_14929 259.44483 324.86875 256.36216 198.47462 > > > GS_09776 68.36129 74.42772 45.98959 92.72063 > > > GS_18434 25.14609 21.03277 20.61808 33.30242 > > > GS_08334 78.06556 97.00745 93.96021 57.08766 > > > GS_09550 86.37041 127.95815 105.78744 71.50204 > > > 15457 more rows ... > > > > > > $counts > > > MP FA FR FW MA FW.1 MM FP MA.1 MW FM MM.1 MW.1 MP.1 FM.1 FP.1 > > > GS_14929 221 284 170 23 267 105 209 106 52 218 158 277 170 926 185 211 > > > GS_09776 75 32 17 28 41 96 86 15 23 65 23 27 65 73 87 85 > > > GS_18434 36 22 19 8 21 6 22 7 7 27 13 6 21 2 23 15 > > > GS_08334 44 77 23 50 41 78 78 14 21 61 29 8 46 59 94 132 > > > GS_09550 82 92 45 54 105 75 95 18 79 153 41 8 111 11 86 178 > > > FA.1 FR.1 > > > GS_14929 159 0 > > > GS_09776 52 143 > > > GS_18434 14 23 > > > GS_08334 92 60 > > > GS_09550 97 39 > > > 15457 more rows ... > > > > > > $deviance > > > GS_14929 GS_09776 GS_18434 GS_08334 GS_09550 > > > 19.059229 3.629342 9.509356 4.268291 8.809585 > > > 15457 more elements ... > > > > > > $df.residual > > > [1] 9 9 9 9 9 > > > 15457 more elements ... > > > > > > $abundance > > > [1] -9.081235 -10.552328 -11.717230 -10.579804 -10.233947 > > > 15457 more elements ... > > > > > > $design > > > regroupFR regroupFA regroupFM regroupFP regroupFW regroupMA regroupMM > > > 1 0 0 0 0 0 0 0 > > > 2 0 1 0 0 0 0 0 > > > 3 1 0 0 0 0 0 0 > > > 4 0 0 0 0 1 0 0 > > > 5 0 0 0 0 0 1 0 > > > regroupMP regroupMW > > > 1 1 0 > > > 2 0 0 > > > 3 0 0 > > > 4 0 0 > > > 5 0 0 > > > 13 more rows ... > > > > > > $offset > > > [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] > > > [1,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 13.52703 > > > [2,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 13.52703 > > > [3,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 13.52703 > > > [4,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 13.52703 > > > [5,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 13.52703 > > > [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] > > > [1,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 14.95085 > > > [2,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 14.95085 > > > [3,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 14.95085 > > > [4,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 14.95085 > > > [5,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 14.95085 > > > [,17] [,18] > > > [1,] 14.81434 14.83441 > > > [2,] 14.81434 14.83441 > > > [3,] 14.81434 14.83441 > > > [4,] 14.81434 14.83441 > > > [5,] 14.81434 14.83441 > > > 15457 more rows ... > > > > > > $dispersion > > > [1] 0.7464955 0.2734189 0.4034924 0.2832934 0.3819788 > > > 15457 more elements ... > > > > > > $method > > > [1] "oneway" > > > > > > $samples > > > group lib.size norm.factors > > > MP MP 2220863 1 > > > FA FA 2179157 1 > > > FR FR 1181036 1 > > > FW FW 1305802 1 > > > MA MA 1780507 1 > > > 13 more rows ... > > > [[alternative HTML version deleted]] > > > > > > > > > > > > _______________________________________________ > > > Bioconductor mailing list > > > Bioconductor@r-project.org <mailto:bioconductor@r-project.org> <mailto:bioconductor@r-project.org <mailto:bioconductor@r-project.org="">> > > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > > > > -- James W. MacDonald, M.S. > > Biostatistician > > University of Washington > > Environmental and Occupational Health Sciences > > 4225 Roosevelt Way NE, # 100 > > Seattle WA 98105-6099 > > > > > > > > -- James W. MacDonald, M.S. > Biostatistician > University of Washington > Environmental and Occupational Health Sciences > 4225 Roosevelt Way NE, # 100 > Seattle WA 98105-6099 > > > -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099 [[alternative HTML version deleted]]

ADD REPLY • link 11.1 years ago capricy gao ▴ 230

0

Entering edit mode

Hi capricy, as far as I now, limma output and edgeR do behave differentially in what you stated, that limma also calculates a moderated F-value similarly as in standard linear models. But i assume (not 100% shure about it) that when using a GLM, conducting a F test is not possible because your residuals are not normally distributed. But in "A comparison of methods for differential expression analysis of RNA-seq data"(BMC bioinformatics) the authors state (and I think I found a talk in the net where one of the edgeR authors was stating something similar) that using limma for rna seq data (described in the latest limma user manual) performs very well on RNA Seq Data. So maybe you are fine with limma anyways. Best regards, Moritz 2013/3/22 capricy gao <capricyg@yahoo.com> > I am actually trying to get more information about the results from edgeR. > I understand that glmLRT is doing the test, and so there is a pvalue for > it. My question is if glmFit is a fitting step like other GLM fitting in R, > do these fitted coefficients also have a pvalue? My expectation is that > they should and I used limma example since that limFit has the > corresponding information... > > Maybe I am wrong... > > > > > > > ________________________________ > From: James W. MacDonald <jmacdon@uw.edu> > > Cc: "bioconductor@r-project.org" <bioconductor@r-project.org> > Sent: Thursday, March 21, 2013 4:17 PM > Subject: Re: [BioC] EdgeR: coefficients for GLM for multiple groups > > I think you are fundamentally misunderstanding what you are doing. In the > context of a glm, the way you test for the significance of a coefficient is > by doing a likelihood ratio test, comparing a model that contains the > coefficient to a reduced model that does not. So there won't be any > p-values without making a comparison. > > In other words, the glmFit step sets up the model, and the glmLRT step > does the test for whatever coefficient you care about. By default it tests > all coefficients. From ?glmLRT: > > glmLRT conducts likelihood ratio tests for one or more > coefficients in the linear model. If coef is used, the null > hypothesis is that all the coefficients indicated by coef are > equal to zero. If contrast is non-null, then the null hypothesis > is that the specified contrast of the coefficients is equal to > zero. For example, a contrast of c(0,1,-1), assuming there are > three coefficients, would test the hypothesis that the second and > third coefficients are equal. > > > Best, > > Jim > > > > On 3/21/2013 4:55 PM, capricy gao wrote: > > I wonder if there are different versions of userguide/reference manual. > > > > I still couldn't find what I could do to get useful output, like the > pvalue for all the coefficients. > > > > The only pvalue I could see is in lrt output... > > > > > > ------------------------------------------------------------------ ------ > > *From:* James W. MacDonald <jmacdon@uw.edu> > > > *Cc:* "bioconductor@r-project.org" <bioconductor@r-project.org> > > *Sent:* Thursday, March 21, 2013 3:27 PM > > *Subject:* Re: [BioC] EdgeR: coefficients for GLM for multiple groups > > > > Hi Capricy, > > > > I'm not convinced you have read the user's guide carefully. There is, > for example, a section called 'Quick start' that is only seven pages in, > that shows the canonical glm analysis steps. Then there are two more worked > examples starting on page 52 that show pretty much all the steps required > to do a glm analysis. > > > > However, in the code you show below you haven't followed the examples, > nor have you used the accessor function that is intended to produce useful > output. > > > > Best, > > > > Jim > > > > > > > > On 3/21/2013 3:57 PM, capricy gao wrote: > > > I think I have carefully read both the userguide and reference manual, > but couldn't figure it out. Could anybody help me out what arguments should > be included? Thanks a lot. > > > > > > In limma fit results: there is something like this: > > > =============== > > > $p.value > > > Grp1 Grp2vs1 > > > Gene 1 0.8604469 6.019156e-05 > > > Gene 2 0.2174605 1.673262e-05 > > > Gene 3 0.3571957 5.758422e-02 > > > Gene 4 0.6789641 1.758690e-01 > > > Gene 5 0.4589329 8.223679e-01 > > > 95 more rows ... > > > ============== > > > > > > But in edgeR I couldn't find it as I posted previously.... > > > > > > > > > > -------------------------------------------------------------------- ---- > > > *From:* James W. MacDonald <jmacdon@uw.edu <mailto:jmacdon@uw.edu="">> > > > > *Cc:* "bioconductor@r-project.org <mailto:bioconductor@r-project.org>" > <bioconductor@r-project.org <mailto:bioconductor@r-project.org="">> > > > *Sent:* Thursday, March 21, 2013 1:45 PM > > > *Subject:* Re: [BioC] EdgeR: coefficients for GLM for multiple groups > > > > > > Hi Capricy, > > > > > > Just like limma, edgeR has a very comprehensive user's guide that you > > > can consult. > > > > > > library(edgeR) > > > edgeRUsersGuide() > > > > > > Best, > > > > > > Jim > > > > > > > > > > > > On 3/21/2013 2:41 PM, capricy gao wrote: > > > > I am looking at my fit results from GLM of edgeR for my data: > > > > > > > > and assume that $coefficients are related to the expression levels. > Just wondering how I can get the p values for those coefficients since I > remember in limma, each coefficient will be accompanied by the > corresponding t value and p value. > > > > > > > > > > > > Thanks a lot for the help. > > > > > > > > capricy > > > > > > > > =================== > > > > > > > >> design=model.matrix(~0+regroup) > > > >> y<- estimateGLMCommonDisp(y,design) > > > >> y<- estimateGLMTagwiseDisp(y,design) > > > >> fit<- glmFit(y,design) > > > >> fit > > > > An object of class "DGEGLM" > > > > $coefficients > > > > regroupFR regroupFA regroupFM regroupFP regroupFW > regroupMA > > > > GS_14929 -9.543565 -9.267612 -9.310823 -9.167299 -10.62094 > -9.260763 > > > > GS_09776 -10.304430 -10.985146 -10.644154 -10.640469 -10.59615 > -10.798889 > > > > GS_18434 -11.327664 -11.786421 -11.643292 -11.902728 -12.50470 > -11.654008 > > > > GS_08334 -10.789181 -10.271089 -10.511480 -10.375642 -10.39865 > -10.836647 > > > > GS_09550 -10.564167 -10.152571 -10.410428 -10.098825 -10.36302 > -9.722892 > > > > regroupMM regroupMP regroupMW > > > > GS_14929 -8.516256 -8.222225 -9.495394 > > > > GS_09776 -10.337434 -10.284703 -10.588544 > > > > GS_18434 -11.755905 -11.662276 -11.583950 > > > > GS_08334 -10.797529 -10.643409 -10.783104 > > > > GS_09550 -10.650495 -10.766462 -9.880359 > > > > 15457 more rows ... > > > > > > > > $fitted.values > > > > MP FA FR FW MA FW.1 > MM > > > > GS_14929 596.53153 205.75696 84.61834 31.844025 169.27178 82.62258 > 513.34590 > > > > GS_09776 75.81638 36.91137 39.53083 32.643715 36.33823 84.69746 > 83.04882 > > > > GS_18434 19.09812 16.54813 14.19827 4.825192 15.43876 12.51946 > 20.07922 > > > > GS_08334 52.95485 75.41272 24.33895 39.775089 34.99088 103.20054 > 52.41066 > > > > GS_09550 46.82022 84.90529 30.48442 41.218519 106.62251 106.94567 > 60.71721 > > > > FP MA.1 MW FM MM.1 > MW.1 MP.1 > > > > GS_14929 78.226172 123.52247 193.95899 122.61523 173.571103 > 194.05252 568.23926 > > > > GS_09776 17.921685 26.51705 64.98434 32.30797 28.080239 > 65.01567 72.22056 > > > > GS_18434 5.064548 11.26610 23.99487 11.88420 6.789129 > 24.00644 18.19234 > > > > GS_08334 23.358730 25.53385 53.48880 36.89427 17.720948 > 53.51460 50.44331 > > > > GS_09550 30.811447 77.80550 131.97315 40.81919 20.529537 > 132.03679 44.59963 > > > > FM.1 FP.1 FA.1 FR.1 > > > > GS_14929 259.44483 324.86875 256.36216 198.47462 > > > > GS_09776 68.36129 74.42772 45.98959 92.72063 > > > > GS_18434 25.14609 21.03277 20.61808 33.30242 > > > > GS_08334 78.06556 97.00745 93.96021 57.08766 > > > > GS_09550 86.37041 127.95815 105.78744 71.50204 > > > > 15457 more rows ... > > > > > > > > $counts > > > > MP FA FR FW MA FW.1 MM FP MA.1 MW FM MM.1 MW.1 > MP.1 FM.1 FP.1 > > > > GS_14929 221 284 170 23 267 105 209 106 52 218 158 277 170 926 > 185 211 > > > > GS_09776 75 32 17 28 41 96 86 15 23 65 23 27 65 73 87 > 85 > > > > GS_18434 36 22 19 8 21 6 22 7 7 27 13 6 21 2 > 23 15 > > > > GS_08334 44 77 23 50 41 78 78 14 21 61 29 8 46 59 > 94 132 > > > > GS_09550 82 92 45 54 105 75 95 18 79 153 41 8 111 11 > 86 178 > > > > FA.1 FR.1 > > > > GS_14929 159 0 > > > > GS_09776 52 143 > > > > GS_18434 14 23 > > > > GS_08334 92 60 > > > > GS_09550 97 39 > > > > 15457 more rows ... > > > > > > > > $deviance > > > > GS_14929 GS_09776 GS_18434 GS_08334 GS_09550 > > > > 19.059229 3.629342 9.509356 4.268291 8.809585 > > > > 15457 more elements ... > > > > > > > > $df.residual > > > > [1] 9 9 9 9 9 > > > > 15457 more elements ... > > > > > > > > $abundance > > > > [1] -9.081235 -10.552328 -11.717230 -10.579804 -10.233947 > > > > 15457 more elements ... > > > > > > > > $design > > > > regroupFR regroupFA regroupFM regroupFP regroupFW regroupMA > regroupMM > > > > 1 0 0 0 0 0 0 0 > > > > 2 0 1 0 0 0 0 0 > > > > 3 1 0 0 0 0 0 0 > > > > 4 0 0 0 0 1 0 0 > > > > 5 0 0 0 0 0 1 0 > > > > regroupMP regroupMW > > > > 1 1 0 > > > > 2 0 0 > > > > 3 0 0 > > > > 4 0 0 > > > > 5 0 0 > > > > 13 more rows ... > > > > > > > > $offset > > > > [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] > > > > [1,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 > 13.52703 > > > > [2,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 > 13.52703 > > > > [3,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 > 13.52703 > > > > [4,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 > 13.52703 > > > > [5,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 > 13.52703 > > > > [,9] [,10] [,11] [,12] [,13] [,14] [,15] > [,16] > > > > [1,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 > 14.95085 > > > > [2,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 > 14.95085 > > > > [3,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 > 14.95085 > > > > [4,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 > 14.95085 > > > > [5,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 > 14.95085 > > > > [,17] [,18] > > > > [1,] 14.81434 14.83441 > > > > [2,] 14.81434 14.83441 > > > > [3,] 14.81434 14.83441 > > > > [4,] 14.81434 14.83441 > > > > [5,] 14.81434 14.83441 > > > > 15457 more rows ... > > > > > > > > $dispersion > > > > [1] 0.7464955 0.2734189 0.4034924 0.2832934 0.3819788 > > > > 15457 more elements ... > > > > > > > > $method > > > > [1] "oneway" > > > > > > > > $samples > > > > group lib.size norm.factors > > > > MP MP 2220863 1 > > > > FA FA 2179157 1 > > > > FR FR 1181036 1 > > > > FW FW 1305802 1 > > > > MA MA 1780507 1 > > > > 13 more rows ... > > > > [[alternative HTML version deleted]] > > > > > > > > > > > > > > > > _______________________________________________ > > > > Bioconductor mailing list > > > > Bioconductor@r-project.org <mailto:bioconductor@r-project.org> > <mailto:bioconductor@r-project.org <mailto:bioconductor@r-project.org="">> > > > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > > > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > > > > > > -- James W. MacDonald, M.S. > > > Biostatistician > > > University of Washington > > > Environmental and Occupational Health Sciences > > > 4225 Roosevelt Way NE, # 100 > > > Seattle WA 98105-6099 > > > > > > > > > > > > > -- James W. MacDonald, M.S. > > Biostatistician > > University of Washington > > Environmental and Occupational Health Sciences > > 4225 Roosevelt Way NE, # 100 > > Seattle WA 98105-6099 > > > > > > > > -- James W. MacDonald, M.S. > Biostatistician > University of Washington > Environmental and Occupational Health Sciences > 4225 Roosevelt Way NE, # 100 > Seattle WA 98105-6099 > [[alternative HTML version deleted]] > > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > -- *Moritz Heß PhD Candidate * *Research associate Forest Research Institute of Baden Württemberg (FVA) Wonnhalde 4 79100 Freiburg (Germany) phone +49 761 4018 301* [[alternative HTML version deleted]]

ADD REPLY • link 11.1 years ago Moritz Hess ▴ 60

0

Entering edit mode

Hi, Moritz, Thank you very much for clarification and the article you recommended. I originally raised this question actually because I was trying to do gene cluster analysis based on expression levels. The edgeR user guide said: in the following design, design=model.matrix(~0+regroup), coefficient will define the expression level of each group. I figured that if I really want to use the coefficient under such circumstances, I should filter all the coefficients with low confidence. So I was looking for the way to output the p values for those coefficients. Well, it looks like now I just take all as they are... ________________________________ From: Moritz Hess <ssehztirom@googlemail.com> Cc: James W. MacDonald <jmacdon@uw.edu>; "bioconductor@r-project.org" <bioconductor@r-project.org> Sent: Monday, March 25, 2013 8:35 AM Subject: Re: [BioC] EdgeR: coefficients for GLM for multiple groups Hi capricy, as far as I now, limma output and edgeR do behave differentially in what you stated, that limma also calculates a moderated F-value similarly as in standard linear models. But i assume (not 100% shure about it) that when using a GLM, conducting a F test is not possible because your residuals are not normally distributed. But in "A comparison of methods for differential expression analysis of RNA-seq data"(BMC bioinformatics) the authors state (and I think I found a talk in the net where one of the edgeR authors was stating something similar) that using limma for rna seq data (described in the latest limma user manual) performs very well on RNA Seq Data. So maybe you are fine with limma anyways. Best regards, Moritz I am actually trying to get more information about the results from edgeR. I understand that glmLRT is doing the test, and so there is a pvalue for it. My question is if glmFit is a fitting step like other GLM fitting in R, do these fitted coefficients also have a pvalue? My expectation is that they should and I used limma example since that limFit has the corresponding information... > >Maybe I am wrong... > > > > > > > >________________________________ > From: James W. MacDonald <jmacdon@uw.edu> > >Cc: "bioconductor@r-project.org" <bioconductor@r-project.org> >Sent: Thursday, March 21, 2013 4:17 PM >Subject: Re: [BioC] EdgeR: coefficients for GLM for multiple groups > > >I think you are fundamentally misunderstanding what you are doing. In the context of a glm, the way you test for the significance of a coefficient is by doing a likelihood ratio test, comparing a model that contains the coefficient to a reduced model that does not. So there won't be any p-values without making a comparison. > >In other words, the glmFit step sets up the model, and the glmLRT step does the test for whatever coefficient you care about. By default it tests all coefficients. From ?glmLRT: > > glmLRT conducts likelihood ratio tests for one or more > coefficients in the linear model. If coef is used, the null > hypothesis is that all the coefficients indicated by coef are > equal to zero. If contrast is non-null, then the null hypothesis > is that the specified contrast of the coefficients is equal to > zero. For example, a contrast of c(0,1,-1), assuming there are > three coefficients, would test the hypothesis that the second and > third coefficients are equal. > > >Best, > >Jim > > > >On 3/21/2013 4:55 PM, capricy gao wrote: >> I wonder if there are different versions of userguide/reference manual. >> >> I still couldn't find what I could do to get useful output, like the pvalue for all the coefficients. >> >> The only pvalue I could see is in lrt output... >> >> >> ------------------------------------------------------------------- ----- >> *From:* James W. MacDonald <jmacdon@uw.edu> > > >> *Cc:* "bioconductor@r-project.org" <bioconductor@r-project.org> >> *Sent:* Thursday, March 21, 2013 3:27 PM >> *Subject:* Re: [BioC] EdgeR: coefficients for GLM for multiple groups >> >> Hi Capricy, >> >> I'm not convinced you have read the user's guide carefully. There is, for example, a section called 'Quick start' that is only seven pages in, that shows the canonical glm analysis steps. Then there are two more worked examples starting on page 52 that show pretty much all the steps required to do a glm analysis. >> >> However, in the code you show below you haven't followed the examples, nor have you used the accessor function that is intended to produce useful output. >> >> Best, >> >> Jim >> >> >> >> On 3/21/2013 3:57 PM, capricy gao wrote: >> > I think I have carefully read both the userguide and reference manual, but couldn't figure it out. Could anybody help me out what arguments should be included? Thanks a lot. >> > >> > In limma fit results: there is something like this: >> > =============== >> > $p.value >> > Grp1 Grp2vs1 >> > Gene 1 0.8604469 6.019156e-05 >> > Gene 2 0.2174605 1.673262e-05 >> > Gene 3 0.3571957 5.758422e-02 >> > Gene 4 0.6789641 1.758690e-01 >> > Gene 5 0.4589329 8.223679e-01 >> > 95 more rows ... >> > ============== >> > >> > But in edgeR I couldn't find it as I posted previously.... >> > >> > >> > ----------------------------------------------------------------- ------- >> > *From:* James W. MacDonald <jmacdon@uw.edu <mailto:jmacdon@uw.edu="">> > > >> > *Cc:* "bioconductor@r-project.org <mailto:bioconductor@r-project.org>" <bioconductor@r-project.org <mailto:bioconductor@r-project.org="">> >> > *Sent:* Thursday, March 21, 2013 1:45 PM >> > *Subject:* Re: [BioC] EdgeR: coefficients for GLM for multiple groups >> > >> > Hi Capricy, >> > >> > Just like limma, edgeR has a very comprehensive user's guide that you >> > can consult. >> > >> > library(edgeR) >> > edgeRUsersGuide() >> > >> > Best, >> > >> > Jim >> > >> > >> > >> > On 3/21/2013 2:41 PM, capricy gao wrote: >> > > I am looking at my fit results from GLM of edgeR for my data: >> > > >> > > and assume that $coefficients are related to the expression levels. Just wondering how I can get the p values for those coefficients since I remember in limma, each coefficient will be accompanied by the corresponding t value and p value. >> > > >> > > >> > > Thanks a lot for the help. >> > > >> > > capricy >> > > >> > > =================== >> > > >> > >> design=model.matrix(~0+regroup) >> > >> y<- estimateGLMCommonDisp(y,design) >> > >> y<- estimateGLMTagwiseDisp(y,design) >> > >> fit<- glmFit(y,design) >> > >> fit >> > > An object of class "DGEGLM" >> > > $coefficients >> > > regroupFR regroupFA regroupFM regroupFP regroupFW regroupMA >> > > GS_14929 -9.543565 -9.267612 -9.310823 -9.167299 -10.62094 -9.260763 >> > > GS_09776 -10.304430 -10.985146 -10.644154 -10.640469 -10.59615 -10.798889 >> > > GS_18434 -11.327664 -11.786421 -11.643292 -11.902728 -12.50470 -11.654008 >> > > GS_08334 -10.789181 -10.271089 -10.511480 -10.375642 -10.39865 -10.836647 >> > > GS_09550 -10.564167 -10.152571 -10.410428 -10.098825 -10.36302 -9.722892 >> > > regroupMM regroupMP regroupMW >> > > GS_14929 -8.516256 -8.222225 -9.495394 >> > > GS_09776 -10.337434 -10.284703 -10.588544 >> > > GS_18434 -11.755905 -11.662276 -11.583950 >> > > GS_08334 -10.797529 -10.643409 -10.783104 >> > > GS_09550 -10.650495 -10.766462 -9.880359 >> > > 15457 more rows ... >> > > >> > > $fitted.values >> > > MP FA FR FW MA FW.1 MM >> > > GS_14929 596.53153 205.75696 84.61834 31.844025 169.27178 82.62258 513.34590 >> > > GS_09776 75.81638 36.91137 39.53083 32.643715 36.33823 84.69746 83.04882 >> > > GS_18434 19.09812 16.54813 14.19827 4.825192 15.43876 12.51946 20.07922 >> > > GS_08334 52.95485 75.41272 24.33895 39.775089 34.99088 103.20054 52.41066 >> > > GS_09550 46.82022 84.90529 30.48442 41.218519 106.62251 106.94567 60.71721 >> > > FP MA.1 MW FM MM.1 MW.1 MP.1 >> > > GS_14929 78.226172 123.52247 193.95899 122.61523 173.571103 194.05252 568.23926 >> > > GS_09776 17.921685 26.51705 64.98434 32.30797 28.080239 65.01567 72.22056 >> > > GS_18434 5.064548 11.26610 23.99487 11.88420 6.789129 24.00644 18.19234 >> > > GS_08334 23.358730 25.53385 53.48880 36.89427 17.720948 53.51460 50.44331 >> > > GS_09550 30.811447 77.80550 131.97315 40.81919 20.529537 132.03679 44.59963 >> > > FM.1 FP.1 FA.1 FR.1 >> > > GS_14929 259.44483 324.86875 256.36216 198.47462 >> > > GS_09776 68.36129 74.42772 45.98959 92.72063 >> > > GS_18434 25.14609 21.03277 20.61808 33.30242 >> > > GS_08334 78.06556 97.00745 93.96021 57.08766 >> > > GS_09550 86.37041 127.95815 105.78744 71.50204 >> > > 15457 more rows ... >> > > >> > > $counts >> > > MP FA FR FW MA FW.1 MM FP MA.1 MW FM MM.1 MW.1 MP.1 FM.1 FP.1 >> > > GS_14929 221 284 170 23 267 105 209 106 52 218 158 277 170 926 185 211 >> > > GS_09776 75 32 17 28 41 96 86 15 23 65 23 27 65 73 87 85 >> > > GS_18434 36 22 19 8 21 6 22 7 7 27 13 6 21 2 23 15 >> > > GS_08334 44 77 23 50 41 78 78 14 21 61 29 8 46 59 94 132 >> > > GS_09550 82 92 45 54 105 75 95 18 79 153 41 8 111 11 86 178 >> > > FA.1 FR.1 >> > > GS_14929 159 0 >> > > GS_09776 52 143 >> > > GS_18434 14 23 >> > > GS_08334 92 60 >> > > GS_09550 97 39 >> > > 15457 more rows ... >> > > >> > > $deviance >> > > GS_14929 GS_09776 GS_18434 GS_08334 GS_09550 >> > > 19.059229 3.629342 9.509356 4.268291 8.809585 >> > > 15457 more elements ... >> > > >> > > $df.residual >> > > [1] 9 9 9 9 9 >> > > 15457 more elements ... >> > > >> > > $abundance >> > > [1] -9.081235 -10.552328 -11.717230 -10.579804 -10.233947 >> > > 15457 more elements ... >> > > >> > > $design >> > > regroupFR regroupFA regroupFM regroupFP regroupFW regroupMA regroupMM >> > > 1 0 0 0 0 0 0 0 >> > > 2 0 1 0 0 0 0 0 >> > > 3 1 0 0 0 0 0 0 >> > > 4 0 0 0 0 1 0 0 >> > > 5 0 0 0 0 0 1 0 >> > > regroupMP regroupMW >> > > 1 1 0 >> > > 2 0 0 >> > > 3 0 0 >> > > 4 0 0 >> > > 5 0 0 >> > > 13 more rows ... >> > > >> > > $offset >> > > [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] >> > > [1,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 13.52703 >> > > [2,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 13.52703 >> > > [3,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 13.52703 >> > > [4,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 13.52703 >> > > [5,] 14.61341 14.59445 13.9819 14.08233 14.39241 15.03576 14.75727 13.52703 >> > > [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] >> > > [1,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 14.95085 >> > > [2,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 14.95085 >> > > [3,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 14.95085 >> > > [4,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 14.95085 >> > > [5,] 14.07733 14.76322 14.12002 13.67291 14.7637 14.56482 14.86951 14.95085 >> > > [,17] [,18] >> > > [1,] 14.81434 14.83441 >> > > [2,] 14.81434 14.83441 >> > > [3,] 14.81434 14.83441 >> > > [4,] 14.81434 14.83441 >> > > [5,] 14.81434 14.83441 >> > > 15457 more rows ... >> > > >> > > $dispersion >> > > [1] 0.7464955 0.2734189 0.4034924 0.2832934 0.3819788 >> > > 15457 more elements ... >> > > >> > > $method >> > > [1] "oneway" >> > > >> > > $samples >> > > group lib.size norm.factors >> > > MP MP 2220863 1 >> > > FA FA 2179157 1 >> > > FR FR 1181036 1 >> > > FW FW 1305802 1 >> > > MA MA 1780507 1 >> > > 13 more rows ... >> > > [[alternative HTML version deleted]] >> > > >> > > >> > > >> > > _______________________________________________ >> > > Bioconductor mailing list >> > > Bioconductor@r-project.org <mailto:bioconductor@r-project.org> <mailto:bioconductor@r-project.org <mailto:bioconductor@r-project.org="">> >> > > https://stat.ethz.ch/mailman/listinfo/bioconductor >> > > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >> > >> > -- James W. MacDonald, M.S. >> > Biostatistician >> > University of Washington >> > Environmental and Occupational Health Sciences >> > 4225 Roosevelt Way NE, # 100 >> > Seattle WA 98105-6099 >> > >> > >> > >> >> -- James W. MacDonald, M.S. >> Biostatistician >> University of Washington >> Environmental and Occupational Health Sciences >> 4225 Roosevelt Way NE, # 100 >> Seattle WA 98105-6099 >> >> >> > >-- James W. MacDonald, M.S. >Biostatistician >University of Washington >Environmental and Occupational Health Sciences >4225 Roosevelt Way NE, # 100 >Seattle WA 98105-6099 > [[alternative HTML version deleted]] > > >_______________________________________________ >Bioconductor mailing list >Bioconductor@r-project.org >https://stat.ethz.ch/mailman/listinfo/bioconductor >Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > -- Moritz Heß PhD Candidate Research associate Forest Research Institute of Baden Württemberg (FVA) Wonnhalde 4 79100 Freiburg (Germany) phone +49 761 4018 301 [[alternative HTML version deleted]]

ADD REPLY • link 11.1 years ago capricy gao ▴ 230

0

Entering edit mode

Hi Simon, Thanks for the suggestions. I'll see about implementing them soon and let you know if I have any problems. -Ryan On Mar 21, 2013 6:44 AM, "Simon Anders" <anders@embl.de <mailto:anders@embl.de="">> wrote: Hi Ryan On 21/03/13 01:05, Ryan C. Thompson wrote: So, we have several great packages for testing for evidence of differential expression using RNA-seq counts, and I am happily using those. However, I would like to know if there is any method for testing for evidence of equivalent expression. That is, I would like to show with some level of confidence that treatment X has no (or negligible) effect on gene Y. Is there any way to conduct such a test? One of the new features of DESeq2 is that it provides a empirical-Bayes style shrinkage estimates of coefficients (i.e., log fold changes) and also estimates standard errors for these coefficients (taken from the the reciprocal curvature of the posterior). Your task is one of the application we had in mind for this. You want to know whether the fold change is zero or nearly zero, and you also want to be ascertained that this estimate of a nearly- zero fold change is a precise one. So, to find genes whose absolute log fold change is _reliably_ smaller than some threshold, take all those genes for which the estimated abs log fold change is below the threshold and the standard error is well below the threshold, too. The reason, why I write "below a threshold" rather than "equal to zero" is that it's biologically unreasonable to assume that a treatment has exactly zero effect on a gene's expression. Gene regulation is such an interconnected network that, at least in my view, every gene will react ever so slightly to every perturbance, but often, this reaction is too small to be measurable. This is why you should define a threshold for "biologically unlikely to be relevant". Let's say, we don't believe that an expression change of less than 15% (0.2 on a log2 scale) is worth bothering. So, if you might take all genes with abs log2 fold change below 0.2, and furthermore require that the standard error of this log2 fold change estimate is below, say, 0.05, than you will get gene with true values well below at least 0.3 or so. In the end, the thresholds won't matter too much but if you want real p values, this is possible, too: Simply divide the log2 fold change estimates by their standard errors to get z scores (this works because the sampling distribution of our fold change estimates is reasonably close to normal), and do two one-sided tests (TOST) as suggested by Ryan above. Actually, as this may come up more often, we should maybe explain the details of using this approach in the vignette. Ask again if you want to see some code. Simon _______________________________________________ Bioconductor mailing list Bioconductor@r-project.org <mailto:bioconductor@r-project.org> https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor [[alternative HTML version deleted]]

ADD REPLY • link 11.1 years ago Ryan C. Thompson ★ 7.9k

0

Entering edit mode

Ok, after poking around a bit, I'm not sure how to do this. Could you provide an example of computing p-values for the case where H0 = abs(coefficient) > epsilon H1 = abs(coefficient) <= epsilon I got as far as computing the z-scores, but I'm not sure how to go from there to p-values for a test of equivalence. Thanks, -Ryan On 03/21/2013 06:42 AM, Simon Anders wrote: > Hi Ryan > > On 21/03/13 01:05, Ryan C. Thompson wrote: >> So, we have several great packages for testing for evidence of >> differential expression using RNA-seq counts, and I am happily using >> those. However, I would like to know if there is any method for testing >> for evidence of equivalent expression. That is, I would like to show >> with some level of confidence that treatment X has no (or negligible) >> effect on gene Y. Is there any way to conduct such a test? > > One of the new features of DESeq2 is that it provides a > empirical-Bayes style shrinkage estimates of coefficients (i.e., log > fold changes) and also estimates standard errors for these > coefficients (taken from the the reciprocal curvature of the > posterior). Your task is one of the application we had in mind for this. > > You want to know whether the fold change is zero or nearly zero, and > you also want to be ascertained that this estimate of a nearly-zero > fold change is a precise one. So, to find genes whose absolute log > fold change is _reliably_ smaller than some threshold, take all those > genes for which the estimated abs log fold change is below the > threshold and the standard error is well below the threshold, too. > > The reason, why I write "below a threshold" rather than "equal to > zero" is that it's biologically unreasonable to assume that a > treatment has exactly zero effect on a gene's expression. Gene > regulation is such an interconnected network that, at least in my > view, every gene will react ever so slightly to every perturbance, but > often, this reaction is too small to be measurable. This is why you > should define a threshold for "biologically unlikely to be relevant". > Let's say, we don't believe that an expression change of less than 15% > (0.2 on a log2 scale) is worth bothering. So, if you might take all > genes with abs log2 fold change below 0.2, and furthermore require > that the standard error of this log2 fold change estimate is below, > say, 0.05, than you will get gene with true values well below at least > 0.3 or so. > > In the end, the thresholds won't matter too much but if you want real > p values, this is possible, too: Simply divide the log2 fold change > estimates by their standard errors to get z scores (this works because > the sampling distribution of our fold change estimates is reasonably > close to normal), and do two one-sided tests (TOST) as suggested by > Ryan above. > > Actually, as this may come up more often, we should maybe explain the > details of using this approach in the vignette. Ask again if you want > to see some code. > > Simon > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD REPLY • link 11.1 years ago Ryan C. Thompson ★ 7.9k

0

Entering edit mode

Hi Ryan On 22/03/13 00:11, Ryan C. Thompson wrote: > I got as far as computing the z-scores, but I'm not sure how to go from > there to p-values for a test of equivalence. I guess I made it sound more trivial than it is. The "two one-sided tests" (TOST) scheme already mentioned in this thread is originally due to Schuirmann (Journal of Pharmacokinetics and Pharmacodynamics, 1987) -- or rather, the acronym is due to him, the idea is probably older, and he just pointed out that this what one should do in situations such as yours. To recap, you want to test the null hypothesis that your log fold change beta is larger than some threshold theta, i.e., you want to find genes for which you can reject this null hypothesis and hence say with some certainty that |beta| < theta. The TOST scheme suggests, as its name implies to perform two pone sided tests, one with the null hypothesis H0_A: beta > theta, the other with H0_B: beta < -theta, and then use the larger of the two p values. I demonstrate this below with some code. # Let's use the example data from the pasilla package library( DESeq2 ) library( pasilla ) data( "pasillaGenes" ) # Create a DESeq2 data object from the pasilla data dse <- DESeqSummarizedExperimentFromMatrix( counts(pasillaGenes), pData(pasillaGenes)[,c("condition","type")], ~ type + condition ) # Perform a standard DESeq2 analysis dse <- DESeq(dse) # The log2 fold changes are found here beta <- results(dse)$log2FoldChange # Just to make the following clearer, I should point out that the # "results" accessor is just a short-cut for this access to the rowData: all( beta == mcols(rowData(dse))$conditionuntreated, na.rm=TRUE ) # (returns TRUE) # The log fold change estimates all come with standard error information # which we find in the rowData (maybe we should copy this to the # 'results', too) betaSE <- mcols(rowData(dse))$SE_conditionuntreated # Internally, the Wald test is implemented as a simple two-sided # z test of beta/betaSE. Two demonstrate this, we to the test # manually and compare pvalDE <- 2 * pnorm( abs( beta ), sd = betaSE, lower.tail=FALSE ) all( abs( pvalDE - results(dse)$pvalue ) < 1e-15, na.rm=TRUE ) # (returns TRUE) # This was the test for DE, of course, i.e., small pvalDE means that # the gene's expression change (the true value of beta) is not zero # What we want is the opposite, namely find gene, for which abs(beta) # is smaller than some threshold, theta theta <- .3 # So, we do our two one-sided tests. For a one-sided z test, we # simply use tail probabilities from the normal distribution. # First, the test of H0_A: true_beta > thr pA <- pnorm( beta, thr, betaSE, lower.tail=TRUE ) # Next, the test of H0_B: true_beta < -thr pB <- pnorm( beta, -thr, betaSE, lower.tail=FALSE ) # The overall p value is the maximum, because we want to reject H0_A # and H0_B simultaneously pvalTOST <- pmax( pA, pB ) # Let's adjust our two p values with BH: sigDE <- p.adjust( pvalDE, "BH" ) < .1 sigSmall <- p.adjust( pvalTOST, "BH" ) < .1 # And make an MA plot, with sigDE in red and sigSmall in green plot( rowMeans( counts(dse,normalized=TRUE) ), beta, log="x", pch=20, cex=.2, col = 1 + sigDE + 2*sigSmall ) # Plot is attached. I think, we will include this code in one of the coming releases of DESeq2. Simon -------------- next part -------------- A non-text attachment was scrubbed... Name: Rplot001.png Type: image/png Size: 46596 bytes Desc: not available URL: <https: stat.ethz.ch="" pipermail="" bioconductor="" attachments="" 20130322="" ed095934="" attachment.png="">

ADD REPLY • link 11.1 years ago Simon Anders ★ 3.7k

0

Entering edit mode

Thanks, Simon, that makes it very clear. On 03/22/2013 05:08 AM, Simon Anders wrote: > Hi Ryan > > On 22/03/13 00:11, Ryan C. Thompson wrote: >> I got as far as computing the z-scores, but I'm not sure how to go from >> there to p-values for a test of equivalence. > > I guess I made it sound more trivial than it is. The "two one-sided > tests" (TOST) scheme already mentioned in this thread is originally > due to Schuirmann (Journal of Pharmacokinetics and Pharmacodynamics, > 1987) -- or rather, the acronym is due to him, the idea is probably > older, and he just pointed out that this what one should do in > situations such as yours. > > To recap, you want to test the null hypothesis that your log fold > change beta is larger than some threshold theta, i.e., you want to > find genes for which you can reject this null hypothesis and hence say > with some certainty that |beta| < theta. The TOST scheme suggests, as > its name implies to perform two pone sided tests, one with the null > hypothesis H0_A: beta > theta, the other with H0_B: beta < -theta, and > then use the larger of the two p values. > > I demonstrate this below with some code. > > # Let's use the example data from the pasilla package > library( DESeq2 ) > library( pasilla ) > data( "pasillaGenes" ) > > # Create a DESeq2 data object from the pasilla data > dse <- DESeqSummarizedExperimentFromMatrix( > counts(pasillaGenes), pData(pasillaGenes)[,c("condition","type")], > ~ type + condition ) > > # Perform a standard DESeq2 analysis > dse <- DESeq(dse) > > # The log2 fold changes are found here > beta <- results(dse)$log2FoldChange > > # Just to make the following clearer, I should point out that the > # "results" accessor is just a short-cut for this access to the rowData: > all( beta == mcols(rowData(dse))$conditionuntreated, na.rm=TRUE ) > # (returns TRUE) > > # The log fold change estimates all come with standard error information > # which we find in the rowData (maybe we should copy this to the > # 'results', too) > betaSE <- mcols(rowData(dse))$SE_conditionuntreated > > # Internally, the Wald test is implemented as a simple two-sided > # z test of beta/betaSE. Two demonstrate this, we to the test > # manually and compare > pvalDE <- 2 * pnorm( abs( beta ), sd = betaSE, lower.tail=FALSE ) > all( abs( pvalDE - results(dse)$pvalue ) < 1e-15, na.rm=TRUE ) > # (returns TRUE) > > # This was the test for DE, of course, i.e., small pvalDE means that > # the gene's expression change (the true value of beta) is not zero > > # What we want is the opposite, namely find gene, for which abs(beta) > # is smaller than some threshold, theta > theta <- .3 > > # So, we do our two one-sided tests. For a one-sided z test, we > # simply use tail probabilities from the normal distribution. > > # First, the test of H0_A: true_beta > thr > pA <- pnorm( beta, thr, betaSE, lower.tail=TRUE ) > > # Next, the test of H0_B: true_beta < -thr > pB <- pnorm( beta, -thr, betaSE, lower.tail=FALSE ) > > # The overall p value is the maximum, because we want to reject H0_A > # and H0_B simultaneously > pvalTOST <- pmax( pA, pB ) > > > # Let's adjust our two p values with BH: > sigDE <- p.adjust( pvalDE, "BH" ) < .1 > sigSmall <- p.adjust( pvalTOST, "BH" ) < .1 > > # And make an MA plot, with sigDE in red and sigSmall in green > plot( > rowMeans( counts(dse,normalized=TRUE) ), beta, > log="x", pch=20, cex=.2, > col = 1 + sigDE + 2*sigSmall ) > # Plot is attached. > > I think, we will include this code in one of the coming releases of > DESeq2. > > > Simon >

ADD REPLY • link 11.1 years ago Ryan C. Thompson ★ 7.9k

0

Entering edit mode

Out of curiosity, could a TOST also be implemented using likelihood ratio tests (or other tests) by using a pair of reduced models where the desired coefficient is set to +/-theta instead of zero? On 03/22/2013 05:08 AM, Simon Anders wrote: > Hi Ryan > > On 22/03/13 00:11, Ryan C. Thompson wrote: >> I got as far as computing the z-scores, but I'm not sure how to go from >> there to p-values for a test of equivalence. > > I guess I made it sound more trivial than it is. The "two one-sided > tests" (TOST) scheme already mentioned in this thread is originally > due to Schuirmann (Journal of Pharmacokinetics and Pharmacodynamics, > 1987) -- or rather, the acronym is due to him, the idea is probably > older, and he just pointed out that this what one should do in > situations such as yours. > > To recap, you want to test the null hypothesis that your log fold > change beta is larger than some threshold theta, i.e., you want to > find genes for which you can reject this null hypothesis and hence say > with some certainty that |beta| < theta. The TOST scheme suggests, as > its name implies to perform two pone sided tests, one with the null > hypothesis H0_A: beta > theta, the other with H0_B: beta < -theta, and > then use the larger of the two p values. > > I demonstrate this below with some code. > > # Let's use the example data from the pasilla package > library( DESeq2 ) > library( pasilla ) > data( "pasillaGenes" ) > > # Create a DESeq2 data object from the pasilla data > dse <- DESeqSummarizedExperimentFromMatrix( > counts(pasillaGenes), pData(pasillaGenes)[,c("condition","type")], > ~ type + condition ) > > # Perform a standard DESeq2 analysis > dse <- DESeq(dse) > > # The log2 fold changes are found here > beta <- results(dse)$log2FoldChange > > # Just to make the following clearer, I should point out that the > # "results" accessor is just a short-cut for this access to the rowData: > all( beta == mcols(rowData(dse))$conditionuntreated, na.rm=TRUE ) > # (returns TRUE) > > # The log fold change estimates all come with standard error information > # which we find in the rowData (maybe we should copy this to the > # 'results', too) > betaSE <- mcols(rowData(dse))$SE_conditionuntreated > > # Internally, the Wald test is implemented as a simple two-sided > # z test of beta/betaSE. Two demonstrate this, we to the test > # manually and compare > pvalDE <- 2 * pnorm( abs( beta ), sd = betaSE, lower.tail=FALSE ) > all( abs( pvalDE - results(dse)$pvalue ) < 1e-15, na.rm=TRUE ) > # (returns TRUE) > > # This was the test for DE, of course, i.e., small pvalDE means that > # the gene's expression change (the true value of beta) is not zero > > # What we want is the opposite, namely find gene, for which abs(beta) > # is smaller than some threshold, theta > theta <- .3 > > # So, we do our two one-sided tests. For a one-sided z test, we > # simply use tail probabilities from the normal distribution. > > # First, the test of H0_A: true_beta > thr > pA <- pnorm( beta, thr, betaSE, lower.tail=TRUE ) > > # Next, the test of H0_B: true_beta < -thr > pB <- pnorm( beta, -thr, betaSE, lower.tail=FALSE ) > > # The overall p value is the maximum, because we want to reject H0_A > # and H0_B simultaneously > pvalTOST <- pmax( pA, pB ) > > > # Let's adjust our two p values with BH: > sigDE <- p.adjust( pvalDE, "BH" ) < .1 > sigSmall <- p.adjust( pvalTOST, "BH" ) < .1 > > # And make an MA plot, with sigDE in red and sigSmall in green > plot( > rowMeans( counts(dse,normalized=TRUE) ), beta, > log="x", pch=20, cex=.2, > col = 1 + sigDE + 2*sigSmall ) > # Plot is attached. > > I think, we will include this code in one of the coming releases of > DESeq2. > > > Simon >

ADD REPLY • link 11.1 years ago Ryan C. Thompson ★ 7.9k

Login before adding your answer.