Question: edgeR - R script - results compared to DESeq
7.7 years ago by Gordon Smyth, Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia

Gordon Smyth wrote:
Hi Hilary,

> Date: Wed, 30 Nov 2011 08:30:36 -0500 (EST)
> From: "Smith, Hilary A" <hilary.smith at gatech.edu>
> To: bioconductor at r-project.org
> Subject: [BioC] edgeR - R script - results compared to DESeq
>
> In case it helps the discussion ... I also tried running GLMs in both
> DESeq and edgeR. I likewise found that edgeR yielded more differentially
> expressed tags or genes. I know Dr. Gordon Smyth mentioned
> calcNormFactors and tagwise dispersion; I did use both of these options.
>
> If it helps, an abstract description of the model comparison used in
> both programs is below (and below that if helpful, full code for edgeR
> -- I am using the newest release). I assume the differences are coming
> from the ways DESeq and edgeR estimate dispersion, but I'm eager to
> learn more about the rationale (especially given I just started using
> R/Bioconductor a few weeks ago) and note the results below in case they
> are of use in identifying the differences. I am glad to hear this is not
> just a factor of my dataset and is a common feature to have edgeR find
> more genes.

The rationale of the edgeR package is explained in:

http://bioinformatics.oxfordjournals.org/content/23/21/2881

The p-values from edgeR are very slightly liberal because it treats the dispersions as known in the testing procedure. I think other packages are also subject to this same assumption, however.

> My models for main effects A and B (with 3 biological reps. each) and their interaction are:
> Full: (A + B + A:B)
> Reduced1: (A + B)
> Reduced 2: (A)
>
> The comparison of the Full vs. Reduced1 yields the genes impacted by the
> interaction term A:B. To obtain genes impacted by the main effect A, I
> perform a comparison of Full vs. Reduced1 and a comparison of Full vs.
> Reduced 2 -- those genes found at P.adj<0.05 in the Full vs. Reduced 2
> comparison but not found in Full vs. Reduced 1 are what I am noting as
> genes impacted by main effect A.
> (To the best of my knowledge I cannot
> simply drop term A and leave in the interaction term, so this is my
> attempt to isolate term A. If there's a better way to do this, I'd be
> glad to know.)

You are correct that you can't remove A and keep A:B. However, many statisticians (John Nelder, for example) have argued that the concept of a main effect for A is not meaningful in the presence of a nonzero interaction A:B. If A:B is nonzero, then A can actually be interpreted as having two different main effects, one for each level of B.

To find genes responding to A in either level of B, you would compare the Full model with just B. This would test for an A effect in either or both levels of B.

Alternatively, and this is what I usually recommend, you can perform a separate test for A within each level of B. To do this, you use glmFit to fit the model (B + A:B), which will have four parameters if A and B each have two levels. This is actually the same model as your Full model, but parametrized differently. Then glmLRT(d,fit,coef=3) tests for an A effect when B is at its first level, and glmLRT(d,fit,coef=4) tests for A when B is at its second level.

BTW, you can perform these same tests after fitting A+B+A:B using the contrast argument of glmLRT, but reparametrizing to B+A:B makes it simpler.

Best wishes
Gordon

> My results for the interaction term are:
> edgeR: 173 genes
> DESeq: 38 genes
>
> For the main effect A:
> edgeR: 261 genes
> DESeq: 61 genes
> **NOTE: For this comparison of term A, of the 61 genes found by DESeq,
> about 44 (or ~72%) were also found by edgeR.
>
> I did have warnings in running DESeq that the Full model GLM didn't
> converge which is disconcerting... edgeR didn't give these warnings but
> still found more components.
>
> Best,
> Hilary
>
> ~~In the code below, main effect "A" above is "Season," and main effect "B" above is "Hydroperiod."
>
>> library(edgeR)
>> library(limma)
>> raw.data = read.csv("2011.11.14counts.csv", header=TRUE, stringsAsFactors=FALSE)
>> d = raw.data[,2:13]
>> rownames(d) = raw.data[,1]
>> head(d)
>           X1E_R X2E_R X3E_R X1P_R X2P_R X3P_R X1E_F X2E_F X3E_F X1P_F X2P_F
> comp0      1159  1572  1817   605  1113  1732  1065  1207  1477  1841  1915
> comp1       534   675   739   236   451   799   544   341   333   690   502
> comp10    37677 54466 58271 34712 40312 51243 30423 28044 23961 53852 59300
> comp100    1065  1332  1620   658   861  1370  1060   999   918  1697  1117
> comp1000    157   266   247   135   188   244   130   229   141   263   182
> comp10000    14    37    47    17    21    64    35    15    10    28    22
>           X3P_F
> comp0      1645
> comp1       571
> comp10    44575
> comp100    1336
> comp1000    168
> comp10000    12
>
>> Hydroperiod = factor(c("E", "E", "E", "P", "P", "P", "E", "E", "E", "P", "P", "P"))
>> Season = factor(c("R", "R", "R", "R", "R", "R", "F", "F", "F", "F", "F", "F"))
>> design = model.matrix(~Hydroperiod + Season + Hydroperiod:Season)
>> design
>    (Intercept) HydroperiodP SeasonR HydroperiodP:SeasonR
> 1            1            0       1                    0
> 2            1            0       1                    0
> 3            1            0       1                    0
> 4            1            1       1                    1
> 5            1            1       1                    1
> 6            1            1       1                    1
> 7            1            0       0                    0
> 8            1            0       0                    0
> 9            1            0       0                    0
> 10           1            1       0                    0
> 11           1            1       0                    0
> 12           1            1       0                    0
> attr(,"assign")
> [1] 0 1 2 3
> attr(,"contrasts")
> attr(,"contrasts")$Hydroperiod
> [1] "contr.treatment"
>
> attr(,"contrasts")$Season
> [1] "contr.treatment"
>
>> d.GLM = DGEList(d, group = c("ER", "ER", "ER", "PR", "PR", "PR", "EF", "EF", "EF", "PF", "PF", "PF"))
> Calculating library sizes from column totals.
>> d.GLM = calcNormFactors(d.GLM)
>> d.GLM
> An object of class "DGEList"
> $samples
>       group lib.size norm.factors
> X1E_R    ER 23295633    0.9559226
> X2E_R    ER 25882545    1.1040337
> X3E_R    ER 29401480    1.0236513
> X1P_R    PR 20877015    0.8199915
> X2P_R    PR 26649613    0.8869479
> 7 more rows ...
>
> $counts
>          X1E_R X2E_R X3E_R X1P_R X2P_R X3P_R X1E_F X2E_F X3E_F X1P_F X2P_F
> comp0     1159  1572  1817   605  1113  1732  1065  1207  1477  1841  1915
> comp1      534   675   739   236   451   799   544   341   333   690   502
> comp10   37677 54466 58271 34712 40312 51243 30423 28044 23961 53852 59300
> comp100   1065  1332  1620   658   861  1370  1060   999   918  1697  1117
> comp1000   157   266   247   135   188   244   130   229   141   263   182
>          X3P_F
> comp0     1645
> comp1      571
> comp10   44575
> comp100   1336
> comp1000   168
> 25055 more rows ...
>
> $all.zeros
>    comp0    comp1   comp10  comp100 comp1000
>    FALSE    FALSE    FALSE    FALSE    FALSE
> 25055 more elements ...
>
>> nrow(d.GLM)
> [1] 25060
>> dim(d.GLM)
> [1] 25060    12
>
>> design
>    (Intercept) HydroperiodP SeasonR HydroperiodP:SeasonR
> 1            1            0       1                    0
> 2            1            0       1                    0
> 3            1            0       1                    0
> 4            1            1       1                    1
> 5            1            1       1                    1
> 6            1            1       1                    1
> 7            1            0       0                    0
> 8            1            0       0                    0
> 9            1            0       0                    0
> 10           1            1       0                    0
> 11           1            1       0                    0
> 12           1            1       0                    0
> attr(,"assign")
> [1] 0 1 2 3
> attr(,"contrasts")
> attr(,"contrasts")$Hydroperiod
> [1] "contr.treatment"
>
> attr(,"contrasts")$Season
> [1] "contr.treatment"
>
>> d.GLM = estimateGLMCommonDisp(d.GLM, design)
>> names(d.GLM)
> [1] "samples"           "counts"            "all.zeros"
> [4] "common.dispersion"
>> d.GLM$common.dispersion
> [1] 0.1488192
>> sqrt(d.GLM$common.dispersion)
> [1] 0.3857709
>> # 0.3857709 is the Coefficient of Biological Variation
>> d.GLM = estimateGLMTrendedDisp(d.GLM, design)
> Loading required package: splines
>> summary(d.GLM$trended.dispersion)
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
> 0.07541 0.10240 0.19030 0.25500 0.37530 1.26900
>> d.GLM = estimateGLMTagwiseDisp(d.GLM, design)
>> d.GLM$prior.n
> NULL
>> d$prior.n
> NULL
>> ls()
> [1] "Hydroperiod" "Season"      "d"           "d.GLM"       "design"
> [6] "raw.data"
>> d.GLM
> An object of class "DGEList"
> $samples
>       group lib.size norm.factors
> X1E_R    ER 23295633    0.9559226
> X2E_R    ER 25882545    1.1040337
> X3E_R    ER 29401480    1.0236513
> X1P_R    PR 20877015    0.8199915
> X2P_R    PR 26649613    0.8869479
> 7 more rows ...
>
> $counts
>          X1E_R X2E_R X3E_R X1P_R X2P_R X3P_R X1E_F X2E_F X3E_F X1P_F X2P_F
> comp0     1159  1572  1817   605  1113  1732  1065  1207  1477  1841  1915
> comp1      534   675   739   236   451   799   544   341   333   690   502
> comp10   37677 54466 58271 34712 40312 51243 30423 28044 23961 53852 59300
> comp100   1065  1332  1620   658   861  1370  1060   999   918  1697  1117
> comp1000   157   266   247   135   188   244   130   229   141   263   182
>          X3P_F
> comp0     1645
> comp1      571
> comp10   44575
> comp100   1336
> comp1000   168
> 25055 more rows ...
>
> $all.zeros
>    comp0    comp1   comp10  comp100 comp1000
>    FALSE    FALSE    FALSE    FALSE    FALSE
> 25055 more elements ...
>
> $common.dispersion
> [1] 0.1488192
>
> $trended.dispersion
> [1] 0.07619372 0.08679353 0.10444478 0.07739597 0.10629516
> 25055 more elements ...
>
> $abundance
>      comp0      comp1     comp10    comp100   comp1000
>  -9.737514 -10.720757  -6.331670  -9.937984 -11.724981
> 25055 more elements ...
>
> $bin.dispersion
> [1] 0.7340070 0.6589801 0.5901124 0.5500868 0.4865594
> 22 more elements ...
>
> $bin.abundance
> [1] -17.08166 -16.46619 -16.13033 -15.84028 -15.59074
> 22 more elements ...
>
> $tagwise.dispersion
> [1] 0.06348981 0.06769437 0.07385569 0.05410498 0.08450404
> 25055 more elements ...
>
>> ?getPriorN
>> getPriorN(d.GLM, design=design)
> [1] 2.5
>> head(d.GLM$tagwise.dispersion)
> [1] 0.06348981 0.06769437 0.07385569 0.05410498 0.08450404 0.19604508
>> summary(d.GLM$tagwise.dispersion)
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
> 0.05139 0.09363 0.18700 0.25910 0.35480 1.71800
>> glmfit.tgw = glmFit(d.GLM, design, dispersion=d.GLM$tagwise.dispersion)
>> lrt.tgw = glmLRT(d.GLM, glmfit.tgw)
>> topTags(lrt.tgw)
> Coefficient: HydroperiodP:SeasonR
>              logConc        logFC       LR      P.Value          FDR
> comp13665 -10.932729 1.068620e+01 59.37370 1.304000e-14 3.267824e-10
> comp15478 -12.077538 7.742379e+00 47.67249 5.037112e-12 5.552272e-08
> comp370   -13.588446 1.442695e+08 46.86820 7.592458e-12 5.552272e-08
> comp10848 -11.836655 8.970575e+00 46.56512 8.862366e-12 5.552272e-08
> comp13403  -9.315242 3.773345e+00 44.92234 2.050057e-11 1.027488e-07
> comp7180  -11.518762 7.869625e+00 43.18096 4.990399e-11 2.084323e-07
> comp2502  -10.479763 5.723705e+00 41.56745 1.138735e-10 4.076673e-07
> comp2740  -10.814075 4.666231e+00 38.00241 7.065735e-10 2.213342e-06
> comp4314  -13.104853 4.837400e+00 35.44570 2.622602e-09 7.302491e-06
> comp13675 -12.930253 4.442508e+00 34.46100 4.348772e-09 1.089802e-05
>> summary(decideTestsDGE(lrt.tgw))
>    [,1]
> -1    57
> 0  24887
> 1    116
>
>> design
>    (Intercept) HydroperiodP SeasonR HydroperiodP:SeasonR
> 1            1            0       1                    0
> 2            1            0       1                    0
> 3            1            0       1                    0
> 4            1            1       1                    1
> 5            1            1       1                    1
> 6            1            1       1                    1
> 7            1            0       0                    0
> 8            1            0       0                    0
> 9            1            0       0                    0
> 10           1            1       0                    0
> 11           1            1       0                    0
> 12           1            1       0                    0
> attr(,"assign")
> [1] 0 1 2 3
> attr(,"contrasts")
> attr(,"contrasts")$Hydroperiod
> [1] "contr.treatment"
>
> attr(,"contrasts")$Season
> [1] "contr.treatment"
>
>> lrt.coef4 = glmLRT(d.GLM, glmfit.tgw, coef=4)
>> topTags(lrt.coef4)
> Coefficient: HydroperiodP:SeasonR
>              logConc        logFC       LR      P.Value          FDR
> comp13665 -10.932729 1.068620e+01 59.37370 1.304000e-14 3.267824e-10
> comp15478 -12.077538 7.742379e+00 47.67249 5.037112e-12 5.552272e-08
> comp370   -13.588446 1.442695e+08 46.86820 7.592458e-12 5.552272e-08
> comp10848 -11.836655 8.970575e+00 46.56512 8.862366e-12 5.552272e-08
> comp13403  -9.315242 3.773345e+00 44.92234 2.050057e-11 1.027488e-07
> comp7180  -11.518762 7.869625e+00 43.18096 4.990399e-11 2.084323e-07
> comp2502  -10.479763 5.723705e+00 41.56745 1.138735e-10 4.076673e-07
> comp2740  -10.814075 4.666231e+00 38.00241 7.065735e-10 2.213342e-06
> comp4314  -13.104853 4.837400e+00 35.44570 2.622602e-09 7.302491e-06
> comp13675 -12.930253 4.442508e+00 34.46100 4.348772e-09 1.089802e-05
>> lrt.coef34 = glmLRT(d.GLM, glmfit.tgw, coef=3:4)
>> topTags(lrt.coef34)
> Coefficient: SeasonR HydroperiodP:SeasonR
>             logConc       SeasonR HydroperiodP.SeasonR       LR      P.Value
> comp11779 -13.60929  1.442695e+08        -1.442695e+08 88.20296 7.030231e-20
> comp21414 -13.64545  5.616807e+00         1.442695e+08 71.78587 2.581642e-16
> comp6411  -10.57883  1.671168e+00         2.006498e+00 67.75411 1.938124e-15
> comp6417  -10.10518  1.510699e+00         1.224445e+00 65.09872 7.311274e-15
> comp13665 -10.93273 -2.193560e+00         1.068620e+01 62.70305 2.422182e-14
> comp1872  -12.53141  3.893662e+00        -2.007081e+00 62.49057 2.693670e-14
> comp5005  -11.13142  4.565629e-01         3.063432e+00 61.87012 3.673456e-14
> comp15150 -13.28156  5.032869e+00         1.442695e+08 58.96222 1.572234e-13
> comp2502  -10.47976 -4.007870e-01         5.723705e+00 56.89575 4.418199e-13
> comp19402 -11.60722  5.375493e+00        -5.180038e+00 52.82636 3.379876e-12
>                    FDR
> comp11779 1.761776e-15
> comp21414 3.234797e-12
> comp6411  1.618980e-11
> comp6417  4.580513e-11
> comp13665 1.125056e-10
> comp1872  1.125056e-10
> comp5005  1.315097e-10
> comp15150 4.925022e-10
> comp2502  1.230223e-09
> comp19402 8.469969e-09
>> ls()
>  [1] "Hydroperiod" "Season"      "d"           "d.GLM"       "design"
>  [6] "glmfit.tgw"  "lrt.coef34"  "lrt.coef4"   "lrt.tgw"     "oo"
> [11] "raw.data"
>> head(lrt.coef34)
> An object of class "DGELRT"
> $samples
>       group lib.size norm.factors
> X1E_R    ER 23295633    0.9559226
> X2E_R    ER 25882545    1.1040337
> X3E_R    ER 29401480    1.0236513
> X1P_R    PR 20877015    0.8199915
> X2P_R    PR 26649613    0.8869479
> 7 more rows ...
>
> $all.zeros
>     comp0     comp1    comp10   comp100  comp1000 comp10000
>     FALSE     FALSE     FALSE     FALSE     FALSE     FALSE
>
> $common.dispersion
> [1] 0.1488192
>
> $trended.dispersion
> [1] 0.07619372 0.08679353 0.10444478 0.07739597 0.10629516 0.20365547
>
> $abundance
>      comp0      comp1     comp10    comp100   comp1000  comp10000
>  -9.737514 -10.720757  -6.331670  -9.937984 -11.724981 -13.712600
>
> $bin.dispersion
> [1] 0.7340070 0.6589801 0.5901124 0.5500868 0.4865594
> 22 more elements ...
>
> $bin.abundance
> [1] -17.08166 -16.46619 -16.13033 -15.84028 -15.59074
> 22 more elements ...
>
> $tagwise.dispersion
> [1] 0.06348981 0.06769437 0.07385569 0.05410498 0.08450404 0.19604508
>
> $coef
> [1] 4
>
> $table
>              logConc logFC.SeasonR logFC.HydroperiodP.SeasonR LR.statistic
> comp0      -9.736436   -0.24995924                 -0.3214735   4.34114504
> comp1     -10.739594    0.20037356                 -0.3870300   0.77512680
> comp10     -6.341516    0.37480089                 -0.5276106   1.59407277
> comp100    -9.940774   -0.06226595                 -0.3360123   2.12228134
> comp1000  -11.726085   -0.07140854                  0.1131265   0.05488506
> comp10000 -13.750173    0.21120837                  0.5708073   1.99318234
>             p.value
> comp0     0.1141123
> comp1     0.6787086
> comp10    0.4506626
> comp100   0.3460608
> comp1000  0.9729306
> comp10000 0.3691356
>
> $coefficients.full
>           (Intercept) HydroperiodP     SeasonR HydroperiodP:SeasonR
> comp0       -9.620221  0.028007573 -0.17325854          -0.22282846
> comp1      -10.774172  0.058157626  0.13888837          -0.26826873
> comp10      -6.555230  0.335378111  0.25979218          -0.36571177
> comp100     -9.871905  0.009775718 -0.04315947          -0.23290600
> comp1000   -11.662643 -0.118031564 -0.04949663           0.07841331
> comp10000  -13.805669 -0.272566725  0.14639848           0.39565348
>
> $coefficients.null
>           (Intercept) HydroperiodP
> comp0       -9.703284  -0.06737999
> comp1      -10.701998  -0.07653693
> comp10      -6.416913   0.14550436
> comp100     -9.893311  -0.09719967
> comp1000   -11.687332  -0.07884619
> comp10000  -13.727706  -0.04514128
>
> $design.full
>   (Intercept) HydroperiodP SeasonR HydroperiodP:SeasonR
> 1           1            0       1                    0
> 2           1            0       1                    0
> 3           1            0       1                    0
> 4           1            1       1                    1
> 5           1            1       1                    1
> 7 more rows ...
>
> $design.null
>   (Intercept) HydroperiodP
> 1           1            0
> 2           1            0
> 3           1            0
> 4           1            1
> 5           1            1
> 7 more rows ...
>
> $dispersion.used
> [1] 0.06348981 0.06769437 0.07385569 0.05410498 0.08450404 0.19604508
>
> $comparison
> [1] "SeasonR"              "HydroperiodP:SeasonR"
>
>>
>> RemoveSeasonRmHydBySeason = topTags(lrt.coef34, number =25060)
> Error in topTags(lrt.coef34, number = 25060) :
>   unused argument(s) (number = 25060)
>> ?topTags
>> RemoveSeasonRmHydBySeason = topTags(lrt.coef34, n =25060)
>> ls()
>  [1] "Hydroperiod"               "RemoveSeasonRmHydBySeason"
>  [3] "Season"                    "d"
>  [5] "d.GLM"                     "design"
>  [7] "glmfit.tgw"                "lrt.coef34"
>  [9] "lrt.coef4"                 "lrt.tgw"
> [11] "oo"                        "raw.data"
>> RemoveHydBySeason = topTags(lrt.coef4, n=25060)
>> write.csv(RemoveHydBySeason, "RemoveHydBySeason.csv")
>> write.csv(RemoveSeasonRmHydBySeason, "RemoveSeasonRmHydBySeason.csv")
>> summary(decideTestsDGE(lrt.coef4)
> +
>> summary(decideTestsDGE(lrt.coef4))
>    [,1]
> -1    57
> 0  24887
> 1    116
>> summary(decideTestsDGE(lrt.coef34))
> Error in array(x, c(length(x), 1L), if (!is.null(names(x))) list(names(x),  :
>   attempt to set an attribute on NULL
>> design
>    (Intercept) HydroperiodP SeasonR HydroperiodP:SeasonR
> 1            1            0       1                    0
> 2            1            0       1                    0
> 3            1            0       1                    0
> 4            1            1       1                    1
> 5            1            1       1                    1
> 6            1            1       1                    1
> 7            1            0       0                    0
> 8            1            0       0                    0
> 9            1            0       0                    0
> 10           1            1       0                    0
> 11           1            1       0                    0
> 12           1            1       0                    0
> attr(,"assign")
> [1] 0 1 2 3
> attr(,"contrasts")
> attr(,"contrasts")$Hydroperiod
> [1] "contr.treatment"
>
> attr(,"contrasts")$Season
> [1] "contr.treatment"
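The reparametrization Gordon describes can be checked in base R without edgeR. This is a minimal sketch under stated assumptions. It reuses the Hydroperiod (B) and Season (A) factors from Hilary's session and builds the A + B + A:B and B + A:B design matrices with `model.matrix`. It then checks three things: the two designs describe the same model; coefficients 3 and 4 of the B + A:B design are the Season effects within Hydroperiod E and within Hydroperiod P; and those same quantities are the contrasts c(0,0,1,0) and c(0,0,1,1) on the original coefficients. The response `y` is synthetic Gaussian noise fitted by least squares. It stands in for the counts only so that the two parametrizations can be compared, and it says nothing about the real data. The names `design.nested`, `con.E`, `con.P` and `y.dge` are hypothetical, and the commented edgeR calls are not run.

```r
# Sample layout copied from Hilary's session (3 biological reps per cell).
Hydroperiod <- factor(c("E", "E", "E", "P", "P", "P", "E", "E", "E", "P", "P", "P"))
Season      <- factor(c("R", "R", "R", "R", "R", "R", "F", "F", "F", "F", "F", "F"))

# Original parametrization A + B + A:B (A = Season, B = Hydroperiod).
design.full <- model.matrix(~ Hydroperiod + Season + Hydroperiod:Season)

# Reparametrization B + A:B: Season is coded separately within each Hydroperiod level.
# Expected columns: (Intercept), HydroperiodP, HydroperiodE:SeasonR, HydroperiodP:SeasonR
design.nested <- model.matrix(~ Hydroperiod + Hydroperiod:Season)
print(colnames(design.nested))

# Same model: stacking both designs side by side still gives rank 4.
joint.rank <- qr(cbind(design.full, design.nested))$rank

# Synthetic response (NOT real data), fitted by ordinary least squares purely
# to compare the two sets of coefficients.
set.seed(1)
y <- rnorm(12)
b.full   <- lm.fit(design.full, y)$coefficients
b.nested <- lm.fit(design.nested, y)$coefficients

# Season effect within each Hydroperiod level, as contrasts on b.full.
con.E <- c(0, 0, 1, 0)  # SeasonR
con.P <- c(0, 0, 1, 1)  # SeasonR + HydroperiodP:SeasonR
effect.E <- sum(con.E * b.full)
effect.P <- sum(con.P * b.full)

# Each pair should agree: contrast on the full fit vs. coefficient 3 or 4 of the nested fit.
print(c(contrast = effect.E, nested.coef3 = unname(b.nested[3])))
print(c(contrast = effect.P, nested.coef4 = unname(b.nested[4])))

# "Full vs. just B": dropping coefficients 3:4 from either design leaves ~ Hydroperiod.
design.B  <- model.matrix(~ Hydroperiod)
same.null <- all(design.full[, 1:2] == design.B) && all(design.nested[, 1:2] == design.B)
print(same.null)

# Not run (needs edgeR and a DGEList `y.dge` with dispersions estimated).
# Later edgeR releases take the fit object as the first argument of glmLRT:
#   fit <- glmFit(y.dge, design.nested)
#   glmLRT(fit, coef = 3)    # Season effect within Hydroperiod E
#   glmLRT(fit, coef = 4)    # Season effect within Hydroperiod P
#   glmLRT(fit, coef = 3:4)  # Season effect in either level (Full vs. B)
# The 2011 session above used the older form glmLRT(d.GLM, fit, coef = ...).
```

Both designs reach the same four cell means, so only the meaning of each coefficient changes. That is why a single `coef` index is enough to test A within one level of B once the model is written as B + A:B.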