Search
Question: edgeR - R script - results compared to DESeq
0
gravatar for Avinash S
6.8 years ago by
Avinash S40
Avinash S40 wrote:
Hello List, I'm just starting with R and wanted to analyze my data for differential expression using edgeR. Here is the code which is working for me but I want to check if I'm missing something as I get more number of differentially expressed genes compared to DESeq MY Sample data GeneID CR1 CR2 MR1 MR2 3119s00200.1 78 78 148 124 3119s00202.1 33 68 168 198 3119s00232.1 52 73 135 99 CR1 and CR2 are replicates of controls and MR1 and MR2 are treated sample replicates library(edgeR) library(limma) raw.data <- read.delim("RNAseq-MycReadCountData-EdgeR.txt") names(raw.data) d <- raw.data[, 2:5] rownames(d) <- raw.data[, 1] group <- c(rep("CR", 2), rep("MR", 2)) d <- DGEList(counts = d, group = group) d <- estimateCommonDisp(d) et <- exactTest(d) topTags(et) etTag = rownames(topTags(et)$table) sum(et$table$p.value <0.05) sum(p.adjust(et$table$p.value,method="BH") < 0.1) good = sum(et$table$p.value <0.05) goodList = topTags(et, n=good) sink("edgeR-MR-pVal005.csv") goodList sink() I compared the result with DESeq and I get about 2000 genes more in edgeR at pVal < 0.05, however, the the matched genes showed same log2foldchanges. Is it usual that edgeR gives more number of diff.expr genes? Thank you for your time, Avinash [[alternative HTML version deleted]]
ADD COMMENTlink modified 6.8 years ago by Smith, Hilary A40 • written 6.8 years ago by Avinash S40
0
gravatar for Avinash S
6.8 years ago by
Avinash S40
Avinash S40 wrote:
> Hello List, > > I'm just starting with R and wanted to analyze my data for differential > expression using edgeR. Here is the code which is working for me but I want > to check if I'm missing something as I get more number of differentially > expressed genes compared to DESeq > > MY Sample data > > GeneID CR1 CR2 MR1 MR2 > 3119s00200.1 78 78 148 124 > 3119s00202.1 33 68 168 198 > 3119s00232.1 52 73 135 99 > > CR1 and CR2 are replicates of controls and MR1 and MR2 are treated sample > replicates > > library(edgeR) > library(limma) > raw.data <- read.delim("RNAseq-MycReadCountData-EdgeR.txt") > names(raw.data) > > d <- raw.data[, 2:5] > rownames(d) <- raw.data[, 1] > group <- c(rep("CR", 2), rep("MR", 2)) > d <- DGEList(counts = d, group = group) > d <- estimateCommonDisp(d) > et <- exactTest(d) > topTags(et) > etTag = rownames(topTags(et)$table) > sum(et$table$p.value <0.05) > sum(p.adjust(et$table$p.value,method="BH") < 0.1) > good = sum(et$table$p.value <0.05) > goodList = topTags(et, n=good) > sink("edgeR-MR-pVal005.csv") > goodList > sink() > > I compared the result with DESeq and I get about 2000 genes more in edgeR > at pVal < 0.05, however, the the matched genes showed same log2foldchanges. > Is it usual that edgeR gives more number of diff.expr genes? > > Thank you for your time, > Avinash > [[alternative HTML version deleted]]
ADD COMMENTlink written 6.8 years ago by Avinash S40
0
gravatar for Avinash S
6.8 years ago by
Avinash S40
Avinash S40 wrote:
> > Hello List, > > I'm just starting with R and wanted to analyze my data for differential > expression using edgeR. Here is the code which is working for me but I want > to check if I'm missing something as I get more number of differentially > expressed genes compared to DESeq > > MY Sample data > > GeneID CR1 CR2 MR1 MR2 > 3119s00200.1 78 78 148 124 > 3119s00202.1 33 68 168 198 > 3119s00232.1 52 73 135 99 > > CR1 and CR2 are replicates of controls and MR1 and MR2 are treated sample > replicates > > library(edgeR) > library(limma) > raw.data <- read.delim("RNAseq-MycReadCountData-EdgeR.txt") > names(raw.data) > > d <- raw.data[, 2:5] > rownames(d) <- raw.data[, 1] > group <- c(rep("CR", 2), rep("MR", 2)) > d <- DGEList(counts = d, group = group) > d <- estimateCommonDisp(d) > et <- exactTest(d) > topTags(et) > etTag = rownames(topTags(et)$table) > sum(et$table$p.value <0.05) > sum(p.adjust(et$table$p.value,method="BH") < 0.1) > good = sum(et$table$p.value <0.05) > goodList = topTags(et, n=good) > sink("edgeR-MR-pVal005.csv") > goodList > sink() > > I compared the result with DESeq and I get about 2000 genes more in edgeR > at pVal < 0.05, however, the the matched genes showed same log2foldchanges. > Is it usual that edgeR gives more number of diff.expr genes? > > Thank you for your time, > Avinash > [[alternative HTML version deleted]]
ADD COMMENTlink written 6.8 years ago by Avinash S40
0
gravatar for Simon Anders
6.8 years ago by
Simon Anders3.5k
Zentrum für Molekularbiologie, Universität Heidelberg
Simon Anders3.5k wrote:
On 2011-11-28 16:58, Avinash S wrote: > I'm just starting with R and wanted to analyze my data for differential > expression using edgeR. Here is the code which is working for me but I want > to check if I'm missing something as I get more number of differentially > expressed genes compared to DESeq As you do not tell us how you used DESeq, it is hard to compare. However, why do you first (correctly) check how many genes have an adjusted p value below your chosen FDR but then (incorrectly) export genes according to their raw p value? > sum(p.adjust(et$table$p.value,method="BH")< 0.1) > good = sum(et$table$p.value<0.05) > I compared the result with DESeq and I get about 2000 genes more in edgeR > at pVal< 0.05, however, the the matched genes showed same log2foldchanges. > Is it usual that edgeR gives more number of diff.expr genes? If you cut your gene list at a raw p value of 5%, you will get many false positives. Please read up on multiple hypothesis testing and false dicovery rate. Simon
ADD COMMENTlink written 6.8 years ago by Simon Anders3.5k
0
gravatar for Smith, Hilary A
6.8 years ago by
Smith, Hilary A40 wrote:
In case it helps the discussion ... I also tried running GLMs in both DESeq and edgeR. I likewise found that edgeR yielded more differentially expressed tags or genes. I know Dr. Gordon Smyth mentioned calcNormFactors and tagwise dispersion; I did use both of these options. If it helps, an abstract description of the model comparison used in both programs is below (and below that if helpful, full code for edgeR -- I am using the newest release). I assume the differences are coming from the ways DESeq and edgeR estimate dispersion, but I'm eager to learn more about the rationale (especially given I just started using R/Bioconductor a few weeks ago) and note the results below in case they are of use in identifying the differences. I am glad to hear this is not just a factor of my dataset and is a common feature to have edgeR find more genes. My models for main effects A and B (with 3 biological reps. each) and their interaction are: Full: (A + B + A:B) Reduced1: (A + B) Reduced 2: (A) The comparison of the Full vs. Reduced1 yields the genes impacted by the interaction term A:B. To obtain genes impacted by the main effect A, I perform a comparison of Full vs. Reduced1 and a comparison of Full vs. Reduced 2 -- those genes found at P.adj<0.05 in the Full vs. Reduced 2 comparison but not found in Full vs. Reduced 1 are what I am noting as genes impacted by main effect A. (To the best of my knowledge I cannot simply drop term A and leave in the interaction term, so this is my attempt to isolate term A. If there's a better way to do this, I'd be glad to know.). My results for the interaction term are: edgeR: 173 genes DESeq: 38 genes For the main effect A: edgeR: 261 genes DESeq: 61 genes **NOTE: For this comparison of term A, of the 61 genes found by DESeq, about 44 (or ~72%) were also found by edgeR. I did have warnings in running DESeq that the Full model GLM didn't converge which is disconcerting... edgeR didn't give these warnings but still found more components. Best, Hilary ~~In the code below, main effect "A" above is "Season," and main effect "B" above is "Hydroperiod." > library(edgeR) > library(limma) > raw.data = read.csv("2011.11.14counts.csv", header=TRUE, stringsAsFactors=FALSE) > d = raw.data[,2:13] > rownames(d) = raw.data[,1] > head(d) X1E_R X2E_R X3E_R X1P_R X2P_R X3P_R X1E_F X2E_F X3E_F X1P_F X2P_F comp0 1159 1572 1817 605 1113 1732 1065 1207 1477 1841 1915 comp1 534 675 739 236 451 799 544 341 333 690 502 comp10 37677 54466 58271 34712 40312 51243 30423 28044 23961 53852 59300 comp100 1065 1332 1620 658 861 1370 1060 999 918 1697 1117 comp1000 157 266 247 135 188 244 130 229 141 263 182 comp10000 14 37 47 17 21 64 35 15 10 28 22 X3P_F comp0 1645 comp1 571 comp10 44575 comp100 1336 comp1000 168 comp10000 12 > Hydroperiod = factor(c("E", "E", "E", "P", "P", "P", "E", "E", "E", "P", "P", "P")) > Season = factor(c("R", "R", "R", "R", "R", "R", "F", "F", "F", "F", "F", "F")) > design = model.matrix(~Hydroperiod + Season + Hydroperiod:Season) > design (Intercept) HydroperiodP SeasonR HydroperiodP:SeasonR 1 1 0 1 0 2 1 0 1 0 3 1 0 1 0 4 1 1 1 1 5 1 1 1 1 6 1 1 1 1 7 1 0 0 0 8 1 0 0 0 9 1 0 0 0 10 1 1 0 0 11 1 1 0 0 12 1 1 0 0 attr(,"assign") [1] 0 1 2 3 attr(,"contrasts") attr(,"contrasts")$Hydroperiod [1] "contr.treatment" attr(,"contrasts")$Season [1] "contr.treatment" > d.GLM = DGEList(d, group = c("ER", "ER", "ER", "PR", "PR", "PR", "EF", "EF", "EF", "PF", "PF", "PF")) Calculating library sizes from column totals. > d.GLM = calcNormFactors(d.GLM) > d.GLM An object of class "DGEList" $samples group lib.size norm.factors X1E_R ER 23295633 0.9559226 X2E_R ER 25882545 1.1040337 X3E_R ER 29401480 1.0236513 X1P_R PR 20877015 0.8199915 X2P_R PR 26649613 0.8869479 7 more rows ... $counts X1E_R X2E_R X3E_R X1P_R X2P_R X3P_R X1E_F X2E_F X3E_F X1P_F X2P_F comp0 1159 1572 1817 605 1113 1732 1065 1207 1477 1841 1915 comp1 534 675 739 236 451 799 544 341 333 690 502 comp10 37677 54466 58271 34712 40312 51243 30423 28044 23961 53852 59300 comp100 1065 1332 1620 658 861 1370 1060 999 918 1697 1117 comp1000 157 266 247 135 188 244 130 229 141 263 182 X3P_F comp0 1645 comp1 571 comp10 44575 comp100 1336 comp1000 168 25055 more rows ... $all.zeros comp0 comp1 comp10 comp100 comp1000 FALSE FALSE FALSE FALSE FALSE 25055 more elements ... > nrow(d.GLM) [1] 25060 > dim(d.GLM) [1] 25060 12 > design (Intercept) HydroperiodP SeasonR HydroperiodP:SeasonR 1 1 0 1 0 2 1 0 1 0 3 1 0 1 0 4 1 1 1 1 5 1 1 1 1 6 1 1 1 1 7 1 0 0 0 8 1 0 0 0 9 1 0 0 0 10 1 1 0 0 11 1 1 0 0 12 1 1 0 0 attr(,"assign") [1] 0 1 2 3 attr(,"contrasts") attr(,"contrasts")$Hydroperiod [1] "contr.treatment" attr(,"contrasts")$Season [1] "contr.treatment" > d.GLM = estimateGLMCommonDisp(d.GLM, design) > names(d.GLM) [1] "samples" "counts" "all.zeros" [4] "common.dispersion" > d.GLM$common.dispersion [1] 0.1488192 > sqrt(d.GLM$common.dispersion) [1] 0.3857709 > # 0.3857709 is the Coefficient of Biological Variation > d.GLM = estimateGLMTrendedDisp(d.GLM, design) Loading required package: splines > summary(d.GLM$trended.dispersion) Min. 1st Qu. Median Mean 3rd Qu. Max. 0.07541 0.10240 0.19030 0.25500 0.37530 1.26900 > d.GLM = estimateGLMTagwiseDisp(d.GLM, design) > d.GLM$prior.n NULL > d$prior.n NULL > ls() [1] "Hydroperiod" "Season" "d" "d.GLM" "design" [6] "raw.data" > d.GLM An object of class "DGEList" $samples group lib.size norm.factors X1E_R ER 23295633 0.9559226 X2E_R ER 25882545 1.1040337 X3E_R ER 29401480 1.0236513 X1P_R PR 20877015 0.8199915 X2P_R PR 26649613 0.8869479 7 more rows ... $counts X1E_R X2E_R X3E_R X1P_R X2P_R X3P_R X1E_F X2E_F X3E_F X1P_F X2P_F comp0 1159 1572 1817 605 1113 1732 1065 1207 1477 1841 1915 comp1 534 675 739 236 451 799 544 341 333 690 502 comp10 37677 54466 58271 34712 40312 51243 30423 28044 23961 53852 59300 comp100 1065 1332 1620 658 861 1370 1060 999 918 1697 1117 comp1000 157 266 247 135 188 244 130 229 141 263 182 X3P_F comp0 1645 comp1 571 comp10 44575 comp100 1336 comp1000 168 25055 more rows ... $all.zeros comp0 comp1 comp10 comp100 comp1000 FALSE FALSE FALSE FALSE FALSE 25055 more elements ... $common.dispersion [1] 0.1488192 $trended.dispersion [1] 0.07619372 0.08679353 0.10444478 0.07739597 0.10629516 25055 more elements ... $abundance comp0 comp1 comp10 comp100 comp1000 -9.737514 -10.720757 -6.331670 -9.937984 -11.724981 25055 more elements ... $bin.dispersion [1] 0.7340070 0.6589801 0.5901124 0.5500868 0.4865594 22 more elements ... $bin.abundance [1] -17.08166 -16.46619 -16.13033 -15.84028 -15.59074 22 more elements ... $tagwise.dispersion [1] 0.06348981 0.06769437 0.07385569 0.05410498 0.08450404 25055 more elements ... > ?getPriorN > getPriorN(d.GLM, design=design) [1] 2.5 > head(d.GLM$tagwise.dispersion) [1] 0.06348981 0.06769437 0.07385569 0.05410498 0.08450404 0.19604508 > summary(d.GLM$tagwise.dispersion) Min. 1st Qu. Median Mean 3rd Qu. Max. 0.05139 0.09363 0.18700 0.25910 0.35480 1.71800 > glmfit.tgw = glmFit(d.GLM, design, dispersion=d.GLM$tagwise.dispersion) > lrt.tgw = glmLRT(d.GLM, glmfit.tgw) > topTags(lrt.tgw) Coefficient: HydroperiodP:SeasonR logConc logFC LR P.Value FDR comp13665 -10.932729 1.068620e+01 59.37370 1.304000e-14 3.267824e-10 comp15478 -12.077538 7.742379e+00 47.67249 5.037112e-12 5.552272e-08 comp370 -13.588446 1.442695e+08 46.86820 7.592458e-12 5.552272e-08 comp10848 -11.836655 8.970575e+00 46.56512 8.862366e-12 5.552272e-08 comp13403 -9.315242 3.773345e+00 44.92234 2.050057e-11 1.027488e-07 comp7180 -11.518762 7.869625e+00 43.18096 4.990399e-11 2.084323e-07 comp2502 -10.479763 5.723705e+00 41.56745 1.138735e-10 4.076673e-07 comp2740 -10.814075 4.666231e+00 38.00241 7.065735e-10 2.213342e-06 comp4314 -13.104853 4.837400e+00 35.44570 2.622602e-09 7.302491e-06 comp13675 -12.930253 4.442508e+00 34.46100 4.348772e-09 1.089802e-05 > summary(decideTestsDGE(lrt.tgw)) [,1] -1 57 0 24887 1 116 > design (Intercept) HydroperiodP SeasonR HydroperiodP:SeasonR 1 1 0 1 0 2 1 0 1 0 3 1 0 1 0 4 1 1 1 1 5 1 1 1 1 6 1 1 1 1 7 1 0 0 0 8 1 0 0 0 9 1 0 0 0 10 1 1 0 0 11 1 1 0 0 12 1 1 0 0 attr(,"assign") [1] 0 1 2 3 attr(,"contrasts") attr(,"contrasts")$Hydroperiod [1] "contr.treatment" attr(,"contrasts")$Season [1] "contr.treatment" > lrt.coef4 = glmLRT(d.GLM, glmfit.tgw, coef=4) > topTags(lrt.coef4) Coefficient: HydroperiodP:SeasonR logConc logFC LR P.Value FDR comp13665 -10.932729 1.068620e+01 59.37370 1.304000e-14 3.267824e-10 comp15478 -12.077538 7.742379e+00 47.67249 5.037112e-12 5.552272e-08 comp370 -13.588446 1.442695e+08 46.86820 7.592458e-12 5.552272e-08 comp10848 -11.836655 8.970575e+00 46.56512 8.862366e-12 5.552272e-08 comp13403 -9.315242 3.773345e+00 44.92234 2.050057e-11 1.027488e-07 comp7180 -11.518762 7.869625e+00 43.18096 4.990399e-11 2.084323e-07 comp2502 -10.479763 5.723705e+00 41.56745 1.138735e-10 4.076673e-07 comp2740 -10.814075 4.666231e+00 38.00241 7.065735e-10 2.213342e-06 comp4314 -13.104853 4.837400e+00 35.44570 2.622602e-09 7.302491e-06 comp13675 -12.930253 4.442508e+00 34.46100 4.348772e-09 1.089802e-05 > lrt.coef34 = glmLRT(d.GLM, glmfit.tgw, coef=3:4) > topTags(lrt.coef34) Coefficient: SeasonR HydroperiodP:SeasonR logConc SeasonR HydroperiodP.SeasonR LR P.Value comp11779 -13.60929 1.442695e+08 -1.442695e+08 88.20296 7.030231e-20 comp21414 -13.64545 5.616807e+00 1.442695e+08 71.78587 2.581642e-16 comp6411 -10.57883 1.671168e+00 2.006498e+00 67.75411 1.938124e-15 comp6417 -10.10518 1.510699e+00 1.224445e+00 65.09872 7.311274e-15 comp13665 -10.93273 -2.193560e+00 1.068620e+01 62.70305 2.422182e-14 comp1872 -12.53141 3.893662e+00 -2.007081e+00 62.49057 2.693670e-14 comp5005 -11.13142 4.565629e-01 3.063432e+00 61.87012 3.673456e-14 comp15150 -13.28156 5.032869e+00 1.442695e+08 58.96222 1.572234e-13 comp2502 -10.47976 -4.007870e-01 5.723705e+00 56.89575 4.418199e-13 comp19402 -11.60722 5.375493e+00 -5.180038e+00 52.82636 3.379876e-12 FDR comp11779 1.761776e-15 comp21414 3.234797e-12 comp6411 1.618980e-11 comp6417 4.580513e-11 comp13665 1.125056e-10 comp1872 1.125056e-10 comp5005 1.315097e-10 comp15150 4.925022e-10 comp2502 1.230223e-09 comp19402 8.469969e-09 > ls() [1] "Hydroperiod" "Season" "d" "d.GLM" "design" [6] "glmfit.tgw" "lrt.coef34" "lrt.coef4" "lrt.tgw" "oo" [11] "raw.data" > head(lrt.coef34) An object of class "DGELRT" $samples group lib.size norm.factors X1E_R ER 23295633 0.9559226 X2E_R ER 25882545 1.1040337 X3E_R ER 29401480 1.0236513 X1P_R PR 20877015 0.8199915 X2P_R PR 26649613 0.8869479 7 more rows ... $all.zeros comp0 comp1 comp10 comp100 comp1000 comp10000 FALSE FALSE FALSE FALSE FALSE FALSE $common.dispersion [1] 0.1488192 $trended.dispersion [1] 0.07619372 0.08679353 0.10444478 0.07739597 0.10629516 0.20365547 $abundance comp0 comp1 comp10 comp100 comp1000 comp10000 -9.737514 -10.720757 -6.331670 -9.937984 -11.724981 -13.712600 $bin.dispersion [1] 0.7340070 0.6589801 0.5901124 0.5500868 0.4865594 22 more elements ... $bin.abundance [1] -17.08166 -16.46619 -16.13033 -15.84028 -15.59074 22 more elements ... $tagwise.dispersion [1] 0.06348981 0.06769437 0.07385569 0.05410498 0.08450404 0.19604508 $coef [1] 4 $table logConc logFC.SeasonR logFC.HydroperiodP.SeasonR LR.statistic comp0 -9.736436 -0.24995924 -0.3214735 4.34114504 comp1 -10.739594 0.20037356 -0.3870300 0.77512680 comp10 -6.341516 0.37480089 -0.5276106 1.59407277 comp100 -9.940774 -0.06226595 -0.3360123 2.12228134 comp1000 -11.726085 -0.07140854 0.1131265 0.05488506 comp10000 -13.750173 0.21120837 0.5708073 1.99318234 p.value comp0 0.1141123 comp1 0.6787086 comp10 0.4506626 comp100 0.3460608 comp1000 0.9729306 comp10000 0.3691356 $coefficients.full (Intercept) HydroperiodP SeasonR HydroperiodP:SeasonR comp0 -9.620221 0.028007573 -0.17325854 -0.22282846 comp1 -10.774172 0.058157626 0.13888837 -0.26826873 comp10 -6.555230 0.335378111 0.25979218 -0.36571177 comp100 -9.871905 0.009775718 -0.04315947 -0.23290600 comp1000 -11.662643 -0.118031564 -0.04949663 0.07841331 comp10000 -13.805669 -0.272566725 0.14639848 0.39565348 $coefficients.null (Intercept) HydroperiodP comp0 -9.703284 -0.06737999 comp1 -10.701998 -0.07653693 comp10 -6.416913 0.14550436 comp100 -9.893311 -0.09719967 comp1000 -11.687332 -0.07884619 comp10000 -13.727706 -0.04514128 $design.full (Intercept) HydroperiodP SeasonR HydroperiodP:SeasonR 1 1 0 1 0 2 1 0 1 0 3 1 0 1 0 4 1 1 1 1 5 1 1 1 1 7 more rows ... $design.null (Intercept) HydroperiodP 1 1 0 2 1 0 3 1 0 4 1 1 5 1 1 7 more rows ... $dispersion.used [1] 0.06348981 0.06769437 0.07385569 0.05410498 0.08450404 0.19604508 $comparison [1] "SeasonR" "HydroperiodP:SeasonR" > > RemoveSeasonRmHydBySeason = topTags(lrt.coef34, number =25060) Error in topTags(lrt.coef34, number = 25060) : unused argument(s) (number = 25060) > ?topTags > RemoveSeasonRmHydBySeason = topTags(lrt.coef34, n =25060) > ls() [1] "Hydroperiod" "RemoveSeasonRmHydBySeason" [3] "Season" "d" [5] "d.GLM" "design" [7] "glmfit.tgw" "lrt.coef34" [9] "lrt.coef4" "lrt.tgw" [11] "oo" "raw.data" > RemoveHydBySeason = topTags(lrt.coef4, n=25060) > write.csv(RemoveHydBySeason, "RemoveHydBySeason.csv") > write.csv(RemoveSeasonRmHydBySeason, "RemoveSeasonRmHydBySeason.csv") > summary(decideTestsDGE(lrt.coef4) + > summary(decideTestsDGE(lrt.coef4)) [,1] -1 57 0 24887 1 116 > summary(decideTestsDGE(lrt.coef34)) Error in array(x, c(length(x), 1L), if (!is.null(names(x))) list(names(x), : attempt to set an attribute on NULL > design (Intercept) HydroperiodP SeasonR HydroperiodP:SeasonR 1 1 0 1 0 2 1 0 1 0 3 1 0 1 0 4 1 1 1 1 5 1 1 1 1 6 1 1 1 1 7 1 0 0 0 8 1 0 0 0 9 1 0 0 0 10 1 1 0 0 11 1 1 0 0 12 1 1 0 0 attr(,"assign") [1] 0 1 2 3 attr(,"contrasts") attr(,"contrasts")$Hydroperiod [1] "contr.treatment" attr(,"contrasts")$Season [1] "contr.treatment"
ADD COMMENTlink written 6.8 years ago by Smith, Hilary A40
0
gravatar for Gordon Smyth
6.8 years ago by
Gordon Smyth35k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth35k wrote:
Hi Avinash, Your analysis seems fine, from what you give, although I'd rather than you used estimateTagwiseDisp() as a general rule instead of just estimateCommonDisp(). One of the major motivations for the edgeR package is the ability to allow gene-specific variability without losing too much power, which is what the tagwise dispersion estimation does. We also generally recommend normalization using calcNormFactors() prior to dispersion estimation and testing. Either or both of these steps may affect the number of DE genes. Best wishes Gordon > Date: Mon, 28 Nov 2011 09:58:48 -0600 > From: Avinash S <avins.s at="" googlemail.com=""> > To: <bioconductor at="" stat.math.ethz.ch=""> > Subject: [BioC] edgeR - R script - results compared to DESeq > > Hello List, > > I'm just starting with R and wanted to analyze my data for differential > expression using edgeR. Here is the code which is working for me but I want > to check if I'm missing something as I get more number of differentially > expressed genes compared to DESeq > > MY Sample data > > GeneID CR1 CR2 MR1 MR2 > 3119s00200.1 78 78 148 124 > 3119s00202.1 33 68 168 198 > 3119s00232.1 52 73 135 99 > > CR1 and CR2 are replicates of controls and MR1 and MR2 are treated sample > replicates > > library(edgeR) > library(limma) > raw.data <- read.delim("RNAseq-MycReadCountData-EdgeR.txt") > names(raw.data) > > d <- raw.data[, 2:5] > rownames(d) <- raw.data[, 1] > group <- c(rep("CR", 2), rep("MR", 2)) > d <- DGEList(counts = d, group = group) > d <- estimateCommonDisp(d) > et <- exactTest(d) > topTags(et) > etTag = rownames(topTags(et)$table) > sum(et$table$p.value <0.05) > sum(p.adjust(et$table$p.value,method="BH") < 0.1) > good = sum(et$table$p.value <0.05) > goodList = topTags(et, n=good) > sink("edgeR-MR-pVal005.csv") > goodList > sink() > > I compared the result with DESeq and I get about 2000 genes more in edgeR > at pVal < 0.05, however, the the matched genes showed same log2foldchanges. > Is it usual that edgeR gives more number of diff.expr genes? > > Thank you for your time, > Avinash ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}
ADD COMMENTlink written 6.8 years ago by Gordon Smyth35k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 121 users visited in the last hour