difference in results using different edgeR package versions
1
0
Entering edit mode
@fernando-biase-6073
Last seen 10.1 years ago
Dear all, I did a RNA-seq analysis two years ago (2011), using edgeR package (I do not remember the version). I was asked to analyze the data again, and now I am using edgeR_3.2.4. The comparison is between two groups. The "backbone" of the code was very similar (it is pasted bellow), and I just changed it to adapt the package modifications. It turned out that the number of differentially expressed genes (FDR<0.05) in the second analysis (78) was a smaller than the number of DEG obtained in the first analysis (119). I understand that two years is a lot of time in such fast evolving field, and I suppose that the newer package version has more reliable analysis than older version. Nonetheless, has anyone experienced something similar? Is there a clear reason for this (other than I am making a mistake)? I also would appreciate if someone told me the mistakes in the code, if there are mistakes. Best regards, Fernando _____________________________________________ Genomics Research Associate Dept Bioengineering, University of California San Diego _____________________________________________ the design is the following: (Intercept) groupAB 7552_34_C 1 0 7828_34_C 1 0 8153_34_C 1 0 8140_34_C 1 0 8035_34_C 1 0 E002_34_C 1 1 E003_34_C 1 1 B967_34_C 1 1 B987_34_C 1 1 O621_34_C 1 1 O604_34_C 1 1 O470_34_C 1 1 O143_34_C 1 1 attr(,"assign") [1] 0 1 attr(,"contrasts") attr(,"contrasts")$group [1] "contr.treatment" *from 2011* read_count_data_b_TMM <- calcNormFactors(read_count_data_b, method="TMM") design <- model.matrix(~ group , data=targets) read_count_data_c_TMM <-estimateGLMCommonDisp(read_count_data_b_TMM, design, method="CoxReid") read_count_data_c_TMM <-estimateGLMTrendedDisp(read_count_data_c_TMM, design, method="bin.spline") read_count_data_c_TMM <- estimateGLMTagwiseDisp(read_count_data_c_TMM, design, method="trend") glmfit_read_count_data_c_TMM <- glmFit(read_count_data_c_TMM, design, dispersion=read_count_data_c_TMM$tagwise.dispersion) lrt_glmfit_read_count_data_c_TMM.group <- glmLRT(read_count_data_c_TMM ,glmfit_read_count_data_c_TMM, coef=2) decideTests_read_count_data_c_TMM.group <- decideTestsDGE(lrt_glmfit_read_count_data_c_TMM.group) summary(decideTests_read_count_data_c_TMM.group) [,1] -1 11 0 13022 1 108 *From 2013* read_count_data_b_TMM <- calcNormFactors(read_count_data_b, method="TMM") design <- model.matrix(~ group , data=targets ) read_count_data_c_TMM <-estimateGLMCommonDisp(read_count_data_b_TMM, design, method="CoxReid") read_count_data_c_TMM <-estimateGLMTrendedDisp(read_count_data_c_TMM, design, method="bin.spline") read_count_data_c_TMM <- estimateGLMTagwiseDisp(read_count_data_c_TMM, design, trend=TRUE) glmfit_read_count_data_c_TMM <- glmFit(read_count_data_c_TMM, design, dispersion=read_count_data_c_TMM$tagwise.dispersion) lrt_glmfit_read_count_data_c_TMM.group <- glmLRT(glmfit_read_count_data_c_TMM , coef=2) decideTests_read_count_data_c_TMM.group <- decideTestsDGE(lrt_glmfit_read_count_data_c_TMM.group) summary(decideTests_read_count_data_c_TMM.group) [,1] -1 11 0 13063 1 67 the design is the following: (Intercept) groupAB 7552_34_C 1 0 7828_34_C 1 0 8153_34_C 1 0 8140_34_C 1 0 8035_34_C 1 0 E002_34_C 1 1 E003_34_C 1 1 B967_34_C 1 1 B987_34_C 1 1 O621_34_C 1 1 O604_34_C 1 1 O470_34_C 1 1 O143_34_C 1 1 attr(,"assign") [1] 0 1 attr(,"contrasts") attr(,"contrasts")$group [1] "contr.treatment" The current sessionInfo() is the following: > sessionInfo() R version 3.0.1 (2013-05-16) Platform: x86_64-apple-darwin10.8.0 (64-bit) locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] splines grid parallel stats graphics grDevices utils datasets methods base other attached packages: [1] baySeq_1.14.1 VennDiagram_1.6.4 DESeq_1.12.0 lattice_0.20-15 locfit_1.5-9.1 Biobase_2.20.1 BiocGenerics_0.6.0 edgeR_3.2.4 [9] limma_3.16.6 loaded via a namespace (and not attached): [1] annotate_1.38.0 AnnotationDbi_1.22.6 DBI_0.2-7 genefilter_1.42.0 geneplotter_1.38.0 IRanges_1.18.2 RColorBrewer_1.0-5 [8] RSQLite_0.11.4 stats4_3.0.1 survival_2.37-4 tools_3.0.1 XML_3.95-0.2 xtable_1.7-1 [[alternative HTML version deleted]]
edgeR edgeR • 668 views
ADD COMMENT
0
Entering edit mode
@steve-lianoglou-2771
Last seen 19 months ago
United States
Hi, On Fri, Aug 2, 2013 at 7:32 AM, Fernando Biase <fernandobiase at="" gmail.com=""> wrote: > Dear all, > I did a RNA-seq analysis two years ago (2011), using edgeR package (I do > not remember the version). I was asked to analyze the data again, and now I > am using edgeR_3.2.4. > > The comparison is between two groups. The "backbone" of the code was very > similar (it is pasted bellow), and I just changed it to adapt the package > modifications. It turned out that the number of differentially expressed > genes (FDR<0.05) in the second analysis (78) was a smaller than the number > of DEG obtained in the first analysis (119). > > I understand that two years is a lot of time in such fast evolving field, > and I suppose that the newer package version has more reliable analysis > than older version. Nonetheless, has anyone experienced something similar? > Is there a clear reason for this (other than I am making a mistake)? This topic comes up every now and again, and I believe it is usually due to some changes to the defaults for the prior.df (or prior.n) params somewhere. I'm sure that searching the archives (in the footer of each bioc mail is the link to search the archives via gmane) will help to shed more light on this, and perhaps this email thread would be a good place to start: http://thread.gmane.org/gmane.science.biology.informatics.conductor/46 826 HTH, -steve -- Steve Lianoglou Computational Biologist Bioinformatics and Computational Biology Genentech
ADD COMMENT
0
Entering edit mode
Thank you Steve, I appreciate the tip on looking at the prior params and the email thread. Best, Fernando On Fri, Aug 2, 2013 at 9:25 AM, Steve Lianoglou <lianoglou.steve@gene.com>wrote: > Hi, > > On Fri, Aug 2, 2013 at 7:32 AM, Fernando Biase <fernandobiase@gmail.com> > wrote: > > Dear all, > > I did a RNA-seq analysis two years ago (2011), using edgeR package (I do > > not remember the version). I was asked to analyze the data again, and > now I > > am using edgeR_3.2.4. > > > > The comparison is between two groups. The "backbone" of the code was very > > similar (it is pasted bellow), and I just changed it to adapt the package > > modifications. It turned out that the number of differentially expressed > > genes (FDR<0.05) in the second analysis (78) was a smaller than the > number > > of DEG obtained in the first analysis (119). > > > > I understand that two years is a lot of time in such fast evolving field, > > and I suppose that the newer package version has more reliable analysis > > than older version. Nonetheless, has anyone experienced something > similar? > > Is there a clear reason for this (other than I am making a mistake)? > > This topic comes up every now and again, and I believe it is usually > due to some changes to the defaults for the prior.df (or prior.n) > params somewhere. > > I'm sure that searching the archives (in the footer of each bioc mail > is the link to search the archives via gmane) will help to shed more > light on this, and perhaps this email thread would be a good place to > start: > > http://thread.gmane.org/gmane.science.biology.informatics.conductor/ 46826 > > HTH, > -steve > > -- > Steve Lianoglou > Computational Biologist > Bioinformatics and Computational Biology > Genentech > [[alternative HTML version deleted]]
ADD REPLY

Login before adding your answer.

Traffic: 526 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6