Question

difference in results using different edgeR package versions

0

Entering edit mode

Fernando Biase ▴ 20

@fernando-biase-6073

Last seen 11.3 years ago

Dear all, I did a RNA-seq analysis two years ago (2011), using edgeR package (I do not remember the version). I was asked to analyze the data again, and now I am using edgeR_3.2.4. The comparison is between two groups. The "backbone" of the code was very similar (it is pasted bellow), and I just changed it to adapt the package modifications. It turned out that the number of differentially expressed genes (FDR<0.05) in the second analysis (78) was a smaller than the number of DEG obtained in the first analysis (119). I understand that two years is a lot of time in such fast evolving field, and I suppose that the newer package version has more reliable analysis than older version. Nonetheless, has anyone experienced something similar? Is there a clear reason for this (other than I am making a mistake)? I also would appreciate if someone told me the mistakes in the code, if there are mistakes. Best regards, Fernando _____________________________________________ Genomics Research Associate Dept Bioengineering, University of California San Diego _____________________________________________ the design is the following: (Intercept) groupAB 7552_34_C 1 0 7828_34_C 1 0 8153_34_C 1 0 8140_34_C 1 0 8035_34_C 1 0 E002_34_C 1 1 E003_34_C 1 1 B967_34_C 1 1 B987_34_C 1 1 O621_34_C 1 1 O604_34_C 1 1 O470_34_C 1 1 O143_34_C 1 1 attr(,"assign") [1] 0 1 attr(,"contrasts") attr(,"contrasts")$group [1] "contr.treatment" *from 2011* read_count_data_b_TMM <- calcNormFactors(read_count_data_b, method="TMM") design <- model.matrix(~ group , data=targets) read_count_data_c_TMM <-estimateGLMCommonDisp(read_count_data_b_TMM, design, method="CoxReid") read_count_data_c_TMM <-estimateGLMTrendedDisp(read_count_data_c_TMM, design, method="bin.spline") read_count_data_c_TMM <- estimateGLMTagwiseDisp(read_count_data_c_TMM, design, method="trend") glmfit_read_count_data_c_TMM <- glmFit(read_count_data_c_TMM, design, dispersion=read_count_data_c_TMM$tagwise.dispersion) lrt_glmfit_read_count_data_c_TMM.group <- glmLRT(read_count_data_c_TMM ,glmfit_read_count_data_c_TMM, coef=2) decideTests_read_count_data_c_TMM.group <- decideTestsDGE(lrt_glmfit_read_count_data_c_TMM.group) summary(decideTests_read_count_data_c_TMM.group) [,1] -1 11 0 13022 1 108 *From 2013* read_count_data_b_TMM <- calcNormFactors(read_count_data_b, method="TMM") design <- model.matrix(~ group , data=targets ) read_count_data_c_TMM <-estimateGLMCommonDisp(read_count_data_b_TMM, design, method="CoxReid") read_count_data_c_TMM <-estimateGLMTrendedDisp(read_count_data_c_TMM, design, method="bin.spline") read_count_data_c_TMM <- estimateGLMTagwiseDisp(read_count_data_c_TMM, design, trend=TRUE) glmfit_read_count_data_c_TMM <- glmFit(read_count_data_c_TMM, design, dispersion=read_count_data_c_TMM$tagwise.dispersion) lrt_glmfit_read_count_data_c_TMM.group <- glmLRT(glmfit_read_count_data_c_TMM , coef=2) decideTests_read_count_data_c_TMM.group <- decideTestsDGE(lrt_glmfit_read_count_data_c_TMM.group) summary(decideTests_read_count_data_c_TMM.group) [,1] -1 11 0 13063 1 67 the design is the following: (Intercept) groupAB 7552_34_C 1 0 7828_34_C 1 0 8153_34_C 1 0 8140_34_C 1 0 8035_34_C 1 0 E002_34_C 1 1 E003_34_C 1 1 B967_34_C 1 1 B987_34_C 1 1 O621_34_C 1 1 O604_34_C 1 1 O470_34_C 1 1 O143_34_C 1 1 attr(,"assign") [1] 0 1 attr(,"contrasts") attr(,"contrasts")$group [1] "contr.treatment" The current sessionInfo() is the following: > sessionInfo() R version 3.0.1 (2013-05-16) Platform: x86_64-apple-darwin10.8.0 (64-bit) locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] splines grid parallel stats graphics grDevices utils datasets methods base other attached packages: [1] baySeq_1.14.1 VennDiagram_1.6.4 DESeq_1.12.0 lattice_0.20-15 locfit_1.5-9.1 Biobase_2.20.1 BiocGenerics_0.6.0 edgeR_3.2.4 [9] limma_3.16.6 loaded via a namespace (and not attached): [1] annotate_1.38.0 AnnotationDbi_1.22.6 DBI_0.2-7 genefilter_1.42.0 geneplotter_1.38.0 IRanges_1.18.2 RColorBrewer_1.0-5 [8] RSQLite_0.11.4 stats4_3.0.1 survival_2.37-4 tools_3.0.1 XML_3.95-0.2 xtable_1.7-1 [[alternative HTML version deleted]]

edgeR edgeR • 838 views

ADD COMMENT • link updated 12.4 years ago by Steve Lianoglou ★ 13k • written 12.4 years ago by Fernando Biase ▴ 20

score 0 · Answer 1 · 2013-08-02

Hi, On Fri, Aug 2, 2013 at 7:32 AM, Fernando Biase <fernandobiase at="" gmail.com=""> wrote: > Dear all, > I did a RNA-seq analysis two years ago (2011), using edgeR package (I do > not remember the version). I was asked to analyze the data again, and now I > am using edgeR_3.2.4. > > The comparison is between two groups. The "backbone" of the code was very > similar (it is pasted bellow), and I just changed it to adapt the package > modifications. It turned out that the number of differentially expressed > genes (FDR<0.05) in the second analysis (78) was a smaller than the number > of DEG obtained in the first analysis (119). > > I understand that two years is a lot of time in such fast evolving field, > and I suppose that the newer package version has more reliable analysis > than older version. Nonetheless, has anyone experienced something similar? > Is there a clear reason for this (other than I am making a mistake)? This topic comes up every now and again, and I believe it is usually due to some changes to the defaults for the prior.df (or prior.n) params somewhere. I'm sure that searching the archives (in the footer of each bioc mail is the link to search the archives via gmane) will help to shed more light on this, and perhaps this email thread would be a good place to start: http://thread.gmane.org/gmane.science.biology.informatics.conductor/46 826 HTH, -steve -- Steve Lianoglou Computational Biologist Bioinformatics and Computational Biology Genentech