Unexpected results of differential expression analysis

0

Entering edit mode

Guest User ★ 13k

@guest-user-4897

Last seen 9.7 years ago

Hello, I am analysing the GEO dataset GSE19736 using SAM (significance analysis for microarrays), particularly the R package called samr but I am not getting the results that I was expecting. According to the published study, which also uses this tool, there should be 1028 differentially expressed genes (554 up-regulated and 474 down-regulated). When I run the analysis on the data I get a lot more of genes that are differentially expressed. I don't know what I might be doing wrong or where the difference lays. I am using the following code: #Extracting files >cel <- list.celfiles() >abatch.raw <- read.celfiles(cel) #Processing >geneSummaries <- rma(abatch.raw) #Extracting expression matrix >expressionmatrix <- exprs (geneSummaries) #SAM >samrobj <- samr (data, resp.type="Quantitative", nperms=50, center.arrays=TRUE, assay.type="array") >delta=2 >samr.plot(samrobj,delta) >delta.table <- samr.compute.delta.table(samrobj) >siggenes.table<-samr.compute.siggenes.table(samrobj,2.5, data, delta.table, min.foldchange=1.5, compute.localfdr=TRUE) >samr.pvalues.from.perms (samrobj$tt, samrobj$ttstar) If I understood it correctly you can know the number of differentially expressed genes this way for the upregulated: > siggenes.table$ngenes.up and this way for the downregulated: > siggenes.table$ngenes.lo I find there are 1598 upregulated genes and 1721 downregulated genes, and the number varies greatly depending on the value I give to delta. I tried assesing differential expression with limma instead, in this case I found that the number of differentially expressed genes was half the expected... Does anyone have any clue? Thanks! -- output of sessionInfo(): R version 2.15.1 (2012-06-22) Platform: x86_64-pc-linux-gnu (64-bit) locale: [1] LC_CTYPE=es_ES.UTF-8 LC_NUMERIC=C LC_TIME=es_ES.UTF-8 [4] LC_COLLATE=es_ES.UTF-8 LC_MONETARY=es_ES.UTF-8 LC_MESSAGES=es_ES.UTF-8 [7] LC_PAPER=C LC_NAME=C LC_ADDRESS=C [10] LC_TELEPHONE=C LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] compiler splines parallel stats graphics grDevices utils datasets methods [10] base other attached packages: [1] limma_3.14.4 pd.hugene.1.0.st.v1_3.8.0 GOstats_2.26.0 [4] Category_2.26.0 GSEABase_1.22.0 graph_1.38.2 [7] annaffy_1.32.0 KEGG.db_2.9.1 GO.db_2.9.0 [10] preprocessCore_1.20.0 samr_2.0 matrixStats_0.8.1 [13] impute_1.34.0 pdInfoBuilder_1.22.0 affxparser_1.30.2 [16] pd.huex.1.0.st.v2_3.8.0 RSQLite_0.11.4 oligo_1.22.0 [19] oligoClasses_1.20.0 nnet_7.3-4 mgcv_1.7-18 [22] Matrix_1.0-6 lattice_0.20-6 KernSmooth_2.23-8 [25] gcrma_2.30.0 affy_1.36.1 foreign_0.8-50 [28] DBI_0.2-7 cluster_1.14.2 survival_2.36-14 [31] rpart_3.1-54 BiocInstaller_1.8.3 annotate_1.38.0 [34] AnnotationDbi_1.22.6 Biobase_2.18.0 BiocGenerics_0.6.0 loaded via a namespace (and not attached): [1] affyio_1.26.0 AnnotationForge_1.2.1 Biostrings_2.26.3 bit_1.1-10 [5] codetools_0.2-8 ff_2.2-11 foreach_1.4.1 genefilter_1.42.0 [9] GenomicRanges_1.10.7 grid_2.15.1 IRanges_1.16.6 iterators_1.0.6 [13] nlme_3.1-104 RBGL_1.36.2 R.methodsS3_1.4.2 rstudio_0.97.246 [17] stats4_2.15.1 tools_2.15.1 XML_3.96-1.1 xtable_1.7-1 [21] zlibbioc_1.4.0 -- Sent via the guest posting facility at bioconductor.org.

GO limma siggenes GO limma siggenes • 1.2k views

ADD COMMENT • link updated 10.9 years ago by Wolfgang Huber ★ 13k • written 10.9 years ago by Guest User ★ 13k

0

Entering edit mode

Wolfgang Huber ★ 13k

@wolfgang-huber-3550

Last seen 23 days ago

EMBL European Molecular Biology Laborat…

Dear Laura did you already contact the authors of that paper for a transcript of their analysis / the exact parameters, software versions, filters, etc. used? Best wishes Wolfgang On 22 Jun 2013, at 13:06, Laura [guest] <guest at="" bioconductor.org=""> wrote: > > Hello, > > I am analysing the GEO dataset GSE19736 using SAM (significance analysis for microarrays), particularly the R package called samr but I am not getting the results that I was expecting. > > According to the published study, which also uses this tool, there should be 1028 differentially expressed genes (554 up-regulated and 474 down-regulated). When I run the analysis on the data I get a lot more of genes that are differentially expressed. I don't know what I might be doing wrong or where the difference lays. > > I am using the following code: > #Extracting files >> cel <- list.celfiles() >> abatch.raw <- read.celfiles(cel) > > #Processing >> geneSummaries <- rma(abatch.raw) > > #Extracting expression matrix >> expressionmatrix <- exprs (geneSummaries) > > #SAM >> samrobj <- samr (data, resp.type="Quantitative", nperms=50, center.arrays=TRUE, assay.type="array") >> delta=2 >> samr.plot(samrobj,delta) >> delta.table <- samr.compute.delta.table(samrobj) >> siggenes.table<-samr.compute.siggenes.table(samrobj,2.5, data, delta.table, min.foldchange=1.5, compute.localfdr=TRUE) >> samr.pvalues.from.perms (samrobj$tt, samrobj$ttstar) > > > If I understood it correctly you can know the number of differentially expressed genes this way for the upregulated: >> siggenes.table$ngenes.up > > and this way for the downregulated: >> siggenes.table$ngenes.lo > > I find there are 1598 upregulated genes and 1721 downregulated genes, and the number varies greatly depending on the value I give to delta. > > I tried assesing differential expression with limma instead, in this case I found that the number of differentially expressed genes was half the expected... > > Does anyone have any clue? > Thanks! > > -- output of sessionInfo(): > > R version 2.15.1 (2012-06-22) > Platform: x86_64-pc-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=es_ES.UTF-8 LC_NUMERIC=C LC_TIME=es_ES.UTF-8 > [4] LC_COLLATE=es_ES.UTF-8 LC_MONETARY=es_ES.UTF-8 LC_MESSAGES=es_ES.UTF-8 > [7] LC_PAPER=C LC_NAME=C LC_ADDRESS=C > [10] LC_TELEPHONE=C LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] compiler splines parallel stats graphics grDevices utils datasets methods > [10] base > > other attached packages: > [1] limma_3.14.4 pd.hugene.1.0.st.v1_3.8.0 GOstats_2.26.0 > [4] Category_2.26.0 GSEABase_1.22.0 graph_1.38.2 > [7] annaffy_1.32.0 KEGG.db_2.9.1 GO.db_2.9.0 > [10] preprocessCore_1.20.0 samr_2.0 matrixStats_0.8.1 > [13] impute_1.34.0 pdInfoBuilder_1.22.0 affxparser_1.30.2 > [16] pd.huex.1.0.st.v2_3.8.0 RSQLite_0.11.4 oligo_1.22.0 > [19] oligoClasses_1.20.0 nnet_7.3-4 mgcv_1.7-18 > [22] Matrix_1.0-6 lattice_0.20-6 KernSmooth_2.23-8 > [25] gcrma_2.30.0 affy_1.36.1 foreign_0.8-50 > [28] DBI_0.2-7 cluster_1.14.2 survival_2.36-14 > [31] rpart_3.1-54 BiocInstaller_1.8.3 annotate_1.38.0 > [34] AnnotationDbi_1.22.6 Biobase_2.18.0 BiocGenerics_0.6.0 > > loaded via a namespace (and not attached): > [1] affyio_1.26.0 AnnotationForge_1.2.1 Biostrings_2.26.3 bit_1.1-10 > [5] codetools_0.2-8 ff_2.2-11 foreach_1.4.1 genefilter_1.42.0 > [9] GenomicRanges_1.10.7 grid_2.15.1 IRanges_1.16.6 iterators_1.0.6 > [13] nlme_3.1-104 RBGL_1.36.2 R.methodsS3_1.4.2 rstudio_0.97.246 > [17] stats4_2.15.1 tools_2.15.1 XML_3.96-1.1 xtable_1.7-1 > [21] zlibbioc_1.4.0 > > -- > Sent via the guest posting facility at bioconductor.org. > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD COMMENT • link 10.9 years ago Wolfgang Huber ★ 13k

Login before adding your answer.