Unexpected results of differential expression analysis
1
0
Entering edit mode
Guest User ★ 13k
@guest-user-4897
Last seen 9.7 years ago
Hello, I am analysing the GEO dataset GSE19736 using SAM (significance analysis for microarrays), particularly the R package called samr but I am not getting the results that I was expecting. According to the published study, which also uses this tool, there should be 1028 differentially expressed genes (554 up-regulated and 474 down-regulated). When I run the analysis on the data I get a lot more of genes that are differentially expressed. I don't know what I might be doing wrong or where the difference lays. I am using the following code: #Extracting files >cel <- list.celfiles() >abatch.raw <- read.celfiles(cel) #Processing >geneSummaries <- rma(abatch.raw) #Extracting expression matrix >expressionmatrix <- exprs (geneSummaries) #SAM >samrobj <- samr (data, resp.type="Quantitative", nperms=50, center.arrays=TRUE, assay.type="array") >delta=2 >samr.plot(samrobj,delta) >delta.table <- samr.compute.delta.table(samrobj) >siggenes.table<-samr.compute.siggenes.table(samrobj,2.5, data, delta.table, min.foldchange=1.5, compute.localfdr=TRUE) >samr.pvalues.from.perms (samrobj$tt, samrobj$ttstar) If I understood it correctly you can know the number of differentially expressed genes this way for the upregulated: > siggenes.table$ngenes.up and this way for the downregulated: > siggenes.table$ngenes.lo I find there are 1598 upregulated genes and 1721 downregulated genes, and the number varies greatly depending on the value I give to delta. I tried assesing differential expression with limma instead, in this case I found that the number of differentially expressed genes was half the expected... Does anyone have any clue? Thanks! -- output of sessionInfo(): R version 2.15.1 (2012-06-22) Platform: x86_64-pc-linux-gnu (64-bit) locale: [1] LC_CTYPE=es_ES.UTF-8 LC_NUMERIC=C LC_TIME=es_ES.UTF-8 [4] LC_COLLATE=es_ES.UTF-8 LC_MONETARY=es_ES.UTF-8 LC_MESSAGES=es_ES.UTF-8 [7] LC_PAPER=C LC_NAME=C LC_ADDRESS=C [10] LC_TELEPHONE=C LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] compiler splines parallel stats graphics grDevices utils datasets methods [10] base other attached packages: [1] limma_3.14.4 pd.hugene.1.0.st.v1_3.8.0 GOstats_2.26.0 [4] Category_2.26.0 GSEABase_1.22.0 graph_1.38.2 [7] annaffy_1.32.0 KEGG.db_2.9.1 GO.db_2.9.0 [10] preprocessCore_1.20.0 samr_2.0 matrixStats_0.8.1 [13] impute_1.34.0 pdInfoBuilder_1.22.0 affxparser_1.30.2 [16] pd.huex.1.0.st.v2_3.8.0 RSQLite_0.11.4 oligo_1.22.0 [19] oligoClasses_1.20.0 nnet_7.3-4 mgcv_1.7-18 [22] Matrix_1.0-6 lattice_0.20-6 KernSmooth_2.23-8 [25] gcrma_2.30.0 affy_1.36.1 foreign_0.8-50 [28] DBI_0.2-7 cluster_1.14.2 survival_2.36-14 [31] rpart_3.1-54 BiocInstaller_1.8.3 annotate_1.38.0 [34] AnnotationDbi_1.22.6 Biobase_2.18.0 BiocGenerics_0.6.0 loaded via a namespace (and not attached): [1] affyio_1.26.0 AnnotationForge_1.2.1 Biostrings_2.26.3 bit_1.1-10 [5] codetools_0.2-8 ff_2.2-11 foreach_1.4.1 genefilter_1.42.0 [9] GenomicRanges_1.10.7 grid_2.15.1 IRanges_1.16.6 iterators_1.0.6 [13] nlme_3.1-104 RBGL_1.36.2 R.methodsS3_1.4.2 rstudio_0.97.246 [17] stats4_2.15.1 tools_2.15.1 XML_3.96-1.1 xtable_1.7-1 [21] zlibbioc_1.4.0 -- Sent via the guest posting facility at bioconductor.org.
GO limma siggenes GO limma siggenes • 1.2k views
ADD COMMENT
0
Entering edit mode
@wolfgang-huber-3550
Last seen 23 days ago
EMBL European Molecular Biology Laborat…
Dear Laura did you already contact the authors of that paper for a transcript of their analysis / the exact parameters, software versions, filters, etc. used? Best wishes Wolfgang On 22 Jun 2013, at 13:06, Laura [guest] <guest at="" bioconductor.org=""> wrote: > > Hello, > > I am analysing the GEO dataset GSE19736 using SAM (significance analysis for microarrays), particularly the R package called samr but I am not getting the results that I was expecting. > > According to the published study, which also uses this tool, there should be 1028 differentially expressed genes (554 up-regulated and 474 down-regulated). When I run the analysis on the data I get a lot more of genes that are differentially expressed. I don't know what I might be doing wrong or where the difference lays. > > I am using the following code: > #Extracting files >> cel <- list.celfiles() >> abatch.raw <- read.celfiles(cel) > > #Processing >> geneSummaries <- rma(abatch.raw) > > #Extracting expression matrix >> expressionmatrix <- exprs (geneSummaries) > > #SAM >> samrobj <- samr (data, resp.type="Quantitative", nperms=50, center.arrays=TRUE, assay.type="array") >> delta=2 >> samr.plot(samrobj,delta) >> delta.table <- samr.compute.delta.table(samrobj) >> siggenes.table<-samr.compute.siggenes.table(samrobj,2.5, data, delta.table, min.foldchange=1.5, compute.localfdr=TRUE) >> samr.pvalues.from.perms (samrobj$tt, samrobj$ttstar) > > > If I understood it correctly you can know the number of differentially expressed genes this way for the upregulated: >> siggenes.table$ngenes.up > > and this way for the downregulated: >> siggenes.table$ngenes.lo > > I find there are 1598 upregulated genes and 1721 downregulated genes, and the number varies greatly depending on the value I give to delta. > > I tried assesing differential expression with limma instead, in this case I found that the number of differentially expressed genes was half the expected... > > Does anyone have any clue? > Thanks! > > -- output of sessionInfo(): > > R version 2.15.1 (2012-06-22) > Platform: x86_64-pc-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=es_ES.UTF-8 LC_NUMERIC=C LC_TIME=es_ES.UTF-8 > [4] LC_COLLATE=es_ES.UTF-8 LC_MONETARY=es_ES.UTF-8 LC_MESSAGES=es_ES.UTF-8 > [7] LC_PAPER=C LC_NAME=C LC_ADDRESS=C > [10] LC_TELEPHONE=C LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] compiler splines parallel stats graphics grDevices utils datasets methods > [10] base > > other attached packages: > [1] limma_3.14.4 pd.hugene.1.0.st.v1_3.8.0 GOstats_2.26.0 > [4] Category_2.26.0 GSEABase_1.22.0 graph_1.38.2 > [7] annaffy_1.32.0 KEGG.db_2.9.1 GO.db_2.9.0 > [10] preprocessCore_1.20.0 samr_2.0 matrixStats_0.8.1 > [13] impute_1.34.0 pdInfoBuilder_1.22.0 affxparser_1.30.2 > [16] pd.huex.1.0.st.v2_3.8.0 RSQLite_0.11.4 oligo_1.22.0 > [19] oligoClasses_1.20.0 nnet_7.3-4 mgcv_1.7-18 > [22] Matrix_1.0-6 lattice_0.20-6 KernSmooth_2.23-8 > [25] gcrma_2.30.0 affy_1.36.1 foreign_0.8-50 > [28] DBI_0.2-7 cluster_1.14.2 survival_2.36-14 > [31] rpart_3.1-54 BiocInstaller_1.8.3 annotate_1.38.0 > [34] AnnotationDbi_1.22.6 Biobase_2.18.0 BiocGenerics_0.6.0 > > loaded via a namespace (and not attached): > [1] affyio_1.26.0 AnnotationForge_1.2.1 Biostrings_2.26.3 bit_1.1-10 > [5] codetools_0.2-8 ff_2.2-11 foreach_1.4.1 genefilter_1.42.0 > [9] GenomicRanges_1.10.7 grid_2.15.1 IRanges_1.16.6 iterators_1.0.6 > [13] nlme_3.1-104 RBGL_1.36.2 R.methodsS3_1.4.2 rstudio_0.97.246 > [17] stats4_2.15.1 tools_2.15.1 XML_3.96-1.1 xtable_1.7-1 > [21] zlibbioc_1.4.0 > > -- > Sent via the guest posting facility at bioconductor.org. > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD COMMENT

Login before adding your answer.

Traffic: 638 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6