Extract microarray data for genes identified by GO analysis


Guest User ★ 13k

@guest-user-4897


Dear Gurus,

I am doing an Illumina microarray analysis. The study design is a 2x2 (i.e. varying on two different conditions). As part of the analysis I'm doing a GO analysis. There are a few GO categories of special interest, so I want to extract data for the probes identified in these categories and cluster the data.

The problem is that after performing the GO analysis, I essentially cannot figure out how to extract the data for these probes. I have done lots of googling and have figured out that "geneIdsByCategory" (e.g. geneIdsByCategory(mfOver1)[["GO:0001077"]]) will tell me the EntrezIDs for the genes, but I cannot figure out how to map those back to the probeIDs.

I also came across "probeSetSummary," which maps between EntrezID and ProbeID, but the data from this method does not seem to match that from "geneIdsByCategory." Specifically, the number of unique EntrezIDs in each GO category are different. Here is some example output (only showing results from one GO category):

    > head(probeSetSummary(mfOver1,.05,sigProbesets=sigLL1))
    $`GO:0001077`
      EntrezID         ProbeSetID selected
    1    16600 0khLe85Huv0juQw.sQ        0
    2    16600 35LRC1Xd1PNCJ05Ras        0
    3    16600 rpUHFdf15SFI5LRC1U        0
    4    18124 BteTYfS5fYo.qi6dh0        0
    5    18124 TnIofrF1F97TYQnfX4        0
    6    18124 rQIi6KJzkUI0QknwKE        0
    7    21420 NVwtViinW54gHvi7Eg        0
    8    21420 NZWVEWR3oXld_i3_4c        0
    9    21420 xvEFZWVEWR3oXld_i0        0

    > geneIdsByCategory(mfOver1)[["GO:0001077"]]
    [1] "13653" "16600" "18124" "21420" "22038"

Can anyone give me guidance on how to get from the GO analysis to clustering? I know how to cluster, but getting from EntrezIDs back to probeIDs is my problem. Well, I think that's my problem anyway. If you know of a better way to do it, I'd love to hear it!

Thanks in advance!

Mark

-- output of sessionInfo():

    R version 2.15.1 (2012-06-22)
    Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

    locale:
    [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    other attached packages:
    [1] GO.db_2.8.0 GOstats_2.24.0 graph_1.36.1 Category_2.24.0 limma_3.14.1 annotate_1.36.0 lumiMouseAll.db_1.18.0 org.Mm.eg.db_2.8.0
    [9] RSQLite_0.11.2 DBI_0.2-5 AnnotationDbi_1.20.3 xtable_1.7-0 lumi_2.10.0 nleqslv_1.9.4 Biobase_2.18.0 BiocGenerics_0.4.0
    [17] vimcom_0.9-5 setwidth_1.0-2 lattice_0.20-10

    loaded via a namespace (and not attached):
    [1] affy_1.36.0 affyio_1.26.0 AnnotationForge_1.0.2 BiocInstaller_1.8.3 colorspace_1.2-0 genefilter_1.40.0 grid_2.15.1 GSEABase_1.20.0
    [9] IRanges_1.16.4 KernSmooth_2.23-8 MASS_7.3-22 Matrix_1.0-10 methylumi_2.4.0 mgcv_1.7-22 nlme_3.1-105 parallel_2.15.1
    [17] preprocessCore_1.20.0 RBGL_1.34.0 splines_2.15.1 stats4_2.15.1 survival_2.36-14 tcltk_2.15.1 tools_2.15.1 XML_3.95-0.1
    [25] zlibbioc_1.4.0

--
Sent via the guest posting facility at bioconductor.org.

Microarray GO Category • 1.1k views

updated 11.2 years ago by James W. MacDonald 65k • written 11.2 years ago by Guest User ★ 13k


James W. MacDonald 65k

@james-w-macdonald-5106


United States

Hi Mark,

On 2/18/2013 3:13 PM, Mark [guest] wrote:
> I also came across "probeSetSummary," which maps between EntrezID and ProbeID, but the data from this method does not seem to match that from "geneIdsByCategory." Specifically, the number of unique EntrezIDs in each GO category are different. Here is some example output (only showing results from one GO category):
>
>> head(probeSetSummary(mfOver1,.05,sigProbesets=sigLL1))
> $`GO:0001077`
>   EntrezID         ProbeSetID selected
> 1    16600 0khLe85Huv0juQw.sQ        0
> 2    16600 35LRC1Xd1PNCJ05Ras        0
> 3    16600 rpUHFdf15SFI5LRC1U        0
> 4    18124 BteTYfS5fYo.qi6dh0        0
> 5    18124 TnIofrF1F97TYQnfX4        0
> 6    18124 rQIi6KJzkUI0QknwKE        0
> 7    21420 NVwtViinW54gHvi7Eg        0
> 8    21420 NZWVEWR3oXld_i3_4c        0
> 9    21420 xvEFZWVEWR3oXld_i0        0

This is what you want, but you didn't read the help page carefully.

    sigProbesets: Optional vector of probeset IDs. See details for more information.

It appears you passed in the vector of unique Entrez Gene IDs (the geneIds), which is why you have all zeros in the selected column. If you pass in the probeset (or more correctly in your case, probe) IDs, you will have zeros and ones, and the ones indicate the probes that are significant. You may still want to subset to only a single Entrez Gene ID, as there is likely to be some information duplication between the probes that are supposed to interrogate the same transcript.

>> geneIdsByCategory(mfOver1)[["GO:0001077"]]
> [1] "13653" "16600" "18124" "21420" "22038"

This just gives you the Entrez Gene IDs that map to that particular category, AND are represented on your array.

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
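The two steps described in this reply (keep the probes flagged as significant, then keep a single probe per Entrez Gene ID before clustering) can be sketched in base R. This is only an illustration on made-up data, not output from the thread: `ps_one` merely mimics the three-column data frame that probeSetSummary returns for one category, `expr` is a hypothetical expression matrix, and picking the most variable probe per gene is an assumption; the thread does not say which probe to keep.

```r
# Toy stand-in for one element of probeSetSummary() output (hypothetical values).
ps_one <- data.frame(
  EntrezID   = c("16600", "16600", "18124", "18124", "21420"),
  ProbeSetID = c("p1", "p2", "p3", "p4", "p5"),
  selected   = c(1, 1, 0, 1, 1),
  stringsAsFactors = FALSE
)

# Toy expression matrix: probes in rows, four arrays in columns (hypothetical).
set.seed(1)
expr <- matrix(rnorm(5 * 4), nrow = 5,
               dimnames = list(ps_one$ProbeSetID, paste0("array", 1:4)))

# Keep only the probes flagged as significant (selected == 1).
sig <- ps_one[ps_one$selected == 1, ]

# Assumption: when several significant probes map to one gene,
# keep the probe with the largest variance across arrays.
sig$variance <- apply(expr[sig$ProbeSetID, , drop = FALSE], 1, var)
sig <- sig[order(sig$EntrezID, -sig$variance), ]
one_per_gene <- sig[!duplicated(sig$EntrezID), ]

# Subset the expression data to those probes and cluster them.
sub_expr <- expr[one_per_gene$ProbeSetID, , drop = FALSE]
hc <- hclust(dist(sub_expr))
```

With real data, `ps_one` would be replaced by one element of the probeSetSummary result and `expr` by the normalized expression matrix, with row names matching the probe IDs.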

written 11.2 years ago by James W. MacDonald 65k


James,

Thank you for pointing out my silly mistake. I think I'm on my way here.

I also found I was using the geneCounts accessor method incorrectly. I was using it to count significant genes in each category, which is obviously incorrect. After looking more thoroughly, it looks like there isn't an accessor method for that. Is that correct?

Perhaps the best method is to count the unique EntrezIDs from probeSetSummary that were "selected" (significant)?

Thanks again!

Mark

On Feb 19, 2013, at 7:31 AM, "James W. MacDonald" <jmacdon at uw.edu> wrote:
> It appears you passed in the vector of unique Entrez Gene IDs (the geneIds), which is why you have all zeros in the selected column. If you pass in the probeset (or more correctly in your case, probe) IDs, you will have zeros and ones, and the ones indicate the probes that are significant.

written 11.2 years ago by Mark Ebbert ▴ 30


Hi Mark,

Certainly the easiest way to do it is to use probeSetSummary, as you can do it in just two lines of code:

    ps <- probeSetSummary(hypergobject)
    gns <- lapply(ps, function(x) unique(x[,1]))

Best,

Jim

On 2/19/2013 1:38 PM, Mark Ebbert wrote:
> I also found I was using the geneCounts accessor method incorrectly. I was using it to count significant genes in each category, which is obviously incorrect. After looking more thoroughly, it looks like there isn't an accessor method. Is that correct?
>
> Perhaps the best method is to count the unique EntrezIDs from probeSetSummary that were "selected" (significant)?
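As written, the two lines above collect every unique Entrez Gene ID listed for a category. To count only the significant genes, as Mark proposed, one possible extra step is to filter on the `selected` column first. The sketch below runs on a made-up list that only imitates the shape of the probeSetSummary result; the category names and probe IDs are placeholders.

```r
# Toy stand-in for probeSetSummary() output: a named list of data frames
# (category names and probe IDs are hypothetical).
ps <- list(
  categoryA = data.frame(EntrezID   = c("16600", "16600", "18124"),
                         ProbeSetID = c("p1", "p2", "p3"),
                         selected   = c(1, 1, 0),
                         stringsAsFactors = FALSE),
  categoryB = data.frame(EntrezID   = c("21420", "22038", "13653"),
                         ProbeSetID = c("p4", "p5", "p6"),
                         selected   = c(1, 1, 0),
                         stringsAsFactors = FALSE)
)

# Unique Entrez Gene IDs per category, restricted to selected probes.
sig_genes <- lapply(ps, function(x) unique(x$EntrezID[x$selected == 1]))

# Number of significant genes in each category.
n_sig <- vapply(sig_genes, length, integer(1))
```

Here `n_sig` is a named integer vector with one count per category, which can be compared against the counts reported by the hypergeometric test.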

written 11.2 years ago by James W. MacDonald 65k


Hi,

I am using GOstats to identify molecular functions that are over represented. I'm getting conflicting results between a method from an example I found in the lumi vignette and using probeSetSummary. Specifically, to get the list of significant categories, I was using the following code:

    mfOver1 <- hyperGTest(params.mf1)
    mf.gGhyp.pv1 <- pvalues(mfOver1)
    mf.sigGO.ID.pv1 <- names(mf.gGhyp.pv1[mf.gGhyp.pv1 < 0.05])
    mf.sigGO.Term.pv1 <- getGOTerm(mf.sigGO.ID.pv1)[["MF"]]
    length(mf.sigGO.Term.pv1)

The number of resulting GO terms based on this code is 105. If I use probeSetSummary, however, I only get 92 significant GO terms. Here is the code I'm using:

    ps1<-probeSetSummary(mfOver1,0.05,sigProbesets=probeList)
    length(ps1)

My understanding is that both methods should select only those categories with a p-value < 0.05, but I have no doubt misunderstood something.

Thanks for your help!

Mark

written 11.2 years ago by Mark Ebbert ▴ 30


Hi Mark,

On 2/19/2013 4:41 PM, Mark Ebbert wrote:
> The number of resulting GO terms based on this code is 105. If I use probeSetSummary, however, I only get 92 significant GO terms. Here is the code I'm using:
>
> ps1<-probeSetSummary(mfOver1,0.05,sigProbesets=probeList)
> length(ps1)
>
> My understanding is that both methods should select only those categories with a p-value < 0.05, but I have no doubt misunderstood something.

Or you made a mistake somewhere. If I run

    example("probeSetSummary")

to get some faked up results (hyp, a HyperGTestResult object, and ps, the output from probeSetSummary), and then do what you intended to do (noting that the example uses the default p-value of 0.01):

    > ps <- probeSetSummary(hyp, pvalue = 0.05)
    > length(ps)
    [1] 700
    > sum(pvalues(hyp) < 0.05)
    [1] 700
    > all.equal(names(pvalues(hyp)[pvalues(hyp) < 0.05]), names(ps))
    [1] TRUE

Seems the same to me.

Best,

Jim

written 11.2 years ago by James W. MacDonald 65k


Jim,

Thank you for your patience. I realize there is only so much you can do without a reproducible example, but here's one bit of information that makes me a bit suspicious. If I just type "mfOver1" I get a summary of the object as follows stating that there are 105 categories with a p-value < 0.05:

    > mfOver1
    Gene to GO MF test for over-representation
    389 GO MF ids tested (105 have p < 0.05)
    Selected gene set size: 133
    Gene universe size: 18105
    Annotation package: lumiMouseAll

If I then run the probeSetSummary using the same object, it says 92:

    > ps1<-probeSetSummary(mfOver1,sigProbesets=sigProbe1)
    > length(ps1)
    [1] 92

I tried using the entire (original) probe set and the expected significant probe set and the number stays at 92. I also tried specifying "pvalue=0.05" as previously stated without any difference (since that was the original parameter).

I'd be happy to discover I made a silly mistake, but don't the above results seem suspicious? Given I didn't run any code between the two examples, it's hard for me to imagine that code run prior could cause this discrepancy. But I've been wrong before...

On Feb 19, 2013, at 3:08 PM, James W. MacDonald <jmacdon at uw.edu> wrote:
> Or you made a mistake somewhere. If I run
>
> example("probeSetSummary")
>
> To get some faked up results; hyp, a HyperGTestResult object, and ps, the output from probeSetSummary. If I then do what you intended to do (noting that the example uses the default p-value of 0.01):
>
> > ps <- probeSetSummary(hyp, pvalue = 0.05)
> > length(ps)
> [1] 700
> > sum(pvalues(hyp) < 0.05)
> [1] 700
> > all.equal(names(pvalues(hyp)[pvalues(hyp) < 0.05]), names(ps))
> [1] TRUE
>
> Seems the same to me.
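The thread does not settle why the two counts differ. One way to narrow it down is to list the categories that pass the p-value cutoff but are absent from the probeSetSummary result, and then inspect those categories directly. The sketch below uses made-up p-values and placeholder category names; `pv` stands in for pvalues(mfOver1) and `ps` for the probeSetSummary output.

```r
# Hypothetical p-values for five categories (names are placeholders, not GO IDs).
pv <- c(catA = 0.001, catB = 0.020, catC = 0.040, catD = 0.200, catE = 0.049)

# Hypothetical probeSetSummary()-like result that lacks one significant category.
ps <- setNames(vector("list", 3), c("catA", "catB", "catC"))

# Categories passing the cutoff according to the p-values.
sig_ids <- names(pv)[pv < 0.05]

# Categories that are significant but missing from the summary list.
missing_from_ps <- setdiff(sig_ids, names(ps))

# Their p-values, for closer inspection.
pv[missing_from_ps]
```

On the real objects, the same comparison would use names(pvalues(mfOver1))[pvalues(mfOver1) < 0.05] and names(ps1), both of which appear earlier in the thread.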

written 11.2 years ago by Mark Ebbert ▴ 30
