which universe in hyperGTest


marco zucchelli ▴ 320

@marco-zucchelli-1987

Last seen 11.3 years ago

An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20070111/9a46a3db/attachment.pl


ADD COMMENT • link updated 18.9 years ago by Seth Falcon ★ 7.4k • written 18.9 years ago by marco zucchelli ▴ 320


James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 21 hours ago

United States

marco zucchelli wrote:
> Hi,
>
> I am using hyperGTest to test GO. For the universe of genes I use the
> following:
>
> ENTREZ <- as.list(hgu133plus2ENTREZID)
> univ <- unlist(ENTREZ)
> univ <- univ[!is.na(univ)]
>
> Now univ contains 47,430 genes, but only 19,871 are unique, since the
> same gene can be hybridized several times on the same array.
>
> > length(univ)
> [1] 47430
> > length(unique(univ))
> [1] 19871
>
> Is it correct to have repetitions, or should a list of unique genes be used?
> I.e., should I use:
>
> universeGeneIds=univ or
> universeGeneIds=unique(univ)

You want the unique genes. In addition, if you have done any pre-filtering of the data to remove, e.g., those genes that don't change expression, you want to remove those from your universe as well. The universe should consist only of unique genes that could have been selected by whatever statistical test you used.

BTW, the geneIds should also be unique.

Seth has made some changes to the GOstats vignette that should make all of this quite clear:

http://www.bioconductor.org/packages/2.0/bioc/vignettes/GOstats/inst/doc/GOstatsHyperG.pdf

Best,

Jim

> in the GOHyperGParams ??
>
> Regards
>
> Marco
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

--
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623

**********************************************************
Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues.
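The advice above (unique universe, pre-filtered probesets removed, unique geneIds that sit inside the universe) can be sketched in base R. This is only an illustration: the probe names, Entrez IDs, and the `kept` filter are made up, and the toy list merely stands in for `as.list(hgu133plus2ENTREZID)`.

```r
# Toy probeset -> Entrez ID map standing in for as.list(hgu133plus2ENTREZID).
# All probe names and IDs here are invented for illustration.
probe2entrez <- list(
  "p1_at" = "1001", "p2_at" = "1001",  # two probesets for the same gene
  "p3_at" = "1002", "p4_at" = NA,      # probeset without an Entrez ID
  "p5_at" = "1003", "p6_at" = "1004"
)

# Whole-chip universe: drop NAs, then keep one entry per gene
univ <- unlist(probe2entrez)
univ <- unique(univ[!is.na(univ)])

# Hypothetical pre-filter: only these probesets were passed to the
# statistical test, so only their genes belong in the universe
kept <- c("p1_at", "p2_at", "p3_at", "p5_at")
univ_filtered <- unlist(probe2entrez[kept])
univ_filtered <- unique(univ_filtered[!is.na(univ_filtered)])

# Selected genes should be unique and a subset of the universe
selected <- unique(c("1001", "1001", "1003"))
stopifnot(all(selected %in% univ_filtered))
```

Whichever universe is used, it should correspond to the set of genes that actually went into the selection step.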

ADD COMMENT • link 18.9 years ago James W. MacDonald 68k


Seth Falcon ★ 7.4k

@seth-falcon-992

Last seen 11.3 years ago

"James W. MacDonald" <jmacdon at med.umich.edu> writes:
> You can test for under-represented GO terms by setting the testDirection
> argument of your GOHyperGParams object to "under". So if your
> GOHyperGParams object were set up like this:
>
> params <- new("GOHyperGParams", geneIds = A, universeGeneIds = ALL)
>
> then you would test for over representation as normal:
>
> hyperGTest(p)
>
> and under-representation like this:
>
> testDirection(params) <- "under"

make that: testDirection(p) <- "under"

and then:

> hyperGTest(p)

I knew what you meant, but in case someone thinks "params" is a magic word... :-)

+ seth
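For reference, here is a sketch of the over/under workflow from this exchange with a single object name used throughout. It assumes the Bioconductor packages GOstats and hgu133plus2.db are installed. The function name `run_go_tests` and its arguments `selected` and `universe` are hypothetical, and the ontology and p-value cutoff are arbitrary choices, not something stated in the thread.

```r
# Sketch only: needs Bioconductor's GOstats and hgu133plus2.db to run.
# `selected` and `universe` are hypothetical character vectors of Entrez IDs.
run_go_tests <- function(selected, universe) {
  library(GOstats)

  params <- new("GOHyperGParams",
                geneIds         = unique(selected),
                universeGeneIds = unique(universe),
                annotation      = "hgu133plus2.db",
                ontology        = "BP",    # assumption: biological process
                pvalueCutoff    = 0.05,    # assumption: arbitrary cutoff
                conditional     = FALSE,
                testDirection   = "over")

  # Over-representation with the parameters as constructed
  over <- hyperGTest(params)

  # Flip the direction on the same object, then test again
  testDirection(params) <- "under"
  under <- hyperGTest(params)

  list(over = over, under = under)
}
```

Calling `summary()` on either element of the returned list should give the table of GO terms that pass the cutoff.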

ADD COMMENT • link 18.9 years ago Seth Falcon ★ 7.4k


Seth Falcon wrote:
> "James W. MacDonald" <jmacdon at med.umich.edu> writes:
>
>> You can test for under-represented GO terms by setting the testDirection
>> argument of your GOHyperGParams object to "under". So if your
>> GOHyperGParams object were set up like this:
>>
>> params <- new("GOHyperGParams", geneIds = A, universeGeneIds = ALL)
>>
>> then you would test for over representation as normal:
>>
>> hyperGTest(p)
>>
>> and under-representation like this:
>>
>> testDirection(params) <- "under"
>
> make that: testDirection(p) <- "under"
>
> and then:
>
>> hyperGTest(p)
>
> I knew what you meant, but in case someone thinks "params" is a magic
> word...

This just goes to show the power of the vignette. I usually call my GOHyperGParams objects 'params', but reverted to what you call it in your vignette, right in the middle of my example.

Now if I could just get my computer to figure out what I mean when I mix things up like that. ;-D

Best,

Jim

> :-)
>
> + seth

ADD REPLY • link 18.9 years ago James W. MacDonald 68k


marco zucchelli ▴ 320

@marco-zucchelli-1987

Last seen 11.3 years ago

An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20070111/9830e971/attachment.pl

ADD COMMENT • link 18.9 years ago marco zucchelli ▴ 320


marco zucchelli wrote:
> Sorry, I forgot to reply to bioconductor ...

Thanks for re-sending to the list.

> Dear James,
>
> actually I tried both options and got the same result, which probably
> means that hyperGTest is taking care of the repeats.

True. It does that now. In addition, the new version does some validity checking to ensure that, e.g., the geneIds are a subset of the universeGeneIds.

> I use hgu133plus2.
>
> The pre-filtering is a pretty interesting point. In my case I have 2
> biological replicates for each tissue and I split the probes based on the
> following rule:

Here I am assuming that you mean probesets when you say probes.

> 1. probes that are called absent on all the arrays (group A)
> 2. probes that have consistent calls on the biological replicates (i.e. they
>    are either present or marginal or absent on both replicates) but are
>    present on at least one pair of replicates (group B)
> 3. others: these are probes that may be absent on one replicate and present
>    on the other, etc. (group C)
>
> Now I find differentially expressed genes with limma in group B and I
> cluster them according to the pattern of up/down regulation they have on
> the different tissues.
>
> I am also interested in genes that are always absent (group A), since this
> may be biologically of interest in my experiment.
>
> What is now a good universe to use?
>
> I filtered out probes that have no Entrez ID both in my gene lists and the
> hgu133plus2 array, got the Entrez IDs, and eliminated the duplicates by
> using unique().
>
> Then I used:
>
> 1. geneIds = A, universeGenes = ALL, but it seems like I
>    should use universeGenes = A+B ??

For this group, the statistic you used to select 'significant' probesets was the P/A calls, and you selected from all the genes on the chip, so I think your universe is correct.

> 2. geneIds = cluster of B, universeGenes = ALL, but it seems like I should
>    use universeGenes = A+B ?? or only B?

Here you pre-filtered the probesets and selected only those that fulfilled your 'B' criterion. You then used a linear model to select differentially expressed genes from this subset of the chip, so IMO, your universe is the 'B' genes.

> Does it make sense to look at unexpressed genes? It seems like this might
> not be very wise on the Affy arrays.

I think you might get ten different answers from ten different statisticians if you asked them what it means when a probeset has an 'absent' call. I am personally not convinced that the P/M/A calls mean much, mainly because a large percentage (30 - 40%) of the MM probes are brighter than their matching PM probe. It seems to me that the MM probes capture an unknown mixture of noise, transcript abundance, background binding, and binding of unexpected transcript (e.g., an unrelated transcript for which the MM probe is a PM that Affy didn't know about).

The assumption for the P/M/A calls is that the MM probe only captures noise and background binding, so if the PM probes are significantly brighter, then the transcript is expressed. In many cases this may well be true. However, I think there are likely enough cases where this isn't true that I am not comfortable assuming that an absent call means the gene really isn't being expressed.

That said, by looking at the 'unexpressed' genes, are you really trying to figure out which pathways (for lack of a better term) are being shut down, or not used? If that is the case, then maybe you could use the B genes and look for _underrepresented_ GO terms as well. This might give you what you want without having to make any assumptions about which genes are not being expressed.

You can test for under-represented GO terms by setting the testDirection argument of your GOHyperGParams object to "under". So if your GOHyperGParams object were set up like this:

params <- new("GOHyperGParams", geneIds = A, universeGeneIds = ALL)

then you would test for over representation as normal:

hyperGTest(p)

and under-representation like this:

testDirection(params) <- "under"
hyperGTest(p)

HTH,

Jim
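To make the over/under distinction concrete, here is a base-R sketch of the two hypergeometric tails for a single GO term. The counts are invented for illustration, and this shows the textbook calculation under the assumption that the 'B' genes form the universe; it is not claimed to reproduce GOstats' internals exactly.

```r
# Toy counts for one GO term; all numbers are invented.
N <- 1000  # unique genes in the universe (e.g. the 'B' genes)
K <- 100   # universe genes annotated to the GO term
n <- 50    # selected genes (e.g. one cluster of B)
k <- 2     # selected genes annotated to the term

# Expected number of annotated genes in the selection under the null
expected <- n * K / N

# Over-representation: probability of seeing k or more annotated genes
p_over <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)

# Under-representation: probability of seeing k or fewer annotated genes
p_under <- phyper(k, K, N - K, n, lower.tail = TRUE)
```

Here the observed count is below the expected count, so the under-representation p-value is the smaller of the two. Changing `N` and `K` to reflect a different universe (ALL versus B) changes both tails, which is why the choice of universe matters.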

ADD REPLY • link 18.9 years ago James W. MacDonald 68k


An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20070112/2afb1261/attachment.pl

ADD REPLY • link 18.9 years ago marco zucchelli ▴ 320
