Question

GeneSetTest: which statistics and other measures can be used



r_1470 ▴ 20

@r_1470-4453

Last seen 9.6 years ago

Dear list,

Having read the description of GeneSetTest, I understand that it tests whether a specified subset of genes has higher values of a test statistic than expected by chance, using a permutation test. If the test statistic takes both positive and negative values it is treated as 't-like'; if it takes only positive values it is treated as 'F-like'.

My question is: is there any restriction on the type of statistic used in this analysis? If GeneSetTest employs a straightforward permutation test, then the probability distribution of the statistic shouldn't matter, should it? Only whether it contains positive-only versus both positive and negative values?

To give a couple of specific examples:

1) The deviance is a very useful statistic in generalized linear modelling and maximum likelihood analysis. Would there be any issue with using the deviance as the test statistic?

2) Any number of other 'statistics' that do not follow the probability distributions commonly used in hypothesis testing might be calculated from gene expression data, a simple example being log fold change. Could such a measure appropriately be used in GeneSetTest (in the sense that it wouldn't violate any of the assumptions required to produce unbiased p-values)?

Many thanks and best wishes,

Richard

• 788 views

updated 13.3 years ago by Gordon Smyth 50k • written 13.3 years ago by r_1470 ▴ 20

score 0 · Answer 1 · 2011-01-28

Dear Richard,

Yes, geneSetTest() is designed so that it can be used with any statistic of interest, exactly as you suggest. The statistic need not be a "test statistic" in the usual statistical sense. It can be anything that you want to rank the genes by. The null hypothesis is that the gene set is no more highly ranked than randomly selected sets of the same size.

Best wishes
Gordon

> Date: Thu, 27 Jan 2011 11:25:50 +0000 (GMT)
> From: r_1470 <r_1470 at yahoo.co.uk>
> To: bioconductor at r-project.org
> Subject: [BioC] GeneSetTest: which statistics and other measures can be used
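The null hypothesis described in the answer (the gene set is no more highly ranked than randomly selected sets of the same size) can be illustrated with a small permutation sketch. This is written in Python purely for illustration and is not limma's implementation; geneSetTest() itself may compute its p-value differently. The function name `gene_set_perm_test`, its arguments, and the simulated log fold changes are all assumptions made for this example.

```python
import random


def gene_set_perm_test(statistics, index, alternative="up", nsim=9999, seed=0):
    """Permutation p-value comparing a gene set with random sets of equal size.

    statistics  : per-gene values of any ranking measure (logFC, deviance, ...)
    index       : positions of the genes that belong to the set
    alternative : "up" (larger values), "down" (smaller values),
                  or "mixed" (larger absolute values)
    """
    if alternative == "up":
        values = list(statistics)
    elif alternative == "down":
        values = [-s for s in statistics]
    elif alternative == "mixed":
        values = [abs(s) for s in statistics]
    else:
        raise ValueError("alternative must be 'up', 'down' or 'mixed'")

    index = list(index)
    k = len(index)
    observed = sum(values[i] for i in index) / k

    # Null distribution: mean statistic of randomly chosen sets of size k.
    rng = random.Random(seed)
    hits = 0
    for _ in range(nsim):
        sample = rng.sample(values, k)
        if sum(sample) / k >= observed:
            hits += 1

    # Add one to numerator and denominator so the p-value is never zero.
    return (hits + 1) / (nsim + 1)


# Simulated log fold changes for 1000 genes (hypothetical data):
# the first 50 genes are shifted upward, the rest are pure noise.
sim = random.Random(1)
logfc = [sim.gauss(0, 1) for _ in range(1000)]
for i in range(50):
    logfc[i] += 1.0

p_set = gene_set_perm_test(logfc, range(50))          # shifted set
p_random = gene_set_perm_test(logfc, range(500, 550))  # unshifted set
print(p_set, p_random)
```

Nothing in the sketch depends on the distribution of the statistic, only on how the genes compare with one another, which is why a measure such as log fold change or the deviance can be supplied directly. For a positive-only statistic such as the deviance, `alternative="up"` and `alternative="mixed"` give the same result, since taking absolute values changes nothing.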