Dirk Repsilber
Dear BioC Bioinformaticians,
I am using the package GSA to test for gene set enrichment in gene expression data.
GSA uses a permutation test to calculate the enrichment p-values.
Such p-values are usually defined as
p = #(T* >= T) / #B
where T is the originally observed test statistic, T* is the test statistic observed for a permuted dataset, #(T* >= T) is the number of permutations whose T* is at least as large as T, and #B is the number of permutations.
However, the function GSA implements p = #(T* > T) / #B (as is also defined in the accompanying article), see:
http://www-stat.stanford.edu/~tibs/ftp/GSA.pdf
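To make the difference concrete, here is a minimal R sketch of both definitions on made-up numbers. The helper names perm_p_inclusive and perm_p_strict are hypothetical and not part of GSA; the only difference between them is whether ties T* = T are counted.

```r
# Hypothetical helpers (not GSA functions): one-sided permutation p-values.
# t_obs : observed statistic T
# t_star: numeric vector of permuted statistics T* (length = #B)
perm_p_inclusive <- function(t_obs, t_star) {
  # usual definition: ties with the observed value count as "at least as extreme"
  sum(t_star >= t_obs) / length(t_star)
}

perm_p_strict <- function(t_obs, t_star) {
  # definition as implemented in GSA: ties are not counted
  sum(t_star > t_obs) / length(t_star)
}

# Made-up example: 10 permuted statistics, two of which tie with the observed one
t_star <- c(-1.2, 0.3, 0.8, 1.5, 2.0, 2.0, -0.4, 0.1, 1.1, -2.3)
t_obs <- 2.0
perm_p_inclusive(t_obs, t_star)  # 0.2
perm_p_strict(t_obs, t_star)     # 0
```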
As a consequence, even for designs that are far too small (say, a comparison of two independent groups of size 2 each), many of the resulting p-values are exactly 0, because permutations giving T* = T are not counted.
In my experience this happens for about half of the pathways under consideration.
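A small sketch of why p = 0 is so easy to reach in a 2 vs. 2 design, assuming the permutations shuffle the group labels: there are only choose(4, 2) = 6 distinct labellings, one of which is the observed one, so the observed statistic reappears among the permuted ones. The expression values and the difference-of-means statistic below are made up for illustration; they are not GSA's maxmean score and ignore its restandardization.

```r
# Made-up expression of one gene in four samples: samples 1-2 = group A, 3-4 = group B
x <- c(5.1, 4.8, 3.2, 3.0)
obs_groupA <- c(1, 2)

# Simple illustrative statistic (not GSA's): mean(group A) - mean(group B)
mean_diff <- function(x, idxA) mean(x[idxA]) - mean(x[-idxA])

# All choose(4, 2) = 6 possible choices of the two group-A samples
assignments <- combn(4, 2)
t_star <- apply(assignments, 2, function(idxA) mean_diff(x, idxA))
t_obs <- mean_diff(x, obs_groupA)

ncol(assignments)                      # 6
sum(t_star > t_obs) / length(t_star)   # 0   (strict: the observed labelling is ignored)
sum(t_star >= t_obs) / length(t_star)  # 1/6 (inclusive: the observed labelling counts)
```

With 2 samples per group, the smallest non-zero p-value the inclusive definition can give over all distinct labellings is therefore 1/6, whereas the strict definition returns 0 whenever the observed labelling is the most extreme one.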
For larger designs this difference may not matter much, but for very small designs I think this p-value calculation gives far too optimistic results (too many "significant" pathways).
Is there a reason for this unusual p-value calculation, or should the following lines in the GSA function (original):
pvalues.hi[i] = sum(r.star[i, ] > r.obs[i])/nperms
pvalues.lo[i] = sum(r.star[i, ] < r.obs[i])/nperms
read instead:
pvalues.hi[i] = sum(r.star[i, ] >= r.obs[i])/nperms
pvalues.lo[i] = sum(r.star[i, ] <= r.obs[i])/nperms
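A vectorised sketch of the proposed change, assuming (as in the lines above) that r.obs holds one observed score per gene set and r.star is a matrix with one row per gene set and one column per permutation. perm_pvalues is a hypothetical helper, not a function in GSA, and the numbers are made up.

```r
# Hypothetical helper (not part of GSA): tie-inclusive permutation p-values
# for all gene sets at once.
# r.obs : numeric vector, observed score of each gene set
# r.star: matrix, row i = permuted scores of gene set i
perm_pvalues <- function(r.obs, r.star) {
  stopifnot(nrow(r.star) == length(r.obs))
  nperms <- ncol(r.star)
  # comparing a matrix with a vector of length nrow recycles column-wise,
  # so element [i, j] of r.star is compared with r.obs[i]
  list(
    hi = rowSums(r.star >= r.obs) / nperms,
    lo = rowSums(r.star <= r.obs) / nperms
  )
}

# Made-up example: 2 gene sets, 4 permutations
r.obs <- c(1.0, -0.5)
r.star <- rbind(c(0.2, 1.0, -0.3, 1.4),
                c(-0.5, -1.0, 0.7, 0.0))
res <- perm_pvalues(r.obs, r.star)
res$hi  # 0.50 0.75
res$lo  # 0.75 0.50
```

Because ties are counted in both tails, hi + lo exceeds 1 by the fraction of permutations that tie with the observed score.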
I would be grateful for any comments or clarifications!
Sincerely,
Dirk
--
_____________________________________________________
Dr. Dirk Repsilber
Biomathematics / Bioinformatics group
Genetics and Biometry
Research Institute for the Biology of Farm Animals
FBN
Wilhelm-Stahl-Allee 2
D-18196 Dummerstorf
Tel: +49 38208 68 916
Fax: +49 38208 68 902
www.fbn-dummerstorf.de/de/Forschung/FBs/fb2/repsilber