Dear all,
I am trying to learn the GAGE gene-set enrichment package and apply it to my RNA-Seq data. I am following the RNA-Seq workflow vignette, http://bioconductor.org/packages/release/bioc/vignettes/gage/inst/doc/RNA-seqWorkflow.pdf, section 7.1 (the workflow with DESeq2).
What perplexes me is how much the results differ depending on the statistical test I choose. With the default (t-test), I got 5 significant pathways, with the smallest q-value on the order of 10^-3. The original paper (Luo et al., 2009) claims, referring to the Kim/Volsky 2005 paper, that the normality assumption is fine for gene sets of 10 or more genes. I nevertheless decided to double-check with the Kolmogorov-Smirnov test and got 14 significant pathways, with the smallest q-value on the order of 10^-8.
The difference is larger than I expected. I also tried rank.test=TRUE and got a result much closer to the default case (7 significant pathways, smallest q-value on the order of 10^-4). This option supposedly takes care of possible non-normality of the distribution assumed in the t-test, but I am not sure whether the other t-test assumption, "fold changes of genes are independent and identically distributed", is also taken care of.
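To make the comparison concrete, here is a toy sketch in plain Python (standard library only, not GAGE's own code) of the three statistics as I understand them: a two-sample t statistic of a gene set's fold changes against the background, the same statistic computed on ranks, and the two-sample Kolmogorov-Smirnov D. The data and the function names (welch_t, rank_t, ks_statistic) are invented for illustration, and using the Welch form of the t statistic is my assumption; GAGE's actual implementation may differ in detail.

```python
import math
import statistics


def welch_t(sample, background):
    """Welch two-sample t statistic: gene-set mean vs. background mean."""
    m_s, m_b = statistics.fmean(sample), statistics.fmean(background)
    v_s, v_b = statistics.variance(sample), statistics.variance(background)
    return (m_s - m_b) / math.sqrt(v_s / len(sample) + v_b / len(background))


def rank_t(sample, background):
    """Same t statistic, computed on ranks in the pooled data.

    Ties (absent in the toy data below) would get arbitrary distinct ranks.
    """
    pooled = list(sample) + list(background)
    order = sorted(range(len(pooled)), key=pooled.__getitem__)
    ranks = [0.0] * len(pooled)
    for position, index in enumerate(order):
        ranks[index] = float(position + 1)
    return welch_t(ranks[:len(sample)], ranks[len(sample):])


def ks_statistic(sample, background):
    """Two-sample Kolmogorov-Smirnov D: largest gap between empirical CDFs."""
    a, b = sorted(sample), sorted(background)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


# Hypothetical log fold changes, made up for illustration only.
background = [k / 100 for k in range(-100, 101)]   # 201 values in [-1, 1]
shifted = [0.405 + k / 100 for k in range(20)]     # every gene moderately up
split = [sign * (0.805 + k / 100)                  # half up, half down
         for k in range(10) for sign in (-1, 1)]

for name, gene_set in (("shifted", shifted), ("split", split)):
    print(name,
          "t:", round(welch_t(gene_set, background), 3),
          "rank t:", round(rank_t(gene_set, background), 3),
          "KS D:", round(ks_statistic(gene_set, background), 3))
```

At least in this toy setup, the two t-based statistics respond only to a shift in the mean (or mean rank), while D also responds to a change in the shape of the distribution, so the two kinds of test need not rank gene sets the same way.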
As I said, the difference between the rank test and the default is not that big, and had I not run K-S, I would probably have been satisfied. Now I frankly do not know what to make of the results. Could anybody please share suggestions on how to approach this situation?
Thank you
Slava