I've been using ROAST and Camera, but I am bit unsure of how much to trust each. Let's say that I want to test if a just a a couple of gene sets are enriched in an RNAseq dataset. roast/mroast should do the trick, right? Well, I am under the impression that it's very easy for roast to produce interesting p-values, even if it's just random data.
As an example, let's randomly generate gene expression for 10000 genes, and make 1000 of these DE expressed between 2 groups. Then generate two lists with 100 genes: one where 50% of the genes are DE, and the other where genes are picked entirely at random.
set.seed(0) # generate random expression 10000 genes y <- matrix(rnorm(10000 * 4), 10000,4) design <- cbind(Intercept = 1, Group = c(0, 0, 1, 1)) # make 1000 genes differentially expressed y[1:1000, 3:4] <- y[1:1000, 3:4] + 3 # index 1 - 50% genes are DE DE_50 <- 951:1050 # index 2 - random genes DE_random <- sample(1:10000, 100) mroast(y, list(DE_50 = DE_50, DE_random = DE_random), design = design, contrast = 2, set.statistic = 'mean50')
NGenes PropDown PropUp Direction PValue FDR PValue.Mixed FDR.Mixed DE_50 100 0.02 0.55 Up 0.002 0.003 0.002 0.003 DE_random 100 0.11 0.17 Up 0.030 0.030 0.009 0.009
Quite interesting p-values for a random list! ROAST appears to have a huge false positive rate!
In contrast, Camera behaves as expected:
> camera(y, DE_50, design = design, contrast = 2) NGenes Direction PValue set1 100 Up 5.534736e-14 > camera(y, DE_random, design = design, contrast = 2) NGenes Direction PValue set1 100 Down 0.3403619
This makes me quite unsure about trusting ROAST. Camera is/was said to be quite conservative, but partly this could be just because ROAST gives many false positives. Are there any real advantages of using ROAST over Camera?