I'm getting dramatically different results with camera from limma 3.28 vs 3.22: the new version gives p-values < 1e-16, while the old one is barely significant on identical inputs.
The variance inflation factor is different (new one = 1.85, old one = 7.12).
I've uploaded the arguments as an .rds file and saved camera.default from both versions as the functions camera.new (3.28) and camera.old (3.22) in separate .R files. Here is self-contained code to recreate the results.
Help! Thanks
$ wget http://mskilab.com/tmp/limma.bug/camera.args.rds http://mskilab.com/tmp/limma.bug/camera.old.R http://mskilab.com/tmp/limma.bug/camera.new.R
$ R
R version 3.3.0 (2016-05-03) -- "Supposedly Educational"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
> library(limma)
> source('camera.new.R')
> source('camera.old.R')
> head(do.call(camera.new, readRDS('camera.args.rds')))
> head(do.call(camera.old, readRDS('camera.args.rds')))
Appreciate the quick reply. I'm shuffling group labels. I missed the note in the documentation, but I'm also surprised it has such a dramatic effect on the results (!).
What's interesting is that the top gene sets recurrently appear among the top gene sets in the group label permutations. I'm not sure what to make of that.
I get similar behavior when setting inter.gene.cor to a few different 'low' values, e.g. 0.05, though I can beat down the p-values with very high values of this parameter.
Thanks
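For reference, here is a rough sketch of why the preset correlation matters so much. It assumes the fixed-correlation form of camera's variance inflation factor for a set of m genes sharing a common inter-gene correlation ρ (m and ρ are notation introduced here). The set size is inferred from the VIFs reported above, on the assumption that both refer to the same gene set and that the new VIF came from the default of 0.01. None of this is stated explicitly in the thread.

```latex
\begin{align*}
\mathrm{VIF} &= 1 + (m - 1)\,\rho \\
% new version, assuming the preset \rho = 0.01 produced VIF = 1.85
m &= 1 + \frac{1.85 - 1}{0.01} = 86 \\
% old version, assuming the same set produced VIF = 7.12
\hat{\rho}_{\text{old}} &= \frac{7.12 - 1}{86 - 1} \approx 0.072
\end{align*}
```

Under these assumptions the old version was effectively using a correlation about seven times larger than 0.01 for this set. The variance of the set statistic scales with the VIF, so a larger correlation gives a larger variance and less extreme p-values.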
That is predictable behaviour. Sets containing large numbers of genes that are truly co-regulated in your biological system will tend to have positive internal correlations and hence will have a higher than average chance of being highly ranked regardless of group label permutation.
Also be aware that permuting labels is not quite the same as simulating data with no DE. Many of the permutations will be unbalanced and hence will retain genuine DE between the permuted groups.
PS. I've moved your comment here. Please use "Add reply" or "Add comment" to make a reply to an answer rather than posting your own "Answer".
Thanks - that makes sense. I'm still wondering how best to set this inter.gene.cor parameter.
If most gene sets are uncorrelated with the phenotype, then these should yield uniformly distributed p-values under a proper null model. In my data, I'm finding that the standard parameter setting (0.01) gives inflated QQ plots even for random gene sets. However, higher inter.gene.cor values yield straighter QQ plots, presumably providing better control of FDR. Would this be a reasonable justification for using a higher inter.gene.cor value?
Perhaps, but I suspect that choosing inter.gene.cor by permutation is probably conservative, for the reasons I already mentioned.
Have you read the discussion that I referred you to a few days ago? I mean in particular my response to Devon Ryan at:
http://f1000research.com/articles/5-1438