Question

major camera change from limma 3.22 to 3.28?

0

mski ▴ 10

@mski-11319

Last seen 7.7 years ago

I'm getting dramatically different results with camera from limma 3.28 vs. 3.22: the new version gives p-values < 1e-16, while the old one is barely significant on identical inputs.

The variance inflation factor is different (new one = 1.85, old one = 7.12).

I've uploaded the arguments as an .rds file and saved camera.default from both versions as the functions camera.new (3.28) and camera.old (3.22) in separate .R files. Here is self-contained code to recreate the results.

Help! Thanks

$ wget http://mskilab.com/tmp/limma.bug/camera.args.rds http://mskilab.com/tmp/limma.bug/camera.old.R http://mskilab.com/tmp/limma.bug/camera.new.R
$ R
R version 3.3.0 (2016-05-03) -- "Supposedly Educational"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
> library(limma) 
> source('camera.new.R')
> source('camera.old.R')
> head(do.call(camera.new, readRDS('camera.args.rds')))
> head(do.call(camera.old, readRDS('camera.args.rds')))

limma camera • 1.7k views

ADD COMMENT • link updated 7.7 years ago by Gordon Smyth 50k • written 7.7 years ago by mski ▴ 10

0

Appreciate the quick reply. I'm shuffling group labels. I missed the note in the documentation, but I'm also surprised that it has such a dramatic effect on the results (!).

What's interesting is that the top gene sets recurrently appear among the top gene sets in the group label permutations; not sure what to make of that.

I get similar behavior when setting inter.gene.cor to a few different 'low' values, e.g. 0.05, though I can beat down the p-values with very high values of this parameter.

Thanks

ADD REPLY • link updated 7.7 years ago by Gordon Smyth 50k • written 7.7 years ago by mski ▴ 10

1

That is predictable behaviour. Sets containing large numbers of genes that are truly co-regulated in your biological system will tend to have positive internal correlations and hence will have a higher than average chance of being highly ranked regardless of group label permutation.

Also be aware that permuting labels is not quite the same as simulating data with no DE. Many of the permutations will be unbalanced and hence will retain genuine DE between the permuted groups.

PS. I've moved your comment here. Please use "Add reply" or "Add comment" to make a reply to an answer rather than posting your own "Answer".
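
The point about unbalanced permutations can be made concrete with a small base-R count. This is only a sketch under an assumed design of 4 samples per group (the thread does not state the real group sizes); `truth`, `relabelings` and `n_true_A` are illustrative names. A relabeling is called "balanced" here when each permuted group contains equal numbers of samples from the two true groups, so that any genuine DE cancels out on average.

```r
# Hypothetical design (assumption): 4 samples in each of two true groups
truth <- rep(c("A", "B"), each = 4)

# Every way of choosing which 4 of the 8 samples receive the permuted label "A"
relabelings <- combn(length(truth), 4)

# For each relabeling, count how many permuted-"A" samples are truly "A"
n_true_A <- apply(relabelings, 2, function(idx) sum(truth[idx] == "A"))

# Balanced relabelings have exactly 2 true-A samples in the permuted "A" group
table(n_true_A)

# Fraction of relabelings that are unbalanced and so retain some true DE
mean(n_true_A != 2)
```

Under this assumed design, 36 of the 70 relabelings are balanced and the remaining 34 still carry part of the original group difference.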

ADD REPLY • link 7.7 years ago Gordon Smyth 50k

0

Thanks, that makes sense. I'm still wondering how best to set the inter.gene.cor parameter.

If most gene sets are uncorrelated with the phenotype, then these should yield uniformly distributed p-values under a proper null model. In my data, I'm finding that the default setting (0.01) gives inflated QQ plots even for random gene sets. However, higher inter.gene.cor values yield straighter QQ plots, presumably providing better control of the FDR. Would this be a reasonable justification for using a higher inter.gene.cor value?
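
A check of this kind might be sketched as follows. Everything here is an assumption for illustration: the data are simulated independent noise (so, unlike real expression data, no inflation is expected at any setting), the gene sets are random, and the names `y`, `sets`, `cors` and `pvals` are hypothetical. The sketch only shows the mechanics of comparing camera p-values against the uniform distribution for several fixed values of inter.gene.cor.

```r
library(limma)
set.seed(42)

# Simulated null data (assumption): 2000 genes, 8 samples, no real DE
y <- matrix(rnorm(2000 * 8), 2000, 8)
group <- factor(rep(c("A", "B"), each = 4))
design <- model.matrix(~ group)

# 200 random gene sets of 50 genes each, given as row indices of y
sets <- replicate(200, sample(nrow(y), 50), simplify = FALSE)
names(sets) <- paste0("set", seq_along(sets))

# camera p-values at several fixed inter-gene correlations (sets kept in order)
cors <- c(0.01, 0.05, 0.1)
pvals <- sapply(cors, function(r)
  camera(y, sets, design, inter.gene.cor = r, sort = FALSE)$PValue)
colnames(pvals) <- paste0("cor=", cors)

# QQ plot against the uniform distribution on the -log10 scale
expected <- -log10(ppoints(nrow(pvals)))
observed <- apply(-log10(pvals), 2, sort, decreasing = TRUE)
matplot(expected, observed, type = "l", lty = 1,
        xlab = "expected -log10(p)", ylab = "observed -log10(p)")
abline(0, 1, lty = 2)
```

On real data, replacing `y`, `design` and `sets` with the actual inputs would show how far each setting departs from the diagonal; a larger inter.gene.cor should only move p-values in the conservative direction.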

ADD REPLY • link 7.7 years ago mski ▴ 10

0

Perhaps, but I suspect that choosing inter.gene.cor by permutation is probably conservative, for the reasons I already mentioned.

Have you read the discussion that I referred you to a few days ago? I mean in particular my response to Devon Ryan at:

http://f1000research.com/articles/5-1438

ADD REPLY • link 7.7 years ago Gordon Smyth 50k

score 1 · Answer 1 · 2016-08-17

1

mski ▴ 10

OK, I just answered my own question: the new version has the parameter inter.gene.cor = 0.01, while the previous one estimates it from the data. I can reproduce the camera.old results using camera.new with inter.gene.cor = NA.

Not sure if there was a news flash about this that I missed ... but probably good to know

I've seen in other forums that setting inter.gene.cor to a "low number" is recommended; however, here I'm noticing that the new default gives inflated p-values for random group assignments.
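
The two variance inflation factors quoted in the question appear consistent with this explanation. The arithmetic below is a sketch that assumes the VIF formula from the camera method, VIF = 1 + (m - 1) * rho, for a set of m genes with mean inter-gene correlation rho. The set size and the old estimated correlation are inferred from the quoted numbers, not stated anywhere in the thread.

```r
# VIF for a set of m genes with mean pairwise correlation rho (assumed formula)
vif <- function(m, rho) 1 + (m - 1) * rho

# Set size implied by VIF = 1.85 at the fixed default rho = 0.01 (inferred)
m <- (1.85 - 1) / 0.01 + 1

# Correlation the old default would need to have estimated to give VIF = 7.12
rho_old <- (7.12 - 1) / (m - 1)

round(m)          # implied number of genes in the set
round(rho_old, 3) # implied estimated inter-gene correlation
```

On these assumptions the set would contain 86 genes, and the old version would have estimated a correlation of about 0.072, roughly seven times the new fixed default.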

ADD COMMENT • link 7.7 years ago mski ▴ 10

1

The documentation is there to help! Reading help("camera"), it says:

Note. The default settings for inter.gene.cor and allow.neg.cor were changed to the current values in limma 3.28.6. Previously, the default was to estimate an inter-gene correlation for each set. To reproduce the earlier default, use allow.neg.cor=TRUE and inter.gene.cor=NA.
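
As a minimal sketch of what that note describes, the two calls below run camera with the current defaults and with the earlier behaviour. The data are simulated purely for illustration (the matrix `y`, the `design` and the two gene sets are hypothetical, not the poster's inputs), and limma 3.28.6 or later is assumed.

```r
library(limma)
set.seed(1)

# Simulated data (assumption): 1000 genes, 3 + 3 samples, set1 shifted upwards
y <- matrix(rnorm(1000 * 6), 1000, 6)
design <- cbind(Intercept = 1, Group = c(0, 0, 0, 1, 1, 1))
sets <- list(set1 = 1:20, set2 = 21:40)
y[sets$set1, 4:6] <- y[sets$set1, 4:6] + 1

# Current default: fixed inter-gene correlation of 0.01
res_new <- camera(y, sets, design)

# Earlier default: estimate the correlation for each set, allowing negative values
res_old <- camera(y, sets, design, allow.neg.cor = TRUE, inter.gene.cor = NA)

res_new
res_old
```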

Usually we try not to change default values in limma but, in this case, we want to encourage users to make the change. For more background, see the discussion of this article:

http://f1000research.com/articles/5-1438

We are already aware that the new default will not control the type I error rate correctly, but this was unavoidable given we wanted to improve the gene set rankings.

ADD REPLY • link 7.7 years ago Gordon Smyth 50k

0

Could you elaborate a bit on what you mean by the "random group assignments" you're doing?

If we're running a two-group comparison, are you randomly assigning samples to group A and group B and then running camera on that? Or is the randomization done by assigning random genes to a gene set and testing that gene set collection with camera (but with the correct group A and group B sample assignments)?

ADD REPLY • link 7.7 years ago Steve Lianoglou ★ 13k