Question

Do the outputs of Limma's competitive gene set methods (camera, romer) require a multiple testing correction?

3

Jon Manning ▴ 40

@jon-manning-3420

United Kingdom

I've been exploring Limma's excellent gene set methods (many thanks to the developers), and I think that romer() is probably what I want to use.

I'm only just getting to grips with all the methods and their differences. I suspect that romer() doesn't provide multiple-testing adjustment parameters because it doesn't make sense in a competitive context. Am I right, or should I be making the adjustment myself?

Many thanks for any pointers,

Jon

limma romer • 5.4k views

ADD COMMENT • link updated 10.4 years ago by Gordon Smyth 53k • written 10.4 years ago by Jon Manning ▴ 40

score 11 · Answer 1 · 2015-08-26

11

Gordon Smyth 53k

@gordon-smyth

WEHI, Melbourne, Australia

We recently did a simulation study with romer() and, unfortunately, both romer() and camera() do need multiple testing adjustments. camera() does this automatically but romer() doesn't. You can easily add an FDR column for romer() yourself.
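A minimal sketch of adding FDR columns to romer() results, assuming `res` has the shape of the matrix romer() returns (rows are gene sets; columns NGenes, Up, Down, Mixed). The matrix here is simulated as a stand-in, and the `FDR.*` column names are illustrative choices, not something limma produces:

```r
set.seed(42)

# Stand-in for romer() output; in practice something like
#   res <- romer(y, index, design, contrast)
n_sets <- 50
res <- cbind(
  NGenes = sample(10:200, n_sets, replace = TRUE),
  Up     = runif(n_sets),
  Down   = runif(n_sets),
  Mixed  = runif(n_sets)
)
rownames(res) <- paste0("set", seq_len(n_sets))

# Benjamini-Hochberg adjustment across gene sets,
# done separately for each alternative hypothesis
fdr <- apply(res[, c("Up", "Down", "Mixed")], 2, p.adjust, method = "BH")
colnames(fdr) <- paste0("FDR.", colnames(fdr))  # hypothetical column names
res_fdr <- cbind(res, fdr)

# Show the top sets for the alternative of interest
head(res_fdr[order(res_fdr[, "Up"]), ])
```

The adjustment is applied across all tested sets within one alternative, so the set collection should be fixed before adjusting.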

romer() is designed to be an improved parametric analog of the Broad Institute's GSEA approach, but the approach is neither purely competitive nor self-contained, and personally I prefer the clarity of mroast() or camera().

My favourites are either mroast() with as many rotations as you have time for or camera() with a preset intergene correlation of 0.05 as in:

camera(y, index, design, contrast="whatever", inter.gene.cor=0.05)

Both of these will be more powerful than romer().

If your data has a large value for df.prior, then you can use fry() to approximate the mroast() results without having to run the rotations. That is a recent addition to the package.
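For comparison, the three calls can be sketched side by side on simulated data. Everything below (the expression matrix, the two-group design, the set names and the choice of `nrot = 9999`) is an assumption made for illustration; only the function names and the `inter.gene.cor` argument come from the discussion above, and fry() requires a limma version that includes it:

```r
library(limma)

set.seed(1)
# Simulated log-expression matrix: 1000 genes, 6 samples (3 vs 3)
y <- matrix(rnorm(1000 * 6), nrow = 1000, ncol = 6)
rownames(y) <- paste0("gene", 1:1000)
group <- factor(rep(c("ctrl", "trt"), each = 3))
design <- model.matrix(~ group)

# Shift the first 20 genes upward in the treated samples
y[1:20, 4:6] <- y[1:20, 4:6] + 2

# Hypothetical gene sets, given as row indices into y
index <- list(setA = 1:20, setB = 101:130, setC = 501:560)

# camera() with a preset inter-gene correlation
cam <- camera(y, index, design, contrast = 2, inter.gene.cor = 0.05)

# mroast() with a generous number of rotations
mr <- mroast(y, index, design, contrast = 2, nrot = 9999)

# fry() as a fast approximation to mroast() without running rotations
fr <- fry(y, index, design, contrast = 2)

cam
mr
fr
```

All three return one row per gene set with an FDR column already included, so no separate adjustment step is needed for these functions.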

ADD COMMENT • link 10.4 years ago Gordon Smyth 53k

0

I'm curious, is there a reason you set the inter-gene correlation manually rather than letting camera determine it from the data?

ADD REPLY • link 10.4 years ago Ryan C. Thompson ★ 7.9k

1

This is a new idea. We have long observed that gene sets that correspond closely to an important biological pathway tend to have high inter-gene correlations. We have also observed that geneSetTest() tends to rank gene sets well, although it doesn't control the error rate correctly. camera() controls the error rate but is conservative for small sample sizes. The idea is to avoid penalizing the tightly co-regulated sets as much as camera would normally, to avoid rewarding a set for having discordant gene behaviour (negative inter-gene correlation), and to gain in power. The preset correlation gives a compromise between geneSetTest() and camera() with estimated correlations. Not estimating the correlation results in a great increase in power.
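A rough way to see the difference is to run camera() twice on the same data, once with the correlation estimated and once with it preset. The sketch below uses simulated data with one artificially co-regulated set; the data, set names and seed are all assumptions. It also assumes a limma version in which `inter.gene.cor = NA` requests estimation (older versions used `NULL` as the default for this):

```r
library(limma)

set.seed(2)
n_genes <- 1000
n_samples <- 8
y <- matrix(rnorm(n_genes * n_samples), n_genes, n_samples)
group <- factor(rep(c("ctrl", "trt"), each = 4))
design <- model.matrix(~ group)

# Hypothetical co-regulated set: genes 1-30 share a per-sample factor,
# which induces positive inter-gene correlation within the set
shared <- rnorm(n_samples)
y[1:30, ] <- y[1:30, ] + matrix(shared, 30, n_samples, byrow = TRUE)

index <- list(coregulated = 1:30, independent = 201:230)

# Inter-gene correlation estimated separately for each set
cam_est <- camera(y, index, design, contrast = 2, inter.gene.cor = NA)

# Inter-gene correlation fixed in advance at the same value for every set
cam_fix <- camera(y, index, design, contrast = 2, inter.gene.cor = 0.05)

cam_est
cam_fix
```

Comparing the PValue columns of the two results shows how much of the penalty on the co-regulated set comes from its estimated correlation.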

ADD REPLY • link 10.4 years ago • updated 9.4 years ago Gordon Smyth 53k

1

Although this thread is a few months old, I was hoping to get some more information about this approach (using camera with a preset intergene correlation value). Gordon, can you clarify how one arrives at the 0.05 value for the inter.gene.cor setting? Is this value something that should be set based on a particular experiment or gene set collection? Or does 0.05 generally perform appropriately?

More specifically, would this approach be appropriate for a collection of gene sets that are co-regulated by definition (the sets were generated as lists of co-regulated genes determined from prior experimental data)? Not surprisingly, camera tends to strongly penalize these sets (which demonstrate inter-gene correlation values typically ranging from 0.1 to 0.5 in our data), reporting fairly high p values even for sets that appear to be differentially expressed in our experiments.

Thank you very much for any information you can offer.

ADD REPLY • link 10.1 years ago Brad Rosenberg ▴ 10

0

I simply chose 0.05 by experimentation and intuition, and note that I am now suggesting 0.01 rather than 0.05. The resulting camera test will not control the type I error rate in the strict sense that camera() does, but the ranking of the sets should be good and the level of liberalness is perhaps acceptable.

My intention is to use the same 0.01 value for all the sensible sets regardless of the level of co-regulation. Choosing genes because they are correlated, rather than just co-differentially-expressed in a prior experiment, would be pushing it however.

ADD REPLY • link 10.1 years ago Gordon Smyth 53k

0

Thanks for the prompt and helpful response.

To clarify using this approach for "pre-determined" co-regulated genes, I was referring to sets constructed similarly to those in MSigDB C4-CM: Cancer Modules. As I understand it, these sets are composed of genes (from existing sets) that were identified based on similar expression patterns in an integrated analysis of many publicly available datasets. As such, I would expect these sets to have high inter-gene correlation values and therefore be strongly penalized by camera. However, knowing that these sets are composed of co-regulated genes, using a fixed inter.gene.cor might be problematic, as you mentioned. Am I thinking about this correctly?

Thank you again for your help.

ADD REPLY • link 10.1 years ago Brad Rosenberg ▴ 10

2

The camera() method was designed, with or without preset correlation, with the MSigDB curated sets specifically in mind.

Keeping the correlation preset may seem counterintuitive, but not penalizing sets when they really are co-regulated is the whole purpose of the preset method. It is not actually true that all the MSigDB sets will be co-regulated in any specific study. Gene sets that correspond to pathways that are either not expressed or not changing in your samples will not show strong correlation. Observing a high inter-gene correlation for a set is a sign that the pathway is specifically active in your experiment and is varying between samples, and hence that it is likely to be biologically relevant to your study. Hence it is specifically these sets that I want to avoid penalizing.

The original camera method specifically penalizes those sets that are most likely to be biologically relevant, because the genes are co-regulated between samples. This is unavoidable if strict error rate control is to be achieved but it is biologically unfortunate. Hence the preset method is designed to gain biological relevance at some manageable cost to error rate control.
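The size of that penalty can be illustrated with the variance inflation factor used by camera, which for a set of m genes with mean pairwise correlation rho is 1 + (m - 1) * rho. The sketch below only tabulates that formula; the set sizes and the labels are arbitrary choices for illustration, with 0.1 and 0.5 standing in for the range of estimated correlations mentioned earlier in the thread:

```r
# Variance inflation factor for a set of m genes
# with mean pairwise inter-gene correlation rho
vif <- function(m, rho) 1 + (m - 1) * rho

rho <- c(preset_0.01 = 0.01, preset_0.05 = 0.05,
         estimated_0.1 = 0.1, estimated_0.5 = 0.5)
m <- c(20, 50, 200)

# Rows are set sizes, columns are correlation values
tab <- outer(m, rho, vif)
dimnames(tab) <- list(paste0("m=", m), names(rho))
round(tab, 2)
```

For a set of 50 genes, a preset correlation of 0.01 inflates the variance by a factor of about 1.5, whereas an estimated correlation of 0.5 inflates it by a factor of 25.5, which is why strongly co-regulated sets lose so much significance when the correlation is estimated.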

ADD REPLY • link 10.1 years ago Gordon Smyth 53k

0

Thank you for the informative reply. I was confused by the inconsistency of baked-in multiple testing corrections across these methods and thought there might be a reason, but I'm grateful for the clarification.

ADD REPLY • link 10.4 years ago Jon Manning ▴ 40

score 1 · Answer 2 · 2015-08-25

1

Ryan C. Thompson ★ 7.9k

@ryan-c-thompson-5618

Icahn School of Medicine at Mount Sinai…

As far as I know, if you are testing multiple gene sets, you need to perform multiple testing correction as normal. I don't think the nature of the test (competitive or self-contained) has any bearing on this.