Question: Do the outputs of Limma's competitive gene set methods (camera, romer) require a multiple testing correction?
Asked 3.2 years ago by Jon Manning (United Kingdom):

I've been exploring Limma's excellent gene set methods (many thanks to the developers), and I think that romer() is probably what I want to use.

I'm only just getting to grips with all the methods and their differences. I suspect that romer() doesn't provide multiple-testing adjustment parameters because it doesn't make sense in a competitive context. Am I right, or should I be making the adjustment myself?

Many thanks for any pointers,

Jon

Answer, 3.2 years ago, by Gordon Smyth (Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia):

We recently did a simulation study with romer() and, unfortunately, both romer() and camera() do need multiple testing adjustments. camera() does this automatically but romer() doesn't. You can easily add an FDR column for romer() yourself.
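A minimal sketch of adding FDR columns to a romer()-style result, using only base R. The matrix below is simulated and merely mimics the layout romer() is documented to return (an NGenes column plus Up, Down and Mixed p-value columns); the `FDR.` column names are an illustrative choice, not part of limma.

```r
# Simulated stand-in for the matrix returned by romer():
# one row per gene set, columns NGenes, Up, Down, Mixed (p-values).
set.seed(1)
n.sets <- 50
romer.out <- cbind(
  NGenes = sample(10:200, n.sets, replace = TRUE),
  Up     = runif(n.sets),
  Down   = runif(n.sets),
  Mixed  = runif(n.sets)
)
rownames(romer.out) <- paste0("set", seq_len(n.sets))

# Benjamini-Hochberg adjustment, applied separately to each p-value column.
pcols <- c("Up", "Down", "Mixed")
fdr <- apply(romer.out[, pcols], 2, p.adjust, method = "BH")
colnames(fdr) <- paste0("FDR.", pcols)  # hypothetical column names

# Append the adjusted values and list the sets with the smallest Up p-values.
romer.fdr <- cbind(romer.out, fdr)
head(romer.fdr[order(romer.fdr[, "Up"]), ])
```

With a real analysis, `romer.out` would simply be replaced by the object returned from romer().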

romer() is designed to be an improved parametric analog of the Broad Institute's GSEA approach, but the approach is neither purely competitive nor self-contained, and personally I prefer the clarity of mroast() or camera().

My favourites are either mroast() with as many rotations as you have time for or camera() with a preset intergene correlation of 0.05 as in:

camera(y, index, design, contrast="whatever", inter.gene.cor=0.05)

Both of these will be more powerful than romer().

If your data has a large value for df.prior, then you can use fry() to approximate the mroast() results without having to run the rotations. That is a recent addition to the package.
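For readers who want to try these calls, here is a small sketch on simulated data. Everything in it (the expression matrix, the two-group design, the two gene sets and the added shift) is made up for illustration. It assumes a recent limma version in which setting inter.gene.cor=NA asks camera() to estimate the correlation for each set; the contrast is given as a column number of the design matrix.

```r
library(limma)

# Simulated log-expression matrix: 1000 genes, 3 control and 3 treated samples.
set.seed(2024)
y <- matrix(rnorm(1000 * 6), nrow = 1000, ncol = 6)
rownames(y) <- paste0("gene", 1:1000)
design <- cbind(Intercept = 1, Group = c(0, 0, 0, 1, 1, 1))

# Two hypothetical gene sets, given as row indices into y.
index <- list(set1 = 1:20, set2 = 21:60)

# Make set1 genuinely up-regulated in the treated samples.
y[index$set1, 4:6] <- y[index$set1, 4:6] + 1

# camera() with a preset inter-gene correlation.
cam.preset <- camera(y, index, design, contrast = 2, inter.gene.cor = 0.01)

# camera() with the correlation estimated separately for each set.
cam.est <- camera(y, index, design, contrast = 2, inter.gene.cor = NA)

# fry() as a fast approximation to mroast(), with no rotations to run.
fry.out <- fry(y, index, design, contrast = 2)

cam.preset
cam.est
fry.out
```

Each result is a data frame with one row per gene set and includes an FDR column when more than one set is tested; the estimated-correlation run also reports the correlation it used for each set.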

I'm curious, is there a reason you set the inter-gene correlation manually rather than letting camera determine it from the data?

This is a new idea. We have long observed that gene sets that correspond closely to an important biological pathway tend to have high inter-gene correlations. We have also observed that geneSetTest() tends to rank gene sets well, although it doesn't control the error rate correctly. camera() controls the error rate but is conservative for small sample sizes. The idea is to avoid penalizing the tightly co-regulated sets as much as camera would normally, to avoid rewarding a set for having discordant gene behaviour (negative inter-gene correlation), and to gain power. The preset correlation gives a compromise between geneSetTest() and camera() with estimated correlations. Not estimating the correlation results in a great increase in power.
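To make the size of the penalty concrete, here is a base R sketch of the variance inflation factor that, as I understand the camera method, scales the variance of a set's mean statistic: 1 + (m - 1) * rho for a set of m genes with average inter-gene correlation rho. The function name and the set size of 100 are illustrative choices, and this is not limma's internal code.

```r
# Variance inflation factor for the mean of m genewise statistics
# whose average pairwise correlation is rho.
vif <- function(m, rho) 1 + (m - 1) * rho

m   <- 100                          # hypothetical set size
rho <- c(0, 0.01, 0.05, 0.1, 0.5)   # preset values and typical estimates

# VIF scales the variance; its square root scales the standard error.
data.frame(rho = rho,
           VIF = vif(m, rho),
           SE.inflation = sqrt(vif(m, rho)))
```

For a 100-gene set, a preset correlation of 0.01 inflates the variance by a factor of 1.99, whereas an estimated correlation of 0.5 inflates it by a factor of 50.5, which is why strongly co-regulated sets lose so much significance when the correlation is estimated.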

Although this thread is a few months old, I was hoping to get some more information about this approach (using camera with a preset intergene correlation value). Gordon, can you clarify how one arrives at the 0.05 value for the inter.gene.cor setting? Is this value something that should be set based on a particular experiment or gene set collection? Or does 0.05 generally perform appropriately?

More specifically, would this approach be appropriate for a collection of gene sets that are co-regulated by definition (the sets were generated as lists of co-regulated genes determined from prior experimental data)? Not surprisingly, camera tends to strongly penalize these sets (which demonstrate inter-gene correlation values typically ranging from 0.1 to 0.5 in our data), reporting fairly high p values even for sets that appear to be differentially expressed in our experiments.

Thank you very much for any information you can offer.

I simply chose 0.05 by experimentation and intuition, and note that I am now suggesting 0.01 rather than 0.05. The resulting camera test will not control the type I error rate in the strict sense that camera() does, but the ranking of the sets should be good and the level of liberalness is perhaps acceptable.

My intention is to use the same 0.01 value for all the sensible sets regardless of the level of co-regulation. Choosing genes because they are correlated, rather than just co-differentially-expressed in a prior experiment, would be pushing it, however.

Thanks for the prompt and helpful response.

To clarify what I meant by "pre-determined" co-regulated genes: I was referring to sets constructed similarly to those in MSigDB C4-CM: Cancer Modules. As I understand it, these sets are composed of genes (from existing sets) that were identified based on similar expression patterns in an integrated analysis of many publicly available datasets. As such, I would expect these sets to have high inter-gene correlation values and therefore be strongly penalized by camera. However, knowing that these sets are composed of co-regulated genes, using a fixed inter.gene.cor might be problematic, as you mentioned. Am I thinking about this correctly?

Thank you again for your help.

The camera() method was designed, with or without preset correlation, with the MSigDB curated sets specifically in mind.

Keeping the correlation preset may seem counterintuitive, but not penalizing sets when they really are co-regulated is the whole purpose of the preset method. It is not actually true that all the MSigDB sets will be co-regulated in any specific study. Gene sets that correspond to pathways that are either not expressed or not changing in your samples will not show strong correlation. Observing a high inter-gene correlation for a set is a sign that the pathway is specifically active in your experiment and is varying between samples, and hence that it is likely to be biologically relevant to your study. Hence it is specifically these sets that I want to avoid penalizing.

The original camera method specifically penalizes those sets that are most likely to be biologically relevant, because the genes are co-regulated between samples. This is unavoidable if strict error rate control is to be achieved but it is biologically unfortunate. Hence the preset method is designed to gain biological relevance at some manageable cost to error rate control.

Thank you for the informative reply. I was confused by the inconsistency of baked-in multiple testing corrections across these methods and thought there might be a reason, but I'm grateful for the clarification.

Answer, 3.2 years ago, by Ryan C. Thompson (The Scripps Research Institute, La Jolla, CA):

As far as I know, if you are testing multiple gene sets, you need to perform multiple testing correction as normal. I don't think the nature of the test (competitive or self-contained) has any bearing on this.