Limma camera function never gives any differentially expressed gene sets
chris86 (@chris86-8408), UCL, United Kingdom

Hi

Maybe I am missing something, but when I run camera with any annotation I can find, I never get any differentially expressed gene sets, whereas I get many more with mroast and romer. For example, this is from running it on my differentially expressed human microarray genes with GO gene sets.

results <- camera(rows.to.keep, c2.indices, design, contrast = 2)
results2 <- mroast(rows.to.keep, c2.indices, design, contrast = 2, nrot = 999)
results3 <- romer(rows.to.keep, c2.indices, design, contrast = 2, nrot = 999)

significant_up <- subset(results3, Up <= 0.05)
significant_down <- subset(results3, Down <= 0.05)
significant <- rbind(significant_up, significant_down)

significant2 <- subset(results2, FDR <= 0.05)
significant3 <- subset(results, FDR <= 0.05)

significant camera = 0

significant romer = 111

significant mroast = 502

So why is this? I guess camera's statistical method must be a lot more conservative, but it seems conservative to the point where it is useless for all practical purposes. This seems a bit of a shame to me. Am I just using it in the wrong situation or something? I have not read the papers.

Thanks,

Chris


You should definitely read the papers, because camera and roast are answering different questions, so you shouldn't expect to be able to compare their results.


Although I cannot answer your question, you mention at the end that you did not read the papers. Maybe you should. Regardless of whether there is a problem with the implementation, it is a good idea to be familiar with the methods used to analyze the data from which we plan to draw conclusions.

@gordon-smyth, WEHI, Melbourne, Australia

Just a few things to be aware of in addition to Steve's excellent answer.

1. romer is more conservative than you think, because you selected romer sets by one-sided p-value < 0.05 instead of FDR < 0.05. Had you used FDR, you would probably have detected no sets with romer either.

2. The reason camera() is conservative is that it has to estimate the inter-gene correlation and has to allow for the uncertainty in that estimate. This can make it very conservative when the number of replicate samples is small.

3. I have made a recent change to camera() in limma 3.24.14. You can now run it with a preset correlation by

results <- camera(rows.to.keep, c2.indices, design,contrast=2, inter.gene.cor=0.01, use.ranks=TRUE)

You will find that this is far more powerful, because the requirement to estimate the correlation is removed. Be sure to use a small positive value for inter.gene.cor; I suggest something like 0.01.

4. There is no need to require FDR < 0.05. Gene set methods such as camera, romer and GSEA are all reasonably conservative, so it is common to use larger FDR cutoffs.
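Points 1 and 4 can be made concrete with a minimal sketch in base R. It puts romer's one-sided p-values on an FDR scale with p.adjust(). The sketch assumes that romer's result is a matrix with NGenes, Up, Down and Mixed columns. The five sets and their p-values below are made up purely for illustration. In a real analysis you would use the romer result (results3 in the question) instead of the hypothetical romer_res.

```r
# Made-up romer-style result: one row per gene set, one-sided p-values in
# the Up and Down columns (the numbers are illustrative, not real output)
romer_res <- cbind(
  NGenes = c(25, 40, 120, 60, 15),
  Up     = c(0.001, 0.015, 0.040, 0.300, 0.800),
  Down   = c(0.700, 0.450, 0.900, 0.030, 0.004),
  Mixed  = c(0.010, 0.100, 0.200, 0.050, 0.020)
)
rownames(romer_res) <- paste0("SET", 1:5)

# Selecting on the raw one-sided p-value, as in the question
raw_up <- rownames(romer_res)[romer_res[, "Up"] <= 0.05]

# Benjamini-Hochberg adjustment across sets, separately for each direction
fdr_up   <- p.adjust(romer_res[, "Up"],   method = "BH")
fdr_down <- p.adjust(romer_res[, "Down"], method = "BH")

# Selecting on FDR instead; per point 4, a looser cutoff such as 0.1 is also common
cutoff <- 0.05
sig_up   <- rownames(romer_res)[fdr_up   <= cutoff]
sig_down <- rownames(romer_res)[fdr_down <= cutoff]

list(raw_up = raw_up, sig_up = sig_up, sig_down = sig_down)
```

Here the raw p-value rule keeps three "up" sets, while the BH-adjusted rule keeps two. Rerunning with cutoff <- 0.1 also picks up SET3 (up) and SET4 (down).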


Thanks for this


Thank you for adding the inter.gene.cor parameter. Would you consider allowing this value to also be a vector as long as the passed-in index list, so that we could, in theory, provide our own gene set inter-gene correlations? I'd be happy to send a patch that enables this.


Well, my first reaction is that this would likely conflict with the reasoning and motivation for introducing the preset correlation. My purpose in adding this feature would potentially (very likely) be defeated by estimating different correlations for different sets.

I haven't laid out the reasons yet for the preset correlation. It has to do with a compromise between biological and statistical significance. Gene sets with high correlations are more likely to be biologically meaningful, but are also more likely to give statistical false positives when they are not actually correlated with the treatment conditions. In a way, what we want is to prioritize sets that are both positively correlated internally and are correlated with the treatment. Hence I am trying to reduce the statistical penalty that we apply to positively correlated sets at the cost of some small increase in FDR. I don't want strongly positively correlated sets to pay the full penalty for the correlation, so I don't want users to input the full positive correlation, even if they can estimate it (which I feel in most cases they probably can't).

My feeling is that this will work best if all sets pay the same correlation penalty. The small positive correlation also has the advantage of eliminating the possibility of unusually large gene sets appearing significant with minuscule changes.
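The point about large gene sets can be seen with a little arithmetic. In camera, the inter-gene correlation enters through a variance inflation factor, 1 + (m - 1) * rho, for a set of m genes with average pairwise correlation rho. The sketch below is a simplification and not camera's actual test statistic. It assumes unit-variance, equicorrelated gene-wise statistics and computes a plain z-score for a hypothetical, tiny average shift (delta = 0.05) across the set.

```r
# Variance inflation factor for the mean of m unit-variance statistics with
# common pairwise correlation rho: Var(mean) = (1 + (m - 1) * rho) / m
vif <- function(m, rho) 1 + (m - 1) * rho

# Simple z-score for an average shift 'delta' across a set of m genes
z_for_shift <- function(delta, m, rho) delta / sqrt(vif(m, rho) / m)

set_sizes <- c(10, 100, 1000, 5000)  # hypothetical set sizes
delta <- 0.05                          # hypothetical, minuscule average change

z_indep  <- z_for_shift(delta, set_sizes, rho = 0)     # grows like sqrt(m)
z_preset <- z_for_shift(delta, set_sizes, rho = 0.01)  # levels off below 0.5

round(rbind(m = set_sizes, rho_0 = z_indep, rho_0.01 = z_preset), 2)
```

With rho = 0, the z-score keeps growing with set size, reaching about 3.5 for the 5000-gene set. With rho = 0.01, the same minuscule shift can never exceed delta / sqrt(rho) = 0.5, however large the set.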

@steve-lianoglou-2771, United States

"So why is this? I guess camera's statistical method must be a lot more conservative"

That gels with my experience using camera as well. Its p-values are often "much more conservative" than roast's, in the sense that camera usually returns far fewer significant results (if any) than roast does. However, these two methods are also testing very different things.

camera is a "competitive test" while roast is a "self-contained" test.
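To illustrate the difference between the two questions, here is a toy sketch in base R. A self-contained test asks whether the genes in a set are differentially expressed at all. A competitive test asks whether they are more differentially expressed than the genes outside the set. The sketch uses ordinary t.test() calls on a made-up vector of gene-wise t-statistics, not the camera or roast algorithms. roast uses rotations and camera adjusts for inter-gene correlation, so treat this only as an illustration of the two null hypotheses. Every gene gets the same upward shift, and the hypothetical 50-gene set is chosen to look exactly like the genes outside it.

```r
# Hypothetical moderated t-statistics for 1000 genes, all shifted up by 2
# (a global effect). qnorm(ppoints(n)) gives evenly spaced, symmetric normal
# quantiles, so the example is deterministic.
n <- 1000
t_stats <- 2 + qnorm(ppoints(n))

# Hypothetical 50-gene set taken symmetrically from the ranked genes,
# so its genes behave exactly like the genes outside it
idx <- seq(10, 490, by = 20)
in_set <- seq_len(n) %in% c(idx, n + 1 - idx)

# Self-contained question: are the set's genes differentially expressed at all?
self_p <- t.test(t_stats[in_set], mu = 0)$p.value
# Competitive question: are they more differentially expressed than the rest?
comp_p <- t.test(t_stats[in_set], t_stats[!in_set])$p.value

c(self_contained = self_p, competitive = comp_p)
```

The self-contained test is highly significant because the set's genes are clearly changed. The competitive test gives a p-value close to 1 because those genes are no more changed than everything else. When a treatment shifts many genes, this is one way a self-contained method like mroast can report many sets while camera reports few or none.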

romer is also a competitive test. I haven't used it much, however, since voom can't handle it. I'm making a note to compare it against analyses I've already done, using the analogous gene set testing methods from edgeR. I'm particularly interested in pitting romer against camera in edgeR, but also edgeR roast vs. limma/voom roast, etc.

"but it [camera] seems conservative to the extent where it is useless for all practical purposes"

That's a bit of a strong statement to make based on the results of one analysis/comparison, don't you think? :-) camera has certainly generated usable results for me in the past.

I admit there is some sense of "defeat" when camera returns 0 results and roast returns many, but you just have to understand the differences between the two. I tend to think the problem more often lies with the quality of the gene sets we use, specifically how well a gene set reflects the biology you think it should reflect in the experiment you are analyzing.

Given that last point, I tend to look at gene set enrichment exercises more for their ranking than strictly for their statistical significance measure.

"Am I just using it in the wrong situation or something? I have not read the papers."

I'm not sure what you mean when you ask about "being in a wrong situation", but as others have already pointed out, you should read the papers.

Update

For the curious, here's an initial comparison of the nominal p-values from analogous roast/camera analyses run with voom and with edgeR. In this n=1 example they look quite concordant, but edgeR gives slightly more optimistic p-values.

[Figure: voom vs edgeR GEA analysis]
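One rough way to put numbers on "concordant" and "more optimistic" for two sets of nominal p-values is sketched below. The two vectors are made-up placeholders for p-values of the same gene sets from two pipelines; they are not the values in the figure. Spearman's correlation measures whether the two methods rank the sets similarly. The median log10 ratio is negative when the second method tends to give smaller p-values.

```r
# Placeholder nominal p-values for the same six gene sets from two pipelines
p_voom  <- c(0.001, 0.010, 0.040, 0.200, 0.500, 0.900)
p_edger <- c(0.0005, 0.006, 0.030, 0.150, 0.450, 0.880)

# Rank concordance between the two methods (1 = identical ordering)
rank_cor <- cor(p_voom, p_edger, method = "spearman")

# Negative values mean edgeR tends to report smaller (more optimistic) p-values
median_log_ratio <- median(log10(p_edger / p_voom))

# A scatter plot on the -log10 scale with a y = x line shows the same thing:
# plot(-log10(p_voom), -log10(p_edger)); abline(0, 1)
c(rank_cor = rank_cor, median_log_ratio = median_log_ratio)
```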
