Camera and ROAST
2
3
Entering edit mode
Nuno Pires ▴ 30
@nuno-pires-14492
Last seen 3.4 years ago
ETH Zurich, Switzerland

Hi

I've been using ROAST and Camera, but I am bit unsure of how much to trust each. Let's say that I want to test if a just a a couple of gene sets are enriched in an RNAseq dataset. roast/mroast should do the trick, right? Well, I am under the impression that it's very easy for roast to produce interesting p-values, even if it's just random data.

As an example, let's randomly generate gene expression for 10000 genes, and make 1000 of these DE expressed between 2 groups. Then generate two lists with 100 genes: one where 50% of the genes are DE, and the other where genes are picked entirely at random.

set.seed(0)

# generate random expression 10000 genes
y <- matrix(rnorm(10000 * 4), 10000,4)
design <- cbind(Intercept = 1, Group = c(0, 0, 1, 1))

# make 1000 genes differentially expressed
y[1:1000, 3:4] <- y[1:1000, 3:4] + 3

# index 1 - 50% genes are DE
DE_50 <- 951:1050

# index 2 - random genes
DE_random <- sample(1:10000, 100)

mroast(y, list(DE_50 = DE_50, DE_random = DE_random),
design = design, contrast = 2, set.statistic = 'mean50')


This gives:

          NGenes PropDown PropUp Direction PValue   FDR PValue.Mixed FDR.Mixed
DE_50        100     0.02   0.55        Up  0.002 0.003        0.002     0.003
DE_random    100     0.11   0.17        Up  0.030 0.030        0.009     0.009


Quite interesting p-values for a random list! ROAST appears to have a huge false positive rate!

In contrast, Camera behaves as expected:

> camera(y, DE_50, design = design, contrast = 2)
NGenes Direction       PValue
set1    100        Up 5.534736e-14
> camera(y, DE_random, design = design, contrast = 2)
NGenes Direction    PValue
set1    100      Down 0.3403619


This makes me quite unsure about trusting ROAST. Camera is/was said to be quite conservative, but partly this could be just because ROAST gives many false positives. Are there any real advantages of using ROAST over Camera?

Thanks, Nuno

limma • 1.6k views
0
Entering edit mode

Why did you choose set.statistic="mean50"?

roast and camera would give the same p-value for the random set if you stuck to the default settings, so the whole phenomenon that's worrying you wouldn't exist.

0
Entering edit mode

I am often interested in testing for significance in both directions (a subset of genes is up- and another is down-regulated). The default setting does not allow that, but floormean, mean50 and msq do. And in the original ROAST paper you suggested that mean50 is a good compromise in many biological situations.

Partially related to this: what exactly is the meaning of PValue.Mixed? According to limma's help, it's a 'non-directional p-value'. So how does this work in the context of using set.statistic="mean"? Note that even in your counter-example below (using the default options), PValue.Mixed is very low for the random set.

0
Entering edit mode

That is what the mixed p-value is for: it tests whether the set contains DE genes in any direction. If the mixed p-value is low but the directional p-value is not, then that shows that the set contains DE genes in both directions.

I tried hard to describe PValue.Mixed clearly, both in the roast paper and in the roast documentation. People do find the whole concept to be difficult however. You didn't find the documentation to be clear enough?

I did recommend "mean50" back in 2010, but since then I found it was very confusing to (sometimes) get both up and down p-values significant for the same set, so I changed "mean" to be the default instead.

camera obviously cannot detect subsets of genes changing in different directions so, if you want that, you have to use roast.

It seems to me that you may be wanting the gene set test to have a combination of properties that are impossible for a single test to achieve. A test that is sensitive to subsets of genes changing in different directions has to be sensitive to relatively small DE proportions, and that will then lead to the behaviour that you complained about in your post.

7
Entering edit mode
Aaron Lun ★ 26k
@alun
Last seen 20 hours ago
The city by the bay

ROAST is doing its job here. In ROAST, the null hypothesis is that none of the genes in the set are DE. You will find that the intersection of DE_random and the 1:1000 true DE genes is non-empty, which means that the null hypothesis is not true. So ROAST is correct in returning a low p-value. You can argue about the sensitivies of the different set statistics, given that the documentation suggests that using "mean50" would only detect DE proportions above 25%, but there is nothing inherently wrong with ROAST itself.

If you read the documentation in ?roast, you will find that ROAST and CAMERA test different hypotheses. ROAST performs a self-contained gene set test, where it looks for any DE within the set of genes. CAMERA performs a competitive gene set test, where it compares the DE within the gene set to the DE outside of the gene set. Thus, it is not surprising that the results of these two procedures will differ.

Personally, I find the results of ROAST to be easier to interpret than the results of CAMERA. In many cases, I just want to know whether the genes in my gene set or pathway are systematically DE. For this, a self-contained gene set test is more suitable. In contrast, the output of a competitive gene set test depends not only on the DE of genes within the set, but also on DE in genes outside the set. This can lead to counterintuitive behaviour where the significance of one gene set depends on the significance (or lack thereof) of other sets.

Do you care about whether a gene set is "more DE" than other sets? If so, then you should use a competitive gene set test like CAMERA. If you just want to know whether the gene set is DE in one (or any) direction, use a self-contained gene set test like ROAST.

0
Entering edit mode

Thank you so much Aaron for yet again clearing my (and I am sure of many others) doubts. You explanation (in general) is beyond what I can describe in words. I have learnt so much from your posts in my last only a year in the field of Bioinformatics/Genomics. Thanks to Gordon as well who is always prompt and willing to help. His answers are very well explanatory too. Keep up the GREAT work guys!

1
Entering edit mode
@gordon-smyth
Last seen 5 hours ago
WEHI, Melbourne, Australia

Hi Nuno,

Aaron has explained why your simulation is not showing any false discoveries. But I do wonder what documentation you are reading. The help pages for roast and camera are only a few paragraphs long and they explain how the two functions test different hypotheses. Do you understand the difference between a self-contained gene set test and a competitive test? The help pages try to explain this in simple terms. If you understand this, then the results from your simulation shouldn't be at all surprising. The version of camera() that you are using is BTW not conservative in practical situations. Again, the documentation would tell you that.

Your simulation is actually a good illustration of the (intended!) differences between two functions. Your "random" set is DE in absolute terms, with 11 out of 100 genes being strongly DE all in the same direction. In many circumstances we would want roast() to detect this set as DE.

Your question fails to mention that, by default, roast does not detect the random set as DE. Instead it gives P=0.33 for the random set:

> mroast(y, GeneList, design)
NGenes PropDown PropUp Direction PValue   FDR PValue.Mixed FDR.Mixed
DE_50        100     0.05   0.53        Up  0.003 0.005        0.007     0.013
DE_random    100     0.09   0.13        Up  0.330 0.330        0.022     0.022


This is because we have tuned roast() by default so that it does not give small p-values when the proportion of DE genes in a set is small. So, rather than illustrating a "problem", your simulated example actually shows that roast() does exactly what you seem to want it to do. Indeed roast gives almost the same p-value for the random set that camera does.

But, rather than relying on default settings, you have specially asked roast to try to detect a smaller proportion of DE genes, by setting set.statistic="mean50". So it is difficult to understand why you should be at all surprised that roast then does exactly what you asked it to do, and does then succeed in detecting the 11% DE genes in the set.

I think the real message here is: don't ask roast() to be sensitive to very small proportions of DE genes unless that is what you actually want.

Personally, I like both roast() and camera() and use them both regularly. But to see a situation when camera() would give you unintuitive results, let's make a small change to your simulation. In this simulation, all the genes are DE:

> set.seed(0)
> y <- matrix(rnorm(10000 * 4), 10000,4)
> design <- cbind(Intercept = 1, Group = c(0,0,1,1))
> y[, 3:4] <- y[, 3:4] + 3
> GeneList <- list(DE_50 = 951:1050, DE_random = sample(1:10000, 100))
> mroast(y, GeneList, design)
NGenes PropDown PropUp Direction PValue   FDR PValue.Mixed FDR.Mixed
DE_random    100        0   0.91        Up  0.001 0.001        0.001     0.001
DE_50        100        0   0.96        Up  0.002 0.002        0.002     0.002
> camera(y, GeneList, design)
NGenes Direction PValue   FDR
DE_50        100        Up  0.116 0.132
DE_random    100      Down  0.132 0.132


Now, roast finds both sets to be significant but camera finds neither. Now tell me, how you are going to explain to your lab head or biological collaborator why camera finds neither pathway to be DE, even though it is obvious that every single gene in both pathways is strongly DE, with every gene DE in the same direction by the same amount? Note that camera's failure to detect DE here has nothing to do with conservativeness - camera is not conservative. It has everything to do with the meaning of a self-contained test vs a competitive gene set test.

Best

Gordon

0
Entering edit mode

Thanks for the prompt replies Aaron and Gordon. I've read the documentation and the papers on roast and camera, and I am aware that they use different approaches to test different hypotheses.

I was trying to illustrate a common practical scenario: testing if a given pathway (which includes dozens of genes) responds to a given treatment (which may affect the expression of hundreds genes). Both ROAST and Camera can in principle be used to address this: ROAST "methodology is for self-contained tests with specially targeted gene sets" (Wu et al 2010), and Camera has "good performance for both focused testing of individual gene sets of special interest, and for gene set enrichment analysis using databases of gene annotation categories or transcriptional signatures" (Wu Smyth 2012)

The simulated example I gave suggest that, under the above scenario, camera is a safer approach than roast: for a random gene set (consisting of dozens of genes), there is a high probability that some elements of the set overlap with differentially expressed genes - which will immediately falsify the null hypothesis of ROAST.

Or from a different angle: the p-values derived from ROAST are not independent of the gene set size: larger gene sets will lead to higher false positive rates. Please correct me if I am wrong.

Best, Nuno

1
Entering edit mode

What is your evidence that ROAST is producing an excess of false positives? Your example code only runs ROAST on two gene sets that both contain DE genes, so they are both true positives, and ROAST correctly identifies this. Where is the problem?

1
Entering edit mode

Firstly, if you are to discuss false positive rates, do so in the context of the null hypothesis being true for each test. As you say, the null hypothesis is false for ROAST in your simulation, so there cannot be any false positives.

Secondly, it seems that your real concern is that the null hypothesis for ROAST is not worth testing. I beg to differ. Forget about having a "random gene set" for a moment; in practice, any gene sets that you care to use are defined a priori based on biological knowledge, so they will never be random. (Assuming a random set of DE genes is even less reasonable, given strong correlations between co-regulated genes in transcriptional networks.)

Instead, we have non-random gene sets for metabolic pathways, biological processes, transcriptomic signatures, etc. It is entirely sensible to ask, "are the genes in my set of interest differentially expressed between conditions?". The answer to this question can be provided by self-contained tests such as those implemented in ROAST. If I'm interested in a particular pathway or process, I simply don't care about what's happening in the other gene sets.

As the number of DE genes increases, you will generally get lower p-values from ROAST. This is entirely expected and reasonable, because more genes in the set are DE. It is irrelevant that the same decrease in p-values might be happening for other gene sets. ROAST doesn't care about that, neither do I, if I'm only interested in the DE in a particular gene set.

1
Entering edit mode

I wish you would stop claiming that roast gives inflated false discovery rates. It just isn't true.

For larger gene sets, roast has greater statistical power but there is no increase in the FDR.

0
Entering edit mode

My whole point with this thread was to discuss a simple scenario: my model M suggests that gene set G responds to treatment T; I sequenced some samples with and without treatment T, now I want to test if gene set G is indeed differentially expressed between T+ and T-. That's it. So what I was calling a false positive is when a test tells me that G is significantly different upon T, when my model is actually wrong (G has nothing to do with T). I did not mean to imply that roast itself has a high false positive rate (relative to it's own null hypothesis), I'm sorry if that was not clear.

Now if I use ROAST and if my gene set G is relatively large, well, I will likely get a nice p-value supporting my model. Maybe this is obvious and self-evident considering how roast works, but it wasn't immediately apparent to me. And I can imagine this becoming a larger problem if one uses ROAST to compare several gene sets of unequal sizes (which I guess it's a common practice - mroast anyone?).

Maybe my discussion really means to compare self-contained vs competitive tests. I don't really care, I am not an expert in this field, and I don't know how other self-contained tests differ from ROAST. But I am a limma fan, and would like to use it's tools: and for the scenario that I wrote above, Camera will be my clear choice over ROAST. That is not to say that ROAST will not be useful in other scenarios.

Gordon, I don't understand your simulation: how can ALL the genes in a sample being differentially expressed? The library sizes of samples 3:4 is orders of magnitude higher than 1:2 - it's a normalisation issue.

1
Entering edit mode

If 10% of the genes in G are differentially expressed with treatment T, why would you say that G and T are unrelated? If that's truly your position, then ROAST is not answering the question that you are asking. If you ask why the sky is blue, don't expect an explanation for why the grass is green.

1
Entering edit mode

Nuno, I'm happy of course for you to use camera() if you like it. No problem. But I'd like you to make that decision based on correct information. You assume that you if you choose a large enough random gene set, then ROAST will always give you a significant result. That is just not correct. Your own simulation shows that it is not so (see my answer above) and it will be much less so for a real dataset with real normalization applied to it. I showed (my answer above) that roast and camera actually give identical results in your simulated scenario.

With set.statistic="mean" (which is the default), roast will only give a significant result when a substantial proportion (50% say, but certainly more than 20% or so) of the genes in the set are DE. You are right to say that choosing a very large gene will indeed make it more likely to pick up a few DE genes, but it will not make it likely to pick up a high proportion of DE genes, and it is even less likely to pick up a high proportion of DE genes all changing in the same direction just by chance. So you will not get a small directional p-value from roast for a random set. That is precisely why I made set.statistic="mean" the default instead of set.statistic="msq" or "mean50". This means that large gene sets are not a problem with roast -- roast will not give you significant results for biologically trivial sets.

My advice is the same as it has always been. I recommend roast if you are testing a small number of sets of prior interest, and I would use camera to test a battery of gene sets.

There's nothing unusual about the scenario you are considering. It's the same scenario we always have in mind for roast tests.

Regarding the simulation, you are right that my simulation is unrealistic, essentially impossible in practice. The same competitive phenomenon can however occur in practice with real datasets when the number of DE genes is very large. Your original simulation is also rather unlikely in that it would be unusual to have all 100 DE genes changing in the same direction for a real dataset.