Question: cameraPR vs geneSetTest and ROAST/CAMERA in general
Asked 19 months ago by BharathAnanth:

  Hi!

  I learnt from this old post (limma roast syntax for overall anova) that ROAST cannot be performed on multiple contrasts (i.e., with the F-test). I suppose this is still true? In that post it was suggested to use geneSetTest with the F-statistic (from topTable).

First, my understanding was that roast and geneSetTest test different hypotheses, i.e., self-contained vs competitive. So this is not exactly an apples-to-apples comparison, is it?

Second, there is also cameraPR now available in limma. What is the recommendation for using geneSetTest vs cameraPR? In my analysis, I get very good significance (p<0.001) with geneSetTest and no significance (p>0.5) with cameraPR (once the inter-gene correlation is included). I would like to know which result to believe and interpret.

Third, more generally, I find situations in my analyses where applying roast and camera to the same gene set gives discordant conclusions. I understand they test different hypotheses. I do not want to indulge in p-value hacking and pick the test that fits my story. So do you have suggestions on how to go about this in a consistent manner? (As an aside, I can suggest an explanation for no significance in ROAST but significance in CAMERA: the effects are so small and limited that they are insignificant overall after multiple testing correction, but the gene set of interest contains all the genes on which there is some effect.)

Thanks

Answer: cameraPR vs geneSetTest and ROAST/CAMERA in general
Answered 19 months ago by Aaron Lun (Cambridge, United Kingdom):

For your first question: that's right, they are not testing the same null hypothesis. But roast() does not support F-statistics yet, so if you want to do a gene set test with multiple contrasts, you'll have to use geneSetTest(). This may be sufficient to answer your broader scientific question regarding the function of DE genes.
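As a rough illustration of the geneSetTest() route with F-statistics, here is a minimal sketch on simulated data. The data, the three-group design and the gene set (rows 1 to 50) are assumptions made up for the example; only the limma calls themselves are real.

```r
library(limma)

set.seed(1)
# Simulated data (assumption): 1000 genes, 3 groups of 3 samples each.
y <- matrix(rnorm(1000 * 9), nrow = 1000)
group <- factor(rep(c("A", "B", "C"), each = 3))
design <- model.matrix(~ group)

fit <- eBayes(lmFit(y, design))

# Requesting two coefficients at once makes topTable() report a moderated
# F-statistic per gene; sort.by = "none" keeps the original row order so
# that row indices still identify the genes.
tab <- topTable(fit, coef = 2:3, number = Inf, sort.by = "none")

# Hypothetical gene set: rows 1 to 50.
set.idx <- 1:50

# F-statistics are non-negative, so type = "f" is stated explicitly.
p <- geneSetTest(index = set.idx, statistics = tab$F, type = "f")
p
```

Keep in mind that this is a competitive test that treats genes as independent, so it does not answer the same question as roast() would.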

For your second question: reading ?geneSetTest should pretty much tell you what you need to know. To paraphrase the documentation, geneSetTest() assumes that the genes are independent, which is generally inappropriate in the presence of co-regulated genes. If geneSetTest() disagrees with cameraPR(), I would be inclined to believe the latter as it is more robust to correlations between genes.
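To see the two functions side by side, the following sketch runs both on the same moderated t-statistics. Everything about the data is an assumption for illustration: there is no true DE, and the hypothetical set (rows 1 to 40) is made correlated by adding a shared per-sample factor.

```r
library(limma)

set.seed(2)
# Simulated data (assumption): 1000 genes, 2 groups of 4 samples, no true DE.
ngenes <- 1000
y <- matrix(rnorm(ngenes * 8), nrow = ngenes)

# Hypothetical co-regulated set: rows 1 to 40 share a per-sample factor,
# which makes them correlated without making them DE.
set.idx <- 1:40
shared <- rnorm(8)
y[set.idx, ] <- y[set.idx, ] + matrix(shared, nrow = 40, ncol = 8, byrow = TRUE)

group <- factor(rep(c("ctrl", "trt"), each = 4))
design <- model.matrix(~ group)
fit <- eBayes(lmFit(y, design))
tstat <- fit$t[, 2]

# Rank-based competitive test that assumes independent genes.
p.gst <- geneSetTest(set.idx, tstat, alternative = "either")

# Rank-based competitive test that allows for an inter-gene correlation
# (here the preset default of 0.01).
res.cam <- cameraPR(tstat, list(coreg = set.idx),
                    use.ranks = TRUE, inter.gene.cor = 0.01)

p.gst
res.cam
```

Over repeated simulations of this kind, one would expect geneSetTest() to reject more often than its nominal rate for the correlated set, which is the behaviour the documentation warns about.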

For your third question: if you understand that they test different hypotheses, you should be able to pick the test that addresses your scientific question. Perhaps an example might be illustrative: imagine a situation where 50% of genes are DE between your conditions of interest. These 50% of genes are spread evenly throughout all gene sets, meaning that a competitive gene set test would not give any significant hits. However, a self-contained test would return significant hits for all gene sets. The outcomes for both tests are correct in their own way, but the scientific conclusions obtained are quite different.

For example, say I was interested in the immune response gene set. The self-contained test would reject the null, which tells me that the immune response is affected by the differences between conditions. Fair enough, as 50% of genes are DE in this set; I might then start to think about the biological consequences of altered immune activity between conditions. However, the competitive test would fail to reject the null, which tells me that there is no evidence that the immune response is more affected than other gene sets by the differences between conditions. This is a different but also useful result, as it suggests that the immune response is not the primary distinguishing feature between conditions. Thus, I might be inclined to prioritize other pathways for follow-up work to characterize the differences between conditions.
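The 50% scenario can be mimicked with a small simulation. This is only a sketch under assumed settings (1000 genes, 5 samples per group, every second gene shifted by 2 in the treated group, and a hypothetical set made of rows 1 to 100), not a general benchmark.

```r
library(limma)

set.seed(3)
ngenes <- 1000
y <- matrix(rnorm(ngenes * 10), nrow = ngenes)
group <- factor(rep(c("ctrl", "trt"), each = 5))
design <- model.matrix(~ group)

# Every second gene is DE (assumption: a shift of 2 in the treated group),
# so any block of consecutive rows contains about 50% DE genes.
de <- seq(1, ngenes, by = 2)
y[de, 6:10] <- y[de, 6:10] + 2

# Hypothetical gene set: rows 1 to 100, half DE, just like the background.
set.idx <- 1:100

# Self-contained test: is anything in the set DE?
res.roast <- roast(y, index = set.idx, design = design, contrast = 2)

# Competitive test: is the set more DE than the remaining genes?
res.camera <- camera(y, index = set.idx, design = design, contrast = 2)

res.roast$p.value
res.camera
```

With these settings roast() should reject comfortably, whereas camera() has no systematic reason to, because the set is no more DE than the genes outside it.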


Hi Aaron

Thank you for your reply. If I may ask a couple of follow up questions.

1. Suppose I have identified a set of genes from analysis of one data set and want to test if these same genes are interestingly regulated in another dataset. Could I still use camera()/geneSetTest() (to compare against background) or is there a more sound approach to deal with this? In other words, my gene set is not an annotated gene set associated with a pathway but a set of genes identified in another study.

2. Suppose these genes are of interest precisely because they are highly co-regulated, in the sense that they have similar time courses. Is this still accounted for by inter.gene.cor = 0.01 in cameraPR()? Or should I estimate the correlation using interGeneCorrelation()? I would expect these genes to be very correlated. Would one ever get significance with camera in this scenario, given the variance inflation?

Thanks in advance.

Reply written 19 months ago by BharathAnanth

For your first question: yes, that is fine, and is the basis of at least a few of the MSigDB collections. Just think about it; the annotated gene sets had to come from somewhere (i.e., previous data), they weren't given to us like manna from heaven. The key thing is that your two datasets should be generated independently. Obviously it would not make sense to define a gene set and then test it on the same dataset (or a dataset with some other dependencies, e.g., same patients).

For your second question: read the ?camera documentation. The default inter.gene.cor=0.01 gives a better ranking but does not strictly control the error rate. If you don't care about the rankings across gene sets, and you only want to know whether your particular gene set of interest contains high-ranking DE genes, then estimating the correlation with interGeneCorrelation() seems to be the better approach as it provides correct type I error control.
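A minimal sketch of the estimated-correlation route follows. The simulated data and the hypothetical co-regulated set (rows 1 to 40, correlated through a shared per-sample factor, no true DE) are assumptions; truncating the estimate at zero is also my own precaution rather than something the documentation prescribes.

```r
library(limma)

set.seed(4)
ngenes <- 1000
y <- matrix(rnorm(ngenes * 8), nrow = ngenes)

# Hypothetical co-regulated set: rows 1 to 40 share a per-sample factor.
set.idx <- 1:40
shared <- rnorm(8)
y[set.idx, ] <- y[set.idx, ] + matrix(shared, nrow = 40, ncol = 8, byrow = TRUE)

group <- factor(rep(c("ctrl", "trt"), each = 4))
design <- model.matrix(~ group)
fit <- eBayes(lmFit(y, design))
tstat <- fit$t[, 2]

# Estimate the average inter-gene correlation (and the implied variance
# inflation factor) from the residuals of the genes in the set.
igc <- interGeneCorrelation(y[set.idx, ], design)
rho <- max(igc$correlation, 0)  # precaution: avoid passing a negative value

# Preset default correlation versus the estimated one.
res.default <- cameraPR(tstat, list(coreg = set.idx), inter.gene.cor = 0.01)
res.est     <- cameraPR(tstat, list(coreg = set.idx), inter.gene.cor = rho)

igc
res.default
res.est
```

A larger correlation inflates the variance of the set-level statistic, so the p-value from the estimated correlation will not be smaller than the one from the 0.01 default whenever the estimate exceeds 0.01.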

Remember, what you really want to know is whether or not the genes in the set are (more) DE. The correlation between genes is just a nuisance parameter that needs to be overcome to answer your real question. However, if you do not model the correlation correctly, a low p-value could be purely driven by correlations in the absence of any DE. In most cases, detecting a gene set as containing correlated genes is not particularly interesting; this would be expected by definition.
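To put a number on how much correlation matters, the variance inflation factor used by camera for a set of $m$ genes with average pairwise correlation $\bar{\rho}$ can be written as below. The set size of 40 and the two correlation values are purely illustrative assumptions.

$$
\mathrm{VIF} = 1 + (m - 1)\,\bar{\rho}
$$

$$
m = 40,\ \bar{\rho} = 0.01:\quad \mathrm{VIF} = 1 + 39 \times 0.01 = 1.39
$$

$$
m = 40,\ \bar{\rho} = 0.5:\quad \mathrm{VIF} = 1 + 39 \times 0.5 = 20.5
$$

So for a strongly co-regulated set, the variance of the mean statistic in the set can be an order of magnitude larger than under independence, and a test that ignores this will be badly anti-conservative.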

Reply written 19 months ago by Aaron Lun