I learnt from the this old post (limma roast syntax for overall anova) that ROAST cannot be performed on multiple contrasts (i.e., with the F-test). This is still true, I suppose? In that post it was suggested to use
geneSetTest with Fstatistic (from
First, my understanding was that
geneSetTest test different hypotheses, i.e., self-contained vs competitive. So this is not exactly apples and apples, is it?
Second, there is also
cameraPR now available in limma. What is the recommendation for using
cameraPR? In my analysis, I get very good significance (p<0.001) with
geneSetTest and no significance (p>0.5) with
cameraPR (once the inter gene correlation is included). I would like to know which to believe/interpret.
Third, more generally, in my analyses, I find situations where I apply either
camera on a single gene set, I get discordant conclusions. I understand they test different hypotheses. I do not want to indulge in p-value hacking and pick the test that fits my story. So do you have suggestions as to how to go about this in a consistent manner. (as an aside, I can suggest an explanation for no significance in
ROAST but significance in
CAMERA, when the effects are so small and limited to be insignificant overall due to multiple testing, but gene set of interest has all the genes on which there is some effect).
Thank you for your reply. If I may ask a couple of follow up questions.
1. Suppose I have identified a set of genes from analysis of one data set and want to test if these same genes are interestingly regulated in another dataset. Could I still use
camera()/geneSetTest()(to compare against background) or is there a more sound approach to deal with this? In other words, my gene set is not annotated gene set associated with a pathway but genes identified in another study.
2. Suppose these genes are precisely of interest because they are highly co-regulated in the the sense that they have similar time courses. Is this then still accounted for the
inter.gene.corr = 0.01in
cameraPR()? Should I estimate the correlation using
interGeneCorrelation()?I would expect these genes to be very correlated and would one ever get significance in this scenario using camera due to the variance inflation?
Thanks in advance.
For your first question: yes, that is fine, and is the basis of at least a few of the MSigDB collections. Just think about it; the annotated gene sets had to come from somewhere (i.e., previous data), they weren't given to us like manna from heaven. The key thing is that your two datasets should be generated independently. Obviously it would not make sense to define a gene set and then test it on the same dataset (or a dataset with some other dependencies, e.g., same patients).
For your second question; read the
?cameradocumentation. The default
inter.gene.corr=0.01gives a better ranking but does not strictly control the error rate. If you don't care about the rankings across gene sets, and you only want to know whether your particular gene set of interest contains high-ranking DE genes, then estimating the correlation with
interGeneCorrelation()seems to be the better approach as it provides correct type I error control.
Remember, what you really want to know is whether or not the genes in the set are (more) DE. The correlation between genes is just a nuisance parameter that needs to be overcome to answer your real question. However, if you do not model the correlation correctly, a low p-value could be purely driven by correlations in the absence of any DE. In most cases, detecting a gene set as containing correlated genes is not particularly interesting; this would be expected by definition.