I tried to post a similar question before, but I think I didn't explain it well enough.
The design is as follows: two drugs (= treatments) with a similar phenotypic effect, plus a control group. The purpose is to compare the drugs: to obtain a list of genes responding to both drugs and lists of genes responding to each drug individually.
I'm running Wald tests for the contrasts `treatment1_vs_control` and `treatment2_vs_control`. Intuitively, two genes with the same p-value in one treatment but different p-values in the other should be weighted differently. The idea is to leverage the fact that many genes will respond to both drugs simultaneously. What would be the right approach to combining the p-values, controlling the FDR, and binning the genes into those responding to both drugs and those responding to only one drug?
(The above is a simplification of the full design, which is `~ group`, where `group` is a combination of `stage` and `condition`. The `treatment1_vs_control` and `treatment2_vs_control` contrasts are requested for each of the four stages. The groups sometimes have different numbers of observations.)
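For reference, a minimal sketch of the setup described above in DESeq2, assuming a `group` factor with levels such as `stage1_control`, `stage1_treatment1`, `stage1_treatment2` (the object and level names here are illustrative):

```r
library(DESeq2)

# counts: gene x sample matrix; coldata: data frame with a 'group' factor
# combining stage and condition (names are illustrative)
dds <- DESeqDataSetFromMatrix(countData = counts, colData = coldata,
                              design = ~ group)
dds <- DESeq(dds)  # Wald tests by default

# Per-stage contrasts against the stage-matched control
res_t1 <- results(dds, contrast = c("group", "stage1_treatment1", "stage1_control"))
res_t2 <- results(dds, contrast = c("group", "stage1_treatment2", "stage1_control"))
```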
EDIT
Some options I see:
- Apply the FDR cutoff separately to the two contrasts and simply compare the resulting gene lists. This is precisely what I want to avoid, as it underestimates the number of genes responding to both treatments
- Stouffer's test (see the sketch after this list) - https://doi.org/10.1371/journal.pone.0063290
- An ad hoc approach in which genes passing the FDR cutoff (applied independently to the two contrasts) in either treatment are checked for a raw p-value below a pre-defined threshold in the other treatment
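A minimal sketch of the Stouffer option, assuming the two results tables `res_t1` and `res_t2` from above share the same gene order; this is the signed-z variant with equal weights, and it treats the two tests as independent, which the shared control group violates:

```r
# Convert two-sided p-values to signed z-scores (sign from the logFC direction);
# NA p-values (e.g. from independent filtering) propagate as NA
z1 <- sign(res_t1$log2FoldChange) * qnorm(1 - res_t1$pvalue / 2)
z2 <- sign(res_t2$log2FoldChange) * qnorm(1 - res_t2$pvalue / 2)

# Stouffer combination with equal weights, back to a two-sided p-value
z_comb <- (z1 + z2) / sqrt(2)
p_comb <- 2 * pnorm(-abs(z_comb))

# FDR control on the combined p-values
padj_comb <- p.adjust(p_comb, method = "BH")
```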
EDIT2
I tend to think that a likelihood-ratio test is the right approach: I would first test the full model `~ treatment1 + treatment2` (where `treatment1` and `treatment2` are binary factors) against `~ 1`, and, for the genes that pass the FDR threshold, explore the contribution of the two treatment terms by comparing the full model to the reduced single-treatment models (`~ treatment1` and `~ treatment2`). I'm not sure how, or even whether, to adjust the results of the second test.
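A sketch of that two-stage LRT in DESeq2, assuming `treatment1` and `treatment2` are binary factors in the column data (how to adjust the second stage for multiplicity is exactly the open question):

```r
library(DESeq2)

# Stage 1: any-treatment effect -- full model vs. intercept-only
dds <- DESeqDataSetFromMatrix(countData = counts, colData = coldata,
                              design = ~ treatment1 + treatment2)
dds <- DESeq(dds, test = "LRT", reduced = ~ 1)
res_any <- results(dds)
hits <- rownames(res_any)[which(res_any$padj < 0.05)]

# Stage 2: term-wise contribution, dropping one treatment at a time
dds_drop1 <- DESeq(dds, test = "LRT", reduced = ~ treatment2)  # tests the treatment1 term
dds_drop2 <- DESeq(dds, test = "LRT", reduced = ~ treatment1)  # tests the treatment2 term
res_t1 <- results(dds_drop1)[hits, ]
res_t2 <- results(dds_drop2)[hits, ]
```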
Either you perform a boolean difference once the p-values have been thresholded: `setdiff(treatment1_vs_control, treatment2_vs_control)` returns the genes found in `treatment1_vs_control` but absent from `treatment2_vs_control`. Or you compute the contrast `treatment1_vs_treatment2`, which is the option I would go for.
My two cents...
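A sketch of both options, reusing the illustrative results tables and `group` levels from above, with an arbitrary 0.05 FDR cutoff:

```r
# Option 1: boolean set operations on the thresholded gene lists
sig1 <- rownames(res_t1)[which(res_t1$padj < 0.05)]
sig2 <- rownames(res_t2)[which(res_t2$padj < 0.05)]
only_t1 <- setdiff(sig1, sig2)     # significant for treatment1 only
only_t2 <- setdiff(sig2, sig1)     # significant for treatment2 only
both    <- intersect(sig1, sig2)   # significant for both

# Option 2: direct contrast between the two treatments
res_t1_vs_t2 <- results(dds, contrast = c("group", "stage1_treatment1",
                                          "stage1_treatment2"))
```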
Thanks!
When thresholding the p-values for each contrast, the p-values have already been adjusted for multiple testing. Typically, adjustment is carried out per contrast, so I would not re-adjust the p-values, IIUC.
So a gene that shows a higher (or lower) response (i.e. logFC) in treatment1 than in treatment2 is not relevant to your question, which surprises me. In that case, you should opt for the previous option, which precludes GSEA but still permits other enrichment analyses (don't forget to specify a relevant background from the experiment).
This is exactly my point: if the same gene has a low raw p-value for both treatments, and given the knowledge that the two treatments exert a similar effect, the corresponding two discoveries should have a lower chance of being false. When a fixed FDR cutoff is applied separately, this information is lost. E.g., here it is suggested that Stouffer's test with subsequent FDR control is applicable for this task: https://doi.org/10.1371/journal.pone.0063290, although they report that the resulting FDR is in fact more conservative than the target FDR.
I know nothing about this kind of approach.
At first, I thought you were looking for genes specific to one treatment. If you are looking for genes with the highest response in both treatments, did you try `(treatment1+treatment2)/2_vs_control` as a contrast?
Hmm, I think your suggestion can be interpreted in multiple ways. Do you mean in the LRT? Can you please elaborate?
I see. The problem with this approach, as I understand it, is that you weight the two treatments equally, such that a large effect in one treatment might "compensate" for the lack of an effect in the other. For now, I'm digging more into the meta-analysis literature. The main obstacle for meta-analysis approaches applied to experiments like mine is dependence: treatment1 and treatment2 share the same control.
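For what it's worth, one equal-weight reading of that suggestion can be expressed as a numeric contrast over `resultsNames(dds)`; a sketch assuming, for simplicity, a single-stage design `~ condition` with levels `control`, `treatment1`, `treatment2` (this may not be what the commenter meant):

```r
resultsNames(dds)
# e.g. "Intercept", "condition_treatment1_vs_control",
#      "condition_treatment2_vs_control"

# Average treatment effect vs. control:
# 0.5*(treatment1 - control) + 0.5*(treatment2 - control)
res_avg <- results(dds, contrast = c(0, 0.5, 0.5))
```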
What I often see is a plot with one comparison on one axis and the other comparison on the other, by logFC (or FC), and genes picked from that. You can apply a cutoff (if any p-value < threshold, keep the gene). This lets you visualize the data and see whether it makes sense to be more restrictive with the combined analysis. You might have lots of genes on the diagonal (those that respond to both drugs), but other genes could lie off the diagonal (those that respond to only one drug). I wouldn't go through combining p-values into one, as you select genes not on the basis of a single test but on the biological meaning and the pathway they might be working in.
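A minimal sketch of such a plot, again assuming the two results tables `res_t1` and `res_t2` share the same gene order:

```r
# "any p-value < threshold" rule; treat NA p-values as not kept
keep <- res_t1$pvalue < 0.05 | res_t2$pvalue < 0.05
keep[is.na(keep)] <- FALSE

plot(res_t1$log2FoldChange, res_t2$log2FoldChange,
     col = ifelse(keep, "red", "grey"), pch = 20,
     xlab = "log2FC: treatment1 vs control",
     ylab = "log2FC: treatment2 vs control")
abline(0, 1, lty = 2)  # diagonal: genes responding similarly to both drugs
```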
Thanks for the idea. Such a comparison is implemented in my pipeline as well, exactly as in your suggestion. I do see genes that I don't seem to have the statistical power to detect as DE but which align on the diagonal. I have two thoughts on this: 1) I don't know how to formalize an inclusion criterion based on this correlation, as it's clear that some of these genes are false positives, while the genes with high LFC are also easily detected by the more formal test(s); and 2) some correlation is expected due to the shared control group.