I tried to post a similar question before, but I think I didn't explain it well enough.
The design is as follows: two drugs (= treatments) with a similar phenotypic effect, plus a control group. The purpose is to compare the drugs: to obtain a list of genes responding to both drugs and lists of genes responding to each drug individually.
I'm running Wald tests for the contrasts `treatment1_vs_control` and `treatment2_vs_control`. Intuitively, two genes with the same p-value in one treatment but different p-values in the other should be weighted differently. The idea is to leverage the fact that many genes will respond to both drugs simultaneously. What would be the right approach to combining the p-values, controlling the FDR, and binning the genes into those responding to both drugs and those responding to only one drug?
(The above is a simplification of the full design, which is `~ group`, where `group` is a combination of `stage` and `condition`. The `treatment1_vs_control` and `treatment2_vs_control` contrasts are requested for each of the four stages. The groups sometimes have different numbers of observations.)
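For reference, a minimal sketch of the setup described above in DESeq2, assuming a `group` factor with levels such as `stage1_control`, `stage1_treatment1`, `stage1_treatment2` (the object and level names here are illustrative):

```r
library(DESeq2)

# counts: gene x sample matrix; coldata: data frame with a 'group' factor
# combining stage and condition (names are illustrative)
dds <- DESeqDataSetFromMatrix(countData = counts, colData = coldata,
                              design = ~ group)
dds <- DESeq(dds)  # Wald tests by default

# Per-stage contrasts against the stage-matched control
res_t1 <- results(dds, contrast = c("group", "stage1_treatment1", "stage1_control"))
res_t2 <- results(dds, contrast = c("group", "stage1_treatment2", "stage1_control"))
```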
EDIT
Some options I see:
- Apply the FDR cutoff separately to the two contrasts and simply compare the resulting gene lists. This is precisely what I want to avoid, as it underestimates the number of genes responding to both treatments
- Stouffer's test (see the sketch after this list) - https://doi.org/10.1371/journal.pone.0063290
- An ad hoc approach in which genes passing the FDR cutoff (applied independently to the two contrasts) in either treatment are checked for a raw p-value below a pre-defined threshold in the other treatment
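A minimal sketch of the Stouffer option, assuming the two results tables `res_t1` and `res_t2` from above share the same gene order; this is the signed-z variant with equal weights, and it treats the two tests as independent, which the shared control group violates:

```r
# Convert two-sided p-values to signed z-scores (sign from the logFC direction);
# NA p-values (e.g. from independent filtering) propagate as NA
z1 <- sign(res_t1$log2FoldChange) * qnorm(1 - res_t1$pvalue / 2)
z2 <- sign(res_t2$log2FoldChange) * qnorm(1 - res_t2$pvalue / 2)

# Stouffer combination with equal weights, back to a two-sided p-value
z_comb <- (z1 + z2) / sqrt(2)
p_comb <- 2 * pnorm(-abs(z_comb))

# FDR control on the combined p-values
padj_comb <- p.adjust(p_comb, method = "BH")
```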
EDIT2
I tend to think that a likelihood-ratio test is the right approach: I would first test the full model `~ treatment1 + treatment2` (where `treatment1` and `treatment2` are binary factors) against `~ 1`, and, for the genes that pass the FDR threshold, explore the contribution of the two treatment terms by comparing the full model to the reduced single-treatment models (`~ treatment1` and `~ treatment2`). I'm not sure how, or even whether, to adjust the results of the second test.
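A sketch of that two-stage LRT in DESeq2, assuming `treatment1` and `treatment2` are binary factors in the column data (how to adjust the second stage for multiplicity is exactly the open question):

```r
library(DESeq2)

# Stage 1: any-treatment effect -- full model vs. intercept-only
dds <- DESeqDataSetFromMatrix(countData = counts, colData = coldata,
                              design = ~ treatment1 + treatment2)
dds <- DESeq(dds, test = "LRT", reduced = ~ 1)
res_any <- results(dds)
hits <- rownames(res_any)[which(res_any$padj < 0.05)]

# Stage 2: term-wise contribution, dropping one treatment at a time
dds_drop1 <- DESeq(dds, test = "LRT", reduced = ~ treatment2)  # tests the treatment1 term
dds_drop2 <- DESeq(dds, test = "LRT", reduced = ~ treatment1)  # tests the treatment2 term
res_t1 <- results(dds_drop1)[hits, ]
res_t2 <- results(dds_drop2)[hits, ]
```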
Either you perform a boolean difference once the p-values have been thresholded: `setdiff(treatment1_vs_control, treatment2_vs_control)` returns the genes found in `treatment1_vs_control` but absent from `treatment2_vs_control`. Or you compute the contrast `treatment1_vs_treatment2`, which is the option I would go for.
My two cents...
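A sketch of both options, reusing the illustrative results tables and `group` levels from above, with an arbitrary 0.05 FDR cutoff:

```r
# Option 1: boolean set operations on the thresholded gene lists
sig1 <- rownames(res_t1)[which(res_t1$padj < 0.05)]
sig2 <- rownames(res_t2)[which(res_t2$padj < 0.05)]
only_t1 <- setdiff(sig1, sig2)     # significant for treatment1 only
only_t2 <- setdiff(sig2, sig1)     # significant for treatment2 only
both    <- intersect(sig1, sig2)   # significant for both

# Option 2: direct contrast between the two treatments
res_t1_vs_t2 <- results(dds, contrast = c("group", "stage1_treatment1",
                                          "stage1_treatment2"))
```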
Thanks!
When thresholding the p-values for each contrast, the p-values have already been adjusted for multiple testing. Typically, adjustment is carried out per contrast, so I would not re-adjust the p-values, IIUC.
So a gene that shows a higher (or lower) response (i.e. logFC) in treatment1 than in treatment2 is not relevant to your question, which surprises me. In that case, you should opt for the previous option, which precludes GSEA but still permits other enrichment analyses (don't forget to specify a relevant background from the experiment).
This is exactly my point: if the same gene has a low raw p-value for both treatments, and given the knowledge that the two treatments exert a similar effect, the corresponding two discoveries should have a lower chance of being false. When a fixed FDR cutoff is applied separately, this information is lost. E.g., here it is suggested that Stouffer's test with subsequent FDR control is applicable for this task: https://doi.org/10.1371/journal.pone.0063290, although they report that the resulting FDR is in fact more conservative than the target FDR.
I know nothing about this kind of approach.
At first, I thought you were looking for genes specific to one treatment. If you are looking for genes with the highest response in both treatments, did you try `(treatment1+treatment2)/2_vs_control` as a contrast?
Hmm, I think your suggestion can be interpreted in multiple ways. Do you mean in the LRT? Can you please elaborate?
I see. The problem with this approach, as I understand it, is that you weight the two treatments equally, such that a large effect in one treatment might "compensate" for the lack of an effect in the other. For now, I'm digging more into the meta-analysis literature. The main obstacle for meta-analysis approaches applied to experiments like mine is dependence: treatment1 and treatment2 share the same control.
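For what it's worth, one equal-weight reading of that suggestion can be expressed as a numeric contrast over `resultsNames(dds)`; a sketch assuming, for simplicity, a single-stage design `~ condition` with levels `control`, `treatment1`, `treatment2` (this may not be what the commenter meant):

```r
resultsNames(dds)
# e.g. "Intercept", "condition_treatment1_vs_control",
#      "condition_treatment2_vs_control"

# Average treatment effect vs. control:
# 0.5*(treatment1 - control) + 0.5*(treatment2 - control)
res_avg <- results(dds, contrast = c(0, 0.5, 0.5))
```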
What I often see is a plot with one comparison on one axis and the other comparison on the other, by logFC (or FC), and genes picked from that. You can apply a cutoff (if any p-value < threshold, keep the gene). This lets you visualize the data and see whether it makes sense to be more restrictive with the combined analysis. You might have lots of genes on the diagonal (those that respond to both drugs), but other genes could lie off the diagonal (those that respond to only one drug). I wouldn't go through combining p-values into one, as you select genes not on the basis of a single test but on the biological meaning and the pathway they might be working in.
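A minimal sketch of such a plot, again assuming the two results tables `res_t1` and `res_t2` share the same gene order:

```r
# "any p-value < threshold" rule; treat NA p-values as not kept
keep <- res_t1$pvalue < 0.05 | res_t2$pvalue < 0.05
keep[is.na(keep)] <- FALSE

plot(res_t1$log2FoldChange, res_t2$log2FoldChange,
     col = ifelse(keep, "red", "grey"), pch = 20,
     xlab = "log2FC: treatment1 vs control",
     ylab = "log2FC: treatment2 vs control")
abline(0, 1, lty = 2)  # diagonal: genes responding similarly to both drugs
```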
Thanks for the idea. Such a comparison is implemented in my pipeline as well, exactly as in your suggestion. I do see genes that I don't seem to have the statistical power to detect as DE but which align on the diagonal. I have two thoughts on this: 1) I don't know how to formalize an inclusion criterion based on this correlation, as it's clear that some of these genes are false positives, while the genes with high LFC are also easily detected by the more formal test(s); and 2) some correlation is expected due to the shared control group.