Greetings!
I have a dataset comparing the effects of two different treatments on cell line line, including both control and combination treatment samples:
- Vehicle
- Treatment A
- Treatment B
- Treatment A + B
One of my goals is to determine which genes uniquely respond to each of the treatments, e.g. A & !B & !AB
.
A simple approach to this is to perform the individual contrasts and then exclude genes that were found to be significant in the "other" treatment groups/
For example, two find to genes that respond uniquely to treatment A, using a treatment coding, one could do:
dds <- DESeqDataSetFromFeatureCounts(..., design=~treatment)
dds <- DESeq(dds)
res_A <- lfcShrink(dds, contrast = c('treatment', 'A', 'Vehicle'))
res_B <- lfcShrink(dds, contrast = c('treatment', 'B', 'Vehicle'))
res_AB <- lfcShrink(dds, contrast = c('treatment', 'AB', 'Vehicle'))
ind <- (res_A$padj < 0.01) & (res_B$padj > 0.01) & (res_AB$padj > 0.01)
For the purposes of determine which genes respond uniquely to treatment A, this works fine.
However, suppose I then want to perform something like GSEA on the results to determine functions uniquely associated with treatment A. The P-values for the contrast res_A
above relate only to the question of which genes respond to treatment A, not which genes uniquely respond to treatment A.
Is there a better way to frame this contrast to determine P-values that correspond to the unique response of the samples to treatment A?
-- update 2019/11/18 --
Rather than attempting to answer the question using a single contrast, another approach I've considered is using the altHypothesis parameter for DESeq's results()
function to separately look for genes not differentially expressed under each of the treatment conditions, e.g.:
not_A <- results(dds, name = 'A_vs_Vehicle', lfcThreshold = 0.5, altHypothesis = "lessAbs")
not_B <- results(dds, name = 'B_vs_Vehicle', lfcThreshold = 0.5, altHypothesis = "lessAbs")
not_AB <- results(dds, name = 'AB_vs_Vehicle', lfcThreshold = 0.5, altHypothesis = "lessAbs")
The genes of interest (those affected uniquely by treatment A) could then be found by taking the intersection of significant genes in res_A
& not_B
& not_AB
.
The benefit here us that the later two contrasts also return P-values corresponding to the genes that do not appear to be differentially expressed under each of the alternative treatment conditions.
One could then do something like taking the mean of the three P-values or use Fisher's Method to combine them, in order to rank the genes by their ~ "combined evidence for unique expression under treatment A".
The derived P-values would not have any precise meaning, and an equal weighting like this might give to much priority to the (2/3) "not" contrasts. One could perhaps perform a weighted average and with 0.5 / 0.25 / 0.25 to emphasize the role of the first contrast, but this too seems to be moving even further any from any sort of rigorous treatment, but perhaps this would be sufficient to get to the original goal of ranking genes?
--
Apologies if this has already been answered elsewhere on this forum or others -- I did find quite a few discussions relating to some of the concepts in this thread (combining contrasts, etc.), however, I could not find any guidance for this particular problem.
Any advice would be greatly appreciated.
Thanks! Keith
Great! Thanks for the clarification and useful suggestions, and, more generally, for taking to time to regularly assist DESeq users - it is very much appreciated.
Thanks for the comment :)