Question

DESeq2 2x2 factorial design: find genes unique to factor A?

0

Entering edit mode

Keith Hughitt ▴ 180

@keith-hughitt-6740

Last seen 5 months ago

United States

Greetings!

I have a dataset comparing the effects of two different treatments on cell line line, including both control and combination treatment samples:

Vehicle
Treatment A
Treatment B
Treatment A + B

One of my goals is to determine which genes uniquely respond to each of the treatments, e.g. A & !B & !AB.

A simple approach to this is to perform the individual contrasts and then exclude genes that were found to be significant in the "other" treatment groups/

For example, two find to genes that respond uniquely to treatment A, using a treatment coding, one could do:

dds <- DESeqDataSetFromFeatureCounts(..., design=~treatment)
dds <- DESeq(dds)

res_A  <- lfcShrink(dds, contrast = c('treatment', 'A', 'Vehicle'))
res_B  <- lfcShrink(dds, contrast = c('treatment', 'B', 'Vehicle'))
res_AB <- lfcShrink(dds, contrast = c('treatment', 'AB', 'Vehicle'))

ind <- (res_A$padj < 0.01) & (res_B$padj > 0.01) & (res_AB$padj > 0.01)

For the purposes of determine which genes respond uniquely to treatment A, this works fine.

However, suppose I then want to perform something like GSEA on the results to determine functions uniquely associated with treatment A. The P-values for the contrast res_A above relate only to the question of which genes respond to treatment A, not which genes uniquely respond to treatment A.

Is there a better way to frame this contrast to determine P-values that correspond to the unique response of the samples to treatment A?

-- update 2019/11/18 --

Rather than attempting to answer the question using a single contrast, another approach I've considered is using the altHypothesis parameter for DESeq's results() function to separately look for genes not differentially expressed under each of the treatment conditions, e.g.:

not_A <- results(dds, name = 'A_vs_Vehicle', lfcThreshold = 0.5, altHypothesis = "lessAbs")
not_B <- results(dds, name = 'B_vs_Vehicle', lfcThreshold = 0.5, altHypothesis = "lessAbs")
not_AB <- results(dds, name = 'AB_vs_Vehicle', lfcThreshold = 0.5, altHypothesis = "lessAbs")

The genes of interest (those affected uniquely by treatment A) could then be found by taking the intersection of significant genes in res_A & not_B & not_AB.

The benefit here us that the later two contrasts also return P-values corresponding to the genes that do not appear to be differentially expressed under each of the alternative treatment conditions.

One could then do something like taking the mean of the three P-values or use Fisher's Method to combine them, in order to rank the genes by their ~ "combined evidence for unique expression under treatment A".

The derived P-values would not have any precise meaning, and an equal weighting like this might give to much priority to the (2/3) "not" contrasts. One could perhaps perform a weighted average and with 0.5 / 0.25 / 0.25 to emphasize the role of the first contrast, but this too seems to be moving even further any from any sort of rigorous treatment, but perhaps this would be sufficient to get to the original goal of ranking genes?

--

Apologies if this has already been answered elsewhere on this forum or others -- I did find quite a few discussions relating to some of the concepts in this thread (combining contrasts, etc.), however, I could not find any guidance for this particular problem.

Any advice would be greatly appreciated.

Thanks! Keith

deseq2 deseq differential expression contrasts • 1.8k views

ADD COMMENT • link updated 5.6 years ago by Michael Love 43k • written 5.6 years ago by Keith Hughitt ▴ 180

score 2 · Accepted Answer · 2019-11-19

2

Entering edit mode

Michael Love 43k

@mikelove

Last seen 14 days ago

United States

There’s not a single contrast approach to define that set of requirements, and yes you can flip the alternate hypothesis to define not DE. The final pvalue as constructed with Fishers method will tend to reject if any of the individual hypotheses are not true. You could also consider taking the max pvalue to require that all must reject. That max is not uniform under the null but depending on application you may not need it to be. You can also transform it to uniform if needed.

ADD COMMENT • link 5.6 years ago Michael Love 43k

0

Entering edit mode

Great! Thanks for the clarification and useful suggestions, and, more generally, for taking to time to regularly assist DESeq users - it is very much appreciated.