I am working on a more complex design, but I think a basic 2-factor (each with two levels) captures the spirit of my question. Namely, I am interested in creating "reasonable" mock data to appropriately test my understanding of extracting the appropriate contrasts in DESeq2's
In the script below, I create some mock data using DESeq2's
makeExampleDESeqDataSet method. I then perturb the several genes by quite a bit, "spiking" in the effects I hope to find upon running it through a differential expression analysis.
My hypothetical setup takes 2 factors (genetic background and treatment) at two levels each:
backgroundfactor has levels: sgCtrl, sgFoo
treatmentfactor has levels: DMSO, T1
There are 3 replicates in each combination of levels for a total of 12 total samples.
I model using the main + interaction:
~ background + treatment + background:treatment
resultsNames(dds) produces the following:
 "Intercept" "background_sgFoo_vs_sgCtrl"  "treatment_T1_vs_DMSO" "backgroundsgFoo.treatmentT1"
results(dds, name='background_sgFoo_vs_sgCtrl'), I recover the main effect on
background that I introduced, as expected. That is, the genes I perturbed are all near the top of the sorted list.
However, I cannot seem to recover any interaction term using
res <- results(dds, name="backgroundsgFoo.treatmentT1"). For reference, here are the raw counts for the gene where I introduced the interaction effect:
sample1 sample2 sample3 sample4 sample5 sample6 sample7 sample8 116 171 157 130 24 125 433 567 sample9 sample10 sample11 sample12 497 844 838 864
Even just visually, you can see the main
background effect on samples 7-12 and the additional effect on samles 10-12, which I thought should yield a very obvious interaction effect. Yet that particular gene gets raw p-value 0.017 and FDR of 0.81, despite me introducing a rather large perturbation. Is this just an artifact of my spiking procedure? Or am I missing something?
resultsNames above, I can identify the following coefficients:
- B0 = intercept (representing sgCtrl + DMSO here)
- B1 = main effect of background (e.g. comparing sgFoo+DMSO vs. sgCtrl+DMSO)
- B2 = main effect of treatment (e.g. comparing sgCtrl+T1 vs. sgCtrl+DMSO)
- B3 = interaction effect (of T1 with sgFoo)
Then, if I was interested in the effect of treatment T1 in sgFoo versus treatment T1 in sgCtrl, I would have:
(B0+B1+B2+B3) - (B0 + B2) = B1+B3 and would use
results(dds, contrast=c(0,1,0,1)), correct? I recognize this has the main
background effect B1, so the interpretation is a little more nuanced than comparing genetic backgrounds both under DMSO condition.
Thanks for any help/advice/etc.!
Script (DESeq2_1.26.0 but can provide full sessionInfo if necessary):
library(DESeq2) set.seed(0) mock_dds <- makeExampleDESeqDataSet(n=10000, m=12) # First extract the raw counts raw_counts = counts(mock_dds) print('Original raw counts:') print(head(raw_counts)) # setup annotations background = rep(c("sgCtrl","sgFoo"),each=6) treatment = rep(rep(c("DMSO","T1"),each=3),2) sample_annotations = cbind(background, treatment) rownames(sample_annotations)<-paste0('sample', 1:12) # By setup, there should be no true effects so far. # For gene1, gene3, and gene5, add an effect due to the genetic background. # To do this get the row mean for each of those genes and simply add that onto # the sgFoo samples (for both DMSO and T1 treated conditions) selected_genes = c('gene1', 'gene3', 'gene5') subset_means = floor(rowMeans(raw_counts[selected_genes,])) # create arrays for the samples we will eventually select: sgFoo_samples = paste0('sample', 7:12) sgFoo_plus_treated_samples = paste0('sample', 10:12) # Add the perturbation to the sgFoo samples (sample7-sample12). Here, we add a 3-fold increase: raw_counts[selected_genes, sgFoo_samples] = raw_counts[selected_genes, sgFoo_samples] + 3*subset_means # Add an additional perturbation for only gene3 on the sgFoo+treated samples (sample10-sample12) raw_counts['gene3', sgFoo_plus_treated_samples] = raw_counts['gene3', sgFoo_plus_treated_samples] + 3*subset_means['gene3'] print('Perturbed raw counts:') print(head(raw_counts)) # Run DESEq2: dds <- DESeqDataSetFromMatrix(countData = raw_counts, colData = sample_annotations, design = ~background+treatment+background:treatment) dds <- DESeq(dds) print(resultsNames(dds)) # To get the background effect imposed on sgFoo-- res <- results(dds, name='background_sgFoo_vs_sgCtrl') resOrdered <- res[order(res$pvalue),] print(head(resOrdered)) print('*****************') # Check the interaction effect to see if gene3 pops up: res <- results(dds, name="backgroundsgFoo.treatmentT1") resOrdered <- res[order(res$pvalue),] print(head(resOrdered)) print(resOrdered['gene3',])