Question: DESeq2: multiple factors and comparison of a main factor between samples
gravatar for mat.lesche
3.6 years ago by
mat.lesche50 wrote:


I'm working with a data set which consists of 3 patients. I have from each patients 9 samples and 3 samples were treated with treatment A, 3 samples with treatment B and 3 samples are the control.

I'm interested in the difference of Control vs. Treatment A and Treatment A vs Treatment B per patient and for all patients.

I used the following design for DESeq2:

coldata$combine <- factor(paste0(coldata$patient, "-", coldata$treatment))
dds <- DESeqDataSetFromMatrix(countData = Jcounts, colData = coldata, design = ~ combine)

I like the combination of the factors because it is easier use, you get your results with the results function and using list and recommended by Michael Love in a previous experiment

This design should be comparable with

dds <- DESeqDataSetFromMatrix(countData = Jcounts, colData = coldata, design = ~ patient + treatment + patient:treatment)

for my questions.

The PCA shows clear differences between the treatments with Control on left side, Treatment A in the middle and B on the right side of the x-axis (PC1) for all patients. Furthermore Patient 1 and 2 cluster together within different treatments. However, patient 3 is away from the other two patients on the y-axis (PC2) with 25 % variance.

With this picture in mind, can I use the following for the treatment comparison over all patients?

result_all <- results(dds, contrast = list(c("combineP1.A", "combineP2.A", "combineP3.A"), c("combineP1.Control", "combineP2.Control", "combineP3.Control")),  listValues = c(1/3, -1/3))

I get about 6,000 genes with a padj of 0.1.

Or should I just do a design only with treatment

dds <- DESeqDataSetFromMatrix(countData = Jcounts, colData = coldata, design = ~ treatment)

and compare A vs Control. I get about 3,000 genes with a padj of 0.1 and ca. 98 % of the are in the 6,000 from above. This design doesn't use the patient difference. So I guess it is not the best solution.




ADD COMMENTlink modified 3.6 years ago by Michael Love20k • written 3.6 years ago by mat.lesche50
gravatar for Michael Love
3.6 years ago by
Michael Love20k
United States
Michael Love20k wrote:

hi Mathias,

The result_all contrast is one valid way to get the "A vs Control" effect overall and it takes into account the differences in patient, so this is why the number of DE genes is higher. This is an average (in the log2 space) of the A vs Control effect in each patient.

Another option at looking for an overall effect, would be to test for consistent fold changes across patient by rerunning the model with a design of ~ patient + treatment, and then testing contrast=c("treatment","A","Control"). This will likely also give less number of DE genes than result_all, because it is not testing the average log2 fold change, but looking for a consistent fold change due to treatment, in other words, this model does not account for patient-specific treatment effects. 

ADD COMMENTlink written 3.6 years ago by Michael Love20k

Hey Michael,

thx for the feedback. What is the outcome if I only use treatment as design? Like I said, it gives about 3,000 genes whereas patient+treatment has about 5,700 genes and 90% of the genes of the 'treatment only design are in 'patient+treatment'

The design with treatment only ignores the patient sample relation and deseq is able to do an outlier fitting because each treatment has more than 7 samples.

Thx and happy Eastern.

ADD REPLYlink modified 3.6 years ago • written 3.6 years ago by mat.lesche50

As you said, treatment only doesn't take into account patient effect. This means that patient differences, even if they are consistent in both treatment groups, look like extra noise (dispersion) to the statistical model. So this extra noise results in more conservative test statistics. 

If you are concerned with the number of DE genes, we recommend raising the threshold on the null hypothesis (see our paper and the section on thresholds, and the 'lfcThreshold' argument in ?results). The threshold you choose is up to you and depends on what is biological meaningful fold change. But this is better than reducing the results by not accounting for patient effects. It's not just the number of genes that change but the ranking, and you want to rank higher those genes with consistent fold changes due to treatment, after accounting for patient differences.

ADD REPLYlink written 3.6 years ago by Michael Love20k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 245 users visited in the last hour