DESeq2: multiple factors and comparison of a main factor between samples
1
1
Entering edit mode
mat.lesche ▴ 80
@matlesche-6835
Last seen 2.5 years ago
Germany

Hello,

I'm working with a data set which consists of 3 patients. I have from each patients 9 samples and 3 samples were treated with treatment A, 3 samples with treatment B and 3 samples are the control.

I'm interested in the difference of Control vs. Treatment A and Treatment A vs Treatment B per patient and for all patients.

I used the following design for DESeq2:

coldata$combine <- factor(paste0(coldata$patient, "-", coldata\$treatment))
dds <- DESeqDataSetFromMatrix(countData = Jcounts, colData = coldata, design = ~ combine)

I like the combination of the factors because it is easier use, you get your results with the results function and using list and recommended by Michael Love in a previous experiment

This design should be comparable with

dds <- DESeqDataSetFromMatrix(countData = Jcounts, colData = coldata, design = ~ patient + treatment + patient:treatment)

for my questions.

The PCA shows clear differences between the treatments with Control on left side, Treatment A in the middle and B on the right side of the x-axis (PC1) for all patients. Furthermore Patient 1 and 2 cluster together within different treatments. However, patient 3 is away from the other two patients on the y-axis (PC2) with 25 % variance.

With this picture in mind, can I use the following for the treatment comparison over all patients?

result_all <- results(dds, contrast = list(c("combineP1.A", "combineP2.A", "combineP3.A"), c("combineP1.Control", "combineP2.Control", "combineP3.Control")),  listValues = c(1/3, -1/3))

Or should I just do a design only with treatment

dds <- DESeqDataSetFromMatrix(countData = Jcounts, colData = coldata, design = ~ treatment)

and compare A vs Control. I get about 3,000 genes with a padj of 0.1 and ca. 98 % of the are in the 6,000 from above. This design doesn't use the patient difference. So I guess it is not the best solution.

Thanks

Mathias

deseq2 multiple factor design • 8.8k views
1
Entering edit mode
@mikelove
Last seen 1 day ago
United States

hi Mathias,

The result_all contrast is one valid way to get the "A vs Control" effect overall and it takes into account the differences in patient, so this is why the number of DE genes is higher. This is an average (in the log2 space) of the A vs Control effect in each patient.

Another option at looking for an overall effect, would be to test for consistent fold changes across patient by rerunning the model with a design of ~ patient + treatment, and then testing contrast=c("treatment","A","Control"). This will likely also give less number of DE genes than result_all, because it is not testing the average log2 fold change, but looking for a consistent fold change due to treatment, in other words, this model does not account for patient-specific treatment effects.

0
Entering edit mode

Hey Michael,

thx for the feedback. What is the outcome if I only use treatment as design? Like I said, it gives about 3,000 genes whereas patient+treatment has about 5,700 genes and 90% of the genes of the 'treatment only design are in 'patient+treatment'

The design with treatment only ignores the patient sample relation and deseq is able to do an outlier fitting because each treatment has more than 7 samples.

Thx and happy Eastern.

0
Entering edit mode

As you said, treatment only doesn't take into account patient effect. This means that patient differences, even if they are consistent in both treatment groups, look like extra noise (dispersion) to the statistical model. So this extra noise results in more conservative test statistics.

If you are concerned with the number of DE genes, we recommend raising the threshold on the null hypothesis (see our paper and the section on thresholds, and the 'lfcThreshold' argument in ?results). The threshold you choose is up to you and depends on what is biological meaningful fold change. But this is better than reducing the results by not accounting for patient effects. It's not just the number of genes that change but the ranking, and you want to rank higher those genes with consistent fold changes due to treatment, after accounting for patient differences.