Individual DESeq2 runs versus a single DESeq2 run with contrasts produce different results
Entering edit mode
jovel_juan ▴ 20
Last seen 19 days ago

I have an experiment with the following groups: one control (unstimulated T cells), and three experimental groups (T cells stimulated with TCR, with INFL3 or both).

I have compared the control against TCR stimulation alone or INFL3 alone, or TCR against TCR+INFL3 or INFL3 against TCR+INFL3. It implied creating 4 different counts and metadata files and running DESeq2 4 times.

Alternatively, I run all samples at once (in a single DESeq2 run) and extracted the desired contrasts with the code below:

groups <- unique(metadata$group)
possible_contrasts <- combn(groups, 2)
contrasts <- lapply(1:ncol(possible_contrasts), function(x){
  c('group', possible_contrasts[2,x], possible_contrasts[1,x])

selected_contrasts <- c(contrasts[1], contrasts[2], contrasts[5], contrasts[6])

for (i in 1:length(selected_contrasts)){
  contrast_args <- selected_contrasts[[i]]
  contrast_name <- paste(contrast_args, collapse='_vs_')
  contrast_name <- paste(contrast_args[1], paste(contrast_args[-1], collapse = "_vs_"), sep = "_")
  file_name  <- paste(contrast_name, '_q0.05.tsv', sep='')
  my_results <- results(ddsMat, contrast = contrast_args)
  resSig     <- subset(my_results, padj < 0.05)
  write.table(resSig, file_name, sep = '\t', quote = F, row.names = F)

When DE results are filtered using a corrected pValue (padj) threshold of 0.05, both approaches produce considerably different number or DE features.

Here the numbers:

With four individual DESeq2 runs (res <- results(dds):

11357 group_02_TCR_vs_01_control_q0.05.tsv

299 group_03_INFL3_vs_01_control_q0.05.tsv

304 group_04_TCR-INFL3_vs_02_TCR_q0.05.tsv

11127 group_04_TCR-INFL3_vs_03_INFL3_q0.05.tsv

With a single DESeq2 run extract contrasts with the code above:

9738 group_02_TCR_vs_01_control_q0.05.tsv

720 group_03_INFL3_vs_01_control_q0.05.tsv

528 group_04_TCR-INFL3_vs_02_TCR_q0.05.tsv

11612 group_04_TCR-INFL3_vs_03_INFL3_q0.05.tsv

I have not done any thorough comparison besides counting the number of DE features with each method. However, the difference in numbers of DE features worries me enough to believe that the normalization parameters derived from the whole dataset (of from individual subsets when individual DESeq2 runs) leads to different results.

Intuitively, I tend to believe that individual DESeq2 runs for individual subsets is more accurate because normalization is done on exactly the relevant samples.

Any input would be most welcome.

contrasts DESeq2 • 302 views
Entering edit mode
ATpoint ★ 3.4k
Last seen 1 day ago

Please see the vignette, spitting samples or keeping together is covered there. This was asked many times before, use google or the search function to find them.

Entering edit mode

Thank you! Apologies for the redundancy!


Login before adding your answer.

Traffic: 237 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6