Question

Individual DESeq2 runs versus a single DESeq2 run with contrasts produce different results

0

Entering edit mode

jovel_juan ▴ 30

@jovel_juan-7129

Last seen 4 months ago

Canada

I have an experiment with the following groups: one control (unstimulated T cells), and three experimental groups (T cells stimulated with TCR, with INFL3 or both).

I have compared the control against TCR stimulation alone or INFL3 alone, or TCR against TCR+INFL3 or INFL3 against TCR+INFL3. It implied creating 4 different counts and metadata files and running DESeq2 4 times.

Alternatively, I run all samples at once (in a single DESeq2 run) and extracted the desired contrasts with the code below:

groups <- unique(metadata$group)
possible_contrasts <- combn(groups, 2)
contrasts <- lapply(1:ncol(possible_contrasts), function(x){
  c('group', possible_contrasts[2,x], possible_contrasts[1,x])
})

selected_contrasts <- c(contrasts[1], contrasts[2], contrasts[5], contrasts[6])

for (i in 1:length(selected_contrasts)){
  contrast_args <- selected_contrasts[[i]]
  contrast_name <- paste(contrast_args, collapse='_vs_')
  contrast_name <- paste(contrast_args[1], paste(contrast_args[-1], collapse = "_vs_"), sep = "_")
  print(contrast_name)
  file_name  <- paste(contrast_name, '_q0.05.tsv', sep='')
  my_results <- results(ddsMat, contrast = contrast_args)
  resSig     <- subset(my_results, padj < 0.05)
  write.table(resSig, file_name, sep = '\t', quote = F, row.names = F)
}

When DE results are filtered using a corrected pValue (padj) threshold of 0.05, both approaches produce considerably different number or DE features.

Here the numbers:

With four individual DESeq2 runs (res <- results(dds):

11357 group_02_TCR_vs_01_control_q0.05.tsv

299 group_03_INFL3_vs_01_control_q0.05.tsv

304 group_04_TCR-INFL3_vs_02_TCR_q0.05.tsv

11127 group_04_TCR-INFL3_vs_03_INFL3_q0.05.tsv

With a single DESeq2 run extract contrasts with the code above:

9738 group_02_TCR_vs_01_control_q0.05.tsv

720 group_03_INFL3_vs_01_control_q0.05.tsv

528 group_04_TCR-INFL3_vs_02_TCR_q0.05.tsv

11612 group_04_TCR-INFL3_vs_03_INFL3_q0.05.tsv

I have not done any thorough comparison besides counting the number of DE features with each method. However, the difference in numbers of DE features worries me enough to believe that the normalization parameters derived from the whole dataset (of from individual subsets when individual DESeq2 runs) leads to different results.

Intuitively, I tend to believe that individual DESeq2 runs for individual subsets is more accurate because normalization is done on exactly the relevant samples.

Any input would be most welcome.

contrasts DESeq2 • 546 views

ADD COMMENT • link 10 months ago jovel_juan ▴ 30

score 1 · Answer 1 · 2023-06-10

1

Entering edit mode

ATpoint ★ 4.0k

@atpoint-13662

Last seen 1 hour ago

Germany

Please see the vignette, spitting samples or keeping together is covered there. This was asked many times before, use google or the search function to find them.