Individual DESeq2 runs versus a single DESeq2 run with contrasts produce different results
1
0
Entering edit mode
jovel_juan ▴ 30
@jovel_juan-7129
Last seen 17 months ago
Canada

I have an experiment with the following groups: one control (unstimulated T cells), and three experimental groups (T cells stimulated with TCR, with INFL3 or both).

I have compared the control against TCR stimulation alone or INFL3 alone, or TCR against TCR+INFL3 or INFL3 against TCR+INFL3. It implied creating 4 different counts and metadata files and running DESeq2 4 times.

Alternatively, I run all samples at once (in a single DESeq2 run) and extracted the desired contrasts with the code below:

groups <- unique(metadata$group)
possible_contrasts <- combn(groups, 2)
contrasts <- lapply(1:ncol(possible_contrasts), function(x){
  c('group', possible_contrasts[2,x], possible_contrasts[1,x])
})

selected_contrasts <- c(contrasts[1], contrasts[2], contrasts[5], contrasts[6])

for (i in 1:length(selected_contrasts)){
  contrast_args <- selected_contrasts[[i]]
  contrast_name <- paste(contrast_args, collapse='_vs_')
  contrast_name <- paste(contrast_args[1], paste(contrast_args[-1], collapse = "_vs_"), sep = "_")
  print(contrast_name)
  file_name  <- paste(contrast_name, '_q0.05.tsv', sep='')
  my_results <- results(ddsMat, contrast = contrast_args)
  resSig     <- subset(my_results, padj < 0.05)
  write.table(resSig, file_name, sep = '\t', quote = F, row.names = F)
}

When DE results are filtered using a corrected pValue (padj) threshold of 0.05, both approaches produce considerably different number or DE features.

Here the numbers:

With four individual DESeq2 runs (res <- results(dds):

11357 group_02_TCR_vs_01_control_q0.05.tsv

299 group_03_INFL3_vs_01_control_q0.05.tsv

304 group_04_TCR-INFL3_vs_02_TCR_q0.05.tsv

11127 group_04_TCR-INFL3_vs_03_INFL3_q0.05.tsv

With a single DESeq2 run extract contrasts with the code above:

9738 group_02_TCR_vs_01_control_q0.05.tsv

720 group_03_INFL3_vs_01_control_q0.05.tsv

528 group_04_TCR-INFL3_vs_02_TCR_q0.05.tsv

11612 group_04_TCR-INFL3_vs_03_INFL3_q0.05.tsv

I have not done any thorough comparison besides counting the number of DE features with each method. However, the difference in numbers of DE features worries me enough to believe that the normalization parameters derived from the whole dataset (of from individual subsets when individual DESeq2 runs) leads to different results.

Intuitively, I tend to believe that individual DESeq2 runs for individual subsets is more accurate because normalization is done on exactly the relevant samples.

Any input would be most welcome.

contrasts DESeq2 • 909 views
ADD COMMENT
1
Entering edit mode
ATpoint ★ 5.0k
@atpoint-13662
Last seen 4 hours ago
Germany

Please see the vignette, spitting samples or keeping together is covered there. This was asked many times before, use google or the search function to find them.

ADD COMMENT
0
Entering edit mode

Thank you! Apologies for the redundancy!

ADD REPLY

Login before adding your answer.

Traffic: 913 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6