Question: Is my design appropriate for contrast in DESeq2?
27 days ago, dbouzo wrote:

Hi all,

I am a microbiology grad student new to bioinformatics, conducting some RNA-Seq experiments to determine differentially expressed genes after antimicrobial treatment.

I have an untreated control and 4 different antimicrobial treatments, all conducted on the same cell type/microorganism, with 3 biological replicates for each. The only factor that changes between these groups is the treatment applied.

Initially I determined differentially expressed genes for each treatment compared to the untreated control separately, each with its own DESeqDataSet object. This made it difficult to compare DE between groups and to visualise the results as heat maps etc. After some reading I generated one DESeqDataSet object which included all treatments, and then used the contrast argument of results() to determine DEGs for each treatment compared to the untreated control.

First I set the reference level:

dds$condition <- relevel(dds$condition, ref="untreated")

To determine differential expression:

dds <- DESeq(dds)
res_treatment1 <- results(dds, alpha=0.05, lfcThreshold = 1, altHypothesis="greaterAbs", contrast = c("condition", "treatment1", "untreated"))

The numbers of differentially expressed genes, outliers and low-count genes were quite different between these two approaches, despite using the same BAM alignment files and the same FDR and LFC thresholds.

Despite reading the DESeq2 manual I was still unsure which approach was more appropriate - any advice is most welcome.  Thank you!

27 days ago, Gavin Kelly (United Kingdom / London / Francis Crick Institute) wrote:

The difference is that, in the single-dataset approach, you're estimating the variance (biological variability) of each gene by pooling the estimates from within every treatment group. This gives you greater power, so it is generally the recommended approach. Splitting the data into pairs of treatments leaves less than half the number of degrees of freedom, so it won't be as powerful, but it will protect you from the unlikely issue that variance differs strongly between conditions (a fact you would then want to capture in your analysis). A situation that merits this would be one where there's a treatment group that the scientist has realised is uninteresting, but which happens to have an outlier sample within it: even though you'd never be using that group of samples 'directly', it would still influence pairwise tests that didn't appear to involve it, by contributing to an increased overall variability.
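
To put rough numbers on the degrees-of-freedom point, here is a small base R sketch for this design (5 groups of 3 replicates). The level names treatment2 to treatment4 are assumed for illustration. Residual degrees of freedom are only a proxy here: DESeq2 estimates dispersions with shrinkage rather than using them directly in a t-test, but they indicate how much replicate information each fit has for estimating variability.

```r
# Residual degrees of freedom = number of samples - number of model coefficients.
# Level names other than "untreated" and "treatment1" are assumed for illustration.
groups <- c("untreated", paste0("treatment", 1:4))
condition <- factor(rep(groups, each = 3), levels = groups)

# All five groups in one model: 15 samples, 5 coefficients
X_all <- model.matrix(~ condition)
df_all <- nrow(X_all) - ncol(X_all)

# One treatment against the control only: 6 samples, 2 coefficients
keep <- condition %in% c("untreated", "treatment1")
pair_condition <- droplevels(condition[keep])
X_pair <- model.matrix(~ pair_condition)
df_pair <- nrow(X_pair) - ncol(X_pair)

c(all_groups = df_all, pairwise = df_pair)  # 10 and 4
```

So each pairwise fit has 4 residual degrees of freedom against 10 for the combined fit, which is where "less than half" comes from.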

The vast majority of experiments I analyse are best done with all treatment groups included together (and comparisons pulled out with contrasts).  Yours looks as if it would fit into that pattern.
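
As a minimal sketch of that pattern (not the poster's actual pipeline): fit one model on all samples, then loop over the treatments and pull out each comparison against the control. The counts below are simulated with makeExampleDESeqDataSet() purely so the example runs, and the level names treatment2 to treatment4 are assumed; the thresholds are the ones from the question.

```r
# Sketch only: simulated counts stand in for the real data, and the
# level names treatment2..treatment4 are assumed for illustration.
suppressPackageStartupMessages(library(DESeq2))

set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 15)

# Five groups of three replicates, with "untreated" as the reference level
groups <- c("untreated", paste0("treatment", 1:4))
dds$condition <- factor(rep(groups, each = 3), levels = groups)

# Fit once on all samples so dispersions are estimated using every group
dds <- DESeq(dds, quiet = TRUE)

# Pull out each treatment-vs-untreated comparison with the same thresholds
treatments <- setdiff(levels(dds$condition), "untreated")
res_list <- lapply(treatments, function(trt) {
  results(dds, alpha = 0.05, lfcThreshold = 1, altHypothesis = "greaterAbs",
          contrast = c("condition", trt, "untreated"))
})
names(res_list) <- treatments

# Number of genes passing the FDR cutoff in each comparison
n_sig <- sapply(res_list, function(res) sum(res$padj < 0.05, na.rm = TRUE))
print(n_sig)
```

Because every comparison comes from the same fitted object, the results share one set of normalised counts and dispersion estimates, which makes them easier to compare side by side or plot in a single heat map.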
