Search
Question: Design model for DESeq2 analysis
0
8 months ago by
hs.lansdell10
hs.lansdell10 wrote:

Hello!

I'm trying to figure out which design model is correct for my human RNA seq data. I have outcome data, here 'condition' that when examined alone, returns no differentially expressed genes (as design=~condition). So digging around led me to believe I need to control for the sex of the samples. I tried both of the below, and am unsure which is accurate...

dds<-DESeqDataSetFromMatrix(countData = data, colData = colData, design= ~condition+sex)

Or

dds<-DESeqDataSetFromMatrix(countData = data, colData = colData, design= ~sex+condition)

The first gives me:

> summary(res)

out of 20338 with nonzero total read count
LFC > 0 (up)     : 143, 0.7%
LFC < 0 (down)   : 94, 0.46%
outliers [1]     : 0, 0%
low counts [2]   : 0, 0%
(mean count < 0)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

> sum(res$padj < 0.1, na.rm=TRUE) [1] 237 > sum(res$padj<0.05,na.rm = TRUE)
[1] 169

The second gives me:

> summary(res)

out of 20338 with nonzero total read count
LFC > 0 (up)     : 0, 0%
LFC < 0 (down)   : 0, 0%
outliers [1]     : 0, 0%
low counts [2]   : 0, 0%
(mean count < 0)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

> sum(res$padj < 0.1, na.rm=TRUE) [1] 0 I've also run DESeq2 with just the sex and gotten: > summary(res) out of 20338 with nonzero total read count adjusted p-value < 0.1 LFC > 0 (up) : 173, 0.85% LFC < 0 (down) : 135, 0.66% outliers [1] : 0, 0% low counts [2] : 7492, 37% (mean count < 19) [1] see 'cooksCutoff' argument of ?results [2] see 'independentFiltering' argument of ?results > sum(res$padj < 0.1, na.rm=TRUE)
[1] 308


So, I don't think the first is just the effect of the sex on the samples. Again, not sure which of the two models I should be using to control for sex, or if that's the right call for looking at outcome data at all....

modified 8 months ago by Gavin Kelly550 • written 8 months ago by hs.lansdell10

Without showing us exactly how you got from dds to res, it's not possible to give much advice.  But I suspect that you're not specifying a contrast, and therefore you're letting DESeq chose the default contrast, which is the final term in your design.  So the ~ sex + condition model is the one that defaults to the thing you probably want.  But again, we have no details on what levels there are of 'condition', so it might not be corresponding to what you/we think it does.  Best to specify exactly which contrast you want in the call to DESeq (so that it's much less trouble to interpret when you/others come to review your analysis) rather than rely on defaults.

But as I imply, we really need more info on the setup.  When you say 'Outcome'  what types of outcome are you looking at.  If this is e.g. survival data, then we're in very different territory.

Sorry! It's a dichotomous outcome. So there's only yes or no. It's a calculated binary outcome based on reported pain 6 weeks out from an event. i.e, over a 7, gets a yes, under gets a no.

Here's all my code:

library(DESeq2)

colnames(data) <- substring(colnames(data), 2)

#Double check names match up between Sample and data matrix

all(rownames(colData)==colnames(data))

dds<-DESeqDataSetFromMatrix(countData = data, colData = colData, design= ~condition+sex)

dds$condition <- factor(dds$condition, levels = c("no","yes"))

dds<-DESeq(dds)

res<-results(dds)

plotMA(res,ylim=c(-2,2))

summary(res)

sum(res\$padj < 0.1, na.rm=TRUE)

I guess what I don't understand about the contrast is that is seems to be only specified in

results(dds, contrast=c("condition","C","B")), after DESeq2 has run.

My goal is really just to see if they are any differentially expressed genes between the two dichotomous outcome groups. I was getting none, which led me to try controlling for the sex of the sample as well.

0
8 months ago by
Michael Love18k
United States
Michael Love18k wrote:

hi,

As Gavin says above, the crux of the confusion is that results(dds), if you don't provide with any 'name' or 'contrast' uses the last variable in the design. However, the designs above are equivalent, you just need to say which results table you want, using 'name'. See this paragraph here:

https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#differential-expression-analysis

Thanks for the input!

0
8 months ago by
Gavin Kelly550
United Kingdom / London / Francis Crick Institute
Gavin Kelly550 wrote:

Thanks for the extra info.

So, there's (at least) two stages to "DESeq2 has run" - the DESeq command gives you the potential to ask many questions via the results command (by selecting the relevant 'contrast' or 'name' argument), of which your code is defaulting to the comparison between sexes.  You can be more specific (and in this case, correct)  by saying

results(dds, contrast=c("condition", "yes","no"))

It's a free choice as to whether you include sex as a factor: if you do, its effect will be 'normalised out' when estimating the condition effect; if you don't, it's effect (if there be any) will be included in an increased between-sample variability.  You may be guided by intuition, prior knowledge, or evidence from PCA and clustering, in this choice, but the contrast you use will be the same, regardless of your choice (unless you want to estimate separate per-sex condition effects).

Thank you! I'm trying this now!