Design model for DESeq2 analysis
2
0
Entering edit mode
hs.lansdell ▴ 20
@hslansdell-14246
Last seen 6.5 years ago

Hello! 

I'm trying to figure out which design model is correct for my human RNA seq data. I have outcome data, here 'condition' that when examined alone, returns no differentially expressed genes (as design=~condition). So digging around led me to believe I need to control for the sex of the samples. I tried both of the below, and am unsure which is accurate... 

dds<-DESeqDataSetFromMatrix(countData = data, colData = colData, design= ~condition+sex)

Or

dds<-DESeqDataSetFromMatrix(countData = data, colData = colData, design= ~sex+condition)

The first gives me: 

> summary(res)

out of 20338 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)     : 143, 0.7% 
LFC < 0 (down)   : 94, 0.46% 
outliers [1]     : 0, 0% 
low counts [2]   : 0, 0% 
(mean count < 0)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

> sum(res$padj < 0.1, na.rm=TRUE)
[1] 237
> sum(res$padj<0.05,na.rm = TRUE)
[1] 169

The second gives me:

> summary(res)

out of 20338 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)     : 0, 0% 
LFC < 0 (down)   : 0, 0% 
outliers [1]     : 0, 0% 
low counts [2]   : 0, 0% 
(mean count < 0)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

> sum(res$padj < 0.1, na.rm=TRUE)
[1] 0

I've also run DESeq2 with just the sex and gotten: 

> summary(res)

out of 20338 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)     : 173, 0.85% 
LFC < 0 (down)   : 135, 0.66% 
outliers [1]     : 0, 0% 
low counts [2]   : 7492, 37% 
(mean count < 19)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

> sum(res$padj < 0.1, na.rm=TRUE)
[1] 308

So, I don't think the first is just the effect of the sex on the samples. Again, not sure which of the two models I should be using to control for sex, or if that's the right call for looking at outcome data at all.... 

deseq2 differential gene expression • 25k views
ADD COMMENT
1
Entering edit mode

Without showing us exactly how you got from dds to res, it's not possible to give much advice.  But I suspect that you're not specifying a contrast, and therefore you're letting DESeq chose the default contrast, which is the final term in your design.  So the ~ sex + condition model is the one that defaults to the thing you probably want.  But again, we have no details on what levels there are of 'condition', so it might not be corresponding to what you/we think it does.  Best to specify exactly which contrast you want in the call to DESeq (so that it's much less trouble to interpret when you/others come to review your analysis) rather than rely on defaults.

But as I imply, we really need more info on the setup.  When you say 'Outcome'  what types of outcome are you looking at.  If this is e.g. survival data, then we're in very different territory.

ADD REPLY
0
Entering edit mode

Sorry! It's a dichotomous outcome. So there's only yes or no. It's a calculated binary outcome based on reported pain 6 weeks out from an event. i.e, over a 7, gets a yes, under gets a no. 

Here's all my code: 

library(DESeq2)

data<-read.csv("TotalRNA.csv", header=TRUE, row.names = 1, stringsAsFactors = FALSE)

colnames(data) <- substring(colnames(data), 2)


colData<-read.csv("col_data.csv",header = TRUE, row.names = 1)

#Double check names match up between Sample and data matrix

all(rownames(colData)==colnames(data))

dds<-DESeqDataSetFromMatrix(countData = data, colData = colData, design= ~condition+sex)

dds$condition <- factor(dds$condition, levels = c("no","yes"))

dds<-DESeq(dds)

res<-results(dds)

plotMA(res,ylim=c(-2,2))

summary(res)

sum(res$padj < 0.1, na.rm=TRUE)

I guess what I don't understand about the contrast is that is seems to be only specified in 

results(dds, contrast=c("condition","C","B")), after DESeq2 has run.

My goal is really just to see if they are any differentially expressed genes between the two dichotomous outcome groups. I was getting none, which led me to try controlling for the sex of the sample as well. 

ADD REPLY
2
Entering edit mode
Gavin Kelly ▴ 680
@gavin-kelly-6944
Last seen 4.0 years ago
United Kingdom / London / Francis Crick…

Thanks for the extra info.

So, there's (at least) two stages to "DESeq2 has run" - the DESeq command gives you the potential to ask many questions via the results command (by selecting the relevant 'contrast' or 'name' argument), of which your code is defaulting to the comparison between sexes.  You can be more specific (and in this case, correct)  by saying 

results(dds, contrast=c("condition", "yes","no"))

It's a free choice as to whether you include sex as a factor: if you do, its effect will be 'normalised out' when estimating the condition effect; if you don't, it's effect (if there be any) will be included in an increased between-sample variability.  You may be guided by intuition, prior knowledge, or evidence from PCA and clustering, in this choice, but the contrast you use will be the same, regardless of your choice (unless you want to estimate separate per-sex condition effects).

ADD COMMENT
0
Entering edit mode

Thank you! I'm trying this now!

 

ADD REPLY
1
Entering edit mode
@mikelove
Last seen 1 hour ago
United States

hi,

As Gavin says above, the crux of the confusion is that results(dds), if you don't provide with any 'name' or 'contrast' uses the last variable in the design. However, the designs above are equivalent, you just need to say which results table you want, using 'name'. See this paragraph here:

https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#differential-expression-analysis

ADD COMMENT
0
Entering edit mode

Thanks for the input!

ADD REPLY

Login before adding your answer.

Traffic: 929 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6