Question

stimulation effects between wild-type and knockout cells using deseq

Assa Yeroslaviz ★ 1.5k

Hi everybody,

We have a data set of knock-out vs. wild-type samples; some of them were stimulated and some were not. The experimental design looks like this:

name          condition  stimulation
Vav_KO_1      KO         no
Vav_KO_2      KO         no
Vav_KO_2_C    KO         yes
Vav_KO_4_C    KO         yes
Vav_KO_5      KO         no
Vav_KO_5_C    KO         yes
Vav_WT_1      wildtype   no
Vav_WT_1_C    wildtype   yes
Vav_WT_2      wildtype   no
Vav_WT_2_C    wildtype   yes
Vav_WT_4      wildtype   no
Vav_WT_4_C    wildtype   yes

We are interested in two sets of genes. The first is genes that are differentially regulated between the wild type and the KO in general.

The second is genes whose change due to the stimulation differs between the two conditions (WT vs. KO).

My workflow and the multi-factorial design I used are as follows:

dds <- DESeqDataSetFromMatrix(countData = countTable,
                              colData = Phenotype,
                              design = ~condition*stimulation)
dds <- DESeq(dds)
resultsNames(dds)

resultsWT.KO <- results(dds, contrast=c("condition", "trippleKO", "wildtype"))
resultsStimulation <- results(dds ,name = "conditionwildtype.stimulationnone")

For the first comparison I find over 7200 genes with an adjusted p-value below 0.1, but with these parameters I don't find any differentially regulated genes for the second question.

I would like to know whether my design matrix is constructed correctly for this question.

thanks

Assa

multiple factor design interactions design matrix deseq2 • 1.5k views

updated 6.5 years ago by Michael Love • written 8.4 years ago by Assa Yeroslaviz

score 0 · Answer 1 · 2015-11-18

A follow-up question: I have changed the order of my phenotype file to this:

name          condition  stimulation
Vav_KO_1      KO         no
Vav_KO_2      KO         no
Vav_KO_5      KO         no
Vav_KO_2_C    KO         yes
Vav_KO_4_C    KO         yes
Vav_KO_5_C    KO         yes
Vav_WT_1      wildtype   no
Vav_WT_2      wildtype   no
Vav_WT_4      wildtype   no
Vav_WT_4_C    wildtype   yes
Vav_WT_1_C    wildtype   yes
Vav_WT_2_C    wildtype   yes

and ran everything again, without changing the script. Now I get different possible comparisons:

> resultsNames(dds)
[1] "Intercept"                        "condition_wildtype_vs_KO"        
[3] "stimulation_yes_vs_no"            "conditionwildtype.stimulationyes"

and when I now check for DE genes with padj <= 0.1 I get

> table(resultsStimulation$padj<=0.1)

FALSE  TRUE
16103   106

Is there an explanation for such behaviour?

I would like to know why changing the order of the sample information changes the results.

thanks

Assa

score 0 · Answer 2 · 2015-11-18

If you build a DESeqDataSet from a count matrix and a colData table, you are implying that column 1 of the count matrix is described by row 1 of the colData, and so on. You need to be very careful that the order of the colData rows is correct and corresponds to the columns of the count matrix.

The first results table is not KO vs. WT "in general"; it is KO vs. WT for the reference level of stimulation, which looks like it is "yes" here. You can read the vignette of the current DESeq2 version (1.10) for more help on interaction terms, but beyond that I highly recommend discussing this with a local statistician who can explain how interaction models work and how to interpret the results tables.
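
As a rough illustration of the ordering point, here is a minimal base R sketch of aligning the sample sheet to the count matrix by name before building the data set. The toy counts, the four-sample subset, and the choice of "wildtype" and "no" as reference levels are assumptions made for the example, not the poster's real data or settings. The final step only shows which coefficients an interaction design produces, using base R's `model.matrix` naming, which differs from the names DESeq2 reports.

```r
# Toy stand-ins for the poster's objects (hypothetical values, not the real data).
sample_names <- c("Vav_KO_1", "Vav_KO_2_C", "Vav_WT_1", "Vav_WT_1_C")
countTable <- matrix(1:8, nrow = 2,
                     dimnames = list(c("geneA", "geneB"), sample_names))

# Sample sheet deliberately listed in a different order than the count columns.
Phenotype <- data.frame(
  condition   = c("wildtype", "KO", "wildtype", "KO"),
  stimulation = c("yes", "yes", "no", "no"),
  row.names   = c("Vav_WT_1_C", "Vav_KO_2_C", "Vav_WT_1", "Vav_KO_1")
)

# Every count column must have a row in the sample sheet.
stopifnot(all(colnames(countTable) %in% rownames(Phenotype)))

# Reorder the sample sheet by name so that row i describes count column i.
Phenotype <- Phenotype[colnames(countTable), , drop = FALSE]
stopifnot(identical(rownames(Phenotype), colnames(countTable)))

# Set the reference levels explicitly instead of relying on alphabetical order
# (assumed choice for this sketch: wildtype and unstimulated as the baseline).
Phenotype$condition   <- relevel(factor(Phenotype$condition),   ref = "wildtype")
Phenotype$stimulation <- relevel(factor(Phenotype$stimulation), ref = "no")

# Coefficients of the interaction design, in base R's naming.
design_cols <- colnames(model.matrix(~ condition * stimulation, data = Phenotype))
print(design_cols)
```

With the baseline chosen this way, the condition coefficient compares KO with wildtype in unstimulated samples only, and the interaction coefficient is the difference in the stimulation effect between KO and wildtype. The aligned `Phenotype` can then be passed as `colData` to `DESeqDataSetFromMatrix` as in the question.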