Question: Using DESeq2: experimental design and extracting results

1

bio2001 •

**60**wrote:Hi Michael,
I'd like your suggestions on an additional question related to this
previous thread on DESeq2.
You might be aware that when publishing differential expression
results,
it is a convention to provide the mean and standard deviation values
(after the normalization) for the genes being tested. The DE table
output output by DESeq has a column "baseMean". Is this equivalent to
the mean expression? Is there a way to get standard deviation? Can the
mean and SD be obtained separately for both the treatments being
compared?
Thanks
Sridhar Acharya
On 6/11/2014 10:32 AM, Michael Love wrote:
> hi Sridhar,
>
>
>
> On Wed, Jun 11, 2014 at 9:51 AM, Sridhar A Malkaram
> <smalkaram at="" wvstateu.edu=""> wrote:
>> Hi Michael,
>>
>> Thanks for explaining in detail.
>> Just for clarification, I think we should add the "~" before the
>> "genotype + time", right?
> Yes good catch. I should have written "~ genotype + time"
>
>> Does "~" stand for formula notation in R?
> Yes.
>
>> dds <- DESeq(dds, test="LRT", reduced=genotype + time)
>>
>>
>> Also, when I do,
>>
>> dds <- nbinomWaldTest(dds, betaPrior=FALSE)
>> res<-results(dds, name="time2.gen2")
>> head(res)
>>
>> The first two lines of the output:
>> log2 fold change: time2.gen2
>> Wald test p-value: time2.gen2
>> .....
>>
>> What two components is the log2 fold change between? I mean, how do
I read time2.gen2 ?
>> How can I get the two expression values that are being compared.
The older version of DESeq used to give these values in the results.
>>
> The time series model with interactions between time and a condition
> variable is a bit more complicated than the simple group
comparisons.
> There are not simply two levels being compared. The way that
> interaction effects are encoded in model matrices in R is to write
the
> two terms of which the interaction is composed next to each other.
We
> use this convention as well. I'll try to explain the meaning of the
> model you have specified. The model fits a number of main effects:
>
> Firstly, we have the change in expression levels over time for the
> base level of the condition variable (so gen1). These effects are
the
> fold change (or addition in the log2 space) of the time 2 over time1
> (the intercept, or base level), time 3 over time1, time 4 over time
1.
> If you specify results(dds, name="time_2_vs_1") you would get a
> results table with a simple description of the comparison. But this
is
> probably not of interest for you for testing, as it doesn't tell you
> anything about the effect of the experimental condition of interest,
> e.g. the genotype.
>
> Secondly, we have the fold change of genotype 2 over 1 for the first
> time period. This is also obtained with results(dds,
> name="gen_2_vs_1")
>
> Finally are the interaction terms. For example, "gen2.time2" is the
> fold change for genotype 2 at time 2 which is not explained by the
> genotype 2 main fold change and the time 2 main fold change
multiplied
> together (or added in the log2 scale). If the log fold change of
this
> intercept term equals 0, then we say there is no interaction. In
other
> words, the effect of genotype 2 is the same at time 2 as it was at
> time 1, after accounting for the general effect of time on
expression.
>
> Mike
>
>> -- With Kind Regards,
>> Sridhar Acharya
>>
>>
>>
>>
>>
>>
>>
>> On 6/10/2014 8:32 PM, Michael Love wrote:
>>> I left out one detail, below:
>>>
>>> On Tue, Jun 10, 2014 at 8:20 PM, Michael Love
>>> <michaelisaiahlove at="" gmail.com=""> wrote:
>>>> hi Sridhar,
>>>>
>>>> Let's keep the discussion on the mailing list in case the
question is
>>>> relevant to others.
>>>>
>>>> After you have run:
>>>>
>>>> design(dds) <- ~ genotype + time + genotype:time
>>>> dds <- DESeq(dds, test="LRT", reduced=genotype + time)
>>>> res <- results(dds)
>>>>
>>>> The res object will contain the likelihood ratio test results,
with
>>>> small p-values for genes which have a genotype effect which is
>>>> different than in the first time period. This tests all time
periods
>>>> after the first time period.
>>>>
>>>> You also see:
>>>>
>>>> resultsNames(ddsLRT)
>>>> Intercept time2_vs_time1 time3_vs_time1 time4_vs_time1
>>>> gen2_vs_gen1 time2.gen2 time3.gen2 time4.gen2
>>>>
>>>> The three of these which might be interesting for your experiment
are:
>>>>
>>> Before the results line below you should call:
>>>
>>> dds <- nbinomWaldTest(dds, betaPrior=FALSE)
>>>
>>>> results(dds, name="time2.gen2")
>>>> ...and same for time3.gen2, time4.gen2
>>>>
>>>> which will return a results table with Wald tests of the
additional
>>>> genotype effect in time 2 (additional beyond the genotype effect
in
>>>> the first time period). This is similar to the first LRT results
>>>> above, except now we are asking for a different effect of
genotype in
>>>> a specific time period, not in all time periods.
>>>>
>>>> The other coefficients are the main effect terms. Results tables
for
>>>> these can also be built by using the 'name' argument to
results().
>>>> They are the intercept term, the effects of the different times
over
>>>> the initial time, and the effect for genotype 2 over 1 in the
first
>>>> time period. You don't want to use the contrast argument, which
is for
>>>> other kinds of models.
>>>>
>>>> Mike
>>>>
>>>> On Tue, Jun 10, 2014 at 3:38 PM, Michael Love
>>>> <michaelisaiahlove at="" gmail.com=""> wrote:
>>>>> hi Sridhar,
>>>>>
>>>>> On Tue, Jun 10, 2014 at 3:05 PM, Sridhar A Malkaram
>>>>> <smalkaram at="" wvstateu.edu=""> wrote:
>>>>>> Hi,
>>>>>>
>>>>>>
>>>>>> I have been a user of DESeq and recently DESeq2 for my research
work.
>>>>>> The latest DESeq2 seem to offer extensive differential testing
options
>>>>>> suitable for various experimental designs.
>>>>>>
>>>>>> Recently I wanted to use DESeq for a differential gene
expression
>>>>>> analysis between two plant genotypes across 4 different time
points.
>>>>>>
>>>>>> I am basically a biologist and am finding hard to grasp the
concepts of
>>>>>> testing results. I'd be very grateful if you could help me
understand
>>>>>> some concepts (especially resultsNames) related to the DESeq2
package.
>>>>>>
>>>>>>
>>>>>> My experimental design is as below
>>>>>>
>>>>>> design<- ~ genotype + time + genotype:time
>>>>>>
>>>>>> There are two levels in genotype and 4 levels in time.
>>>>>> Basically I'd like to use binomLRT test to check if there is
any
>>>>>> difference in gene expression between the genotypes across the
time points.
>>>>>>
>>>>>> dds<-DESeq(dds) (dds is DESeq2 object obtained from,
>>>>>> dds<-DESeqDataSetFromMatrix(countData=counts, colData=coldata,
>>>>>> design=design)
>>>>>>
>>>>>> and I am using the reduced model for the liklihood test
>>>>>>
>>>>> Here is where things are getting confused. You have already run
>>>>> DESeq() using test="Wald". So it doesn't make sense at this
point to
>>>>> instead perform a likelihood ratio test. In our vignette we
explain
>>>>> this in the section on the LRT: "The likelihood ratio test can
also be
>>>>> speci?ed using the test argument to DESeq, which substitutes
>>>>> nbinomWaldTest with nbinomLRT."
>>>>>
>>>>>> Is the model correct per my research question (is there a (time
>>>>>> influenced) difference between genotypes)?
>>>>>>
>>>>> Yes. If you want to find those genes which show a time
influenced
>>>>> difference between genotypes, this is simply:
>>>>>
>>>>> dds <- DESeq(dds, test="LRT", reduced=genotype + time)
>>>>> res <- results(dds)
>>>>>
>>>>> You can then use heatmaps to inspect the patterns of gene
expression
>>>>> for the differentially expressed genes. Visualization with
heatmaps
>>>>> are covered in the vignette.
>>>>>
>>>>> If you have other more specific questions about how to generate
>>>>> results tables, I can answer them. With time series experiments,
there
>>>>> are many possible combinations to test, but rather than going
through
>>>>> all combinations, we recommend that users explore the results
with
>>>>> heatmaps.
>>>>>
>>>>> Mike
>>>>>
>>>>>> Thanks,
>>>>>> Sridhar Acharya
>>>>>>
>>>>>>
>>>>>> [[alternative HTML version deleted]]
>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioconductor mailing list
>>>>>> Bioconductor at r-project.org
>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD COMMENT
• link
•
modified 5.3 years ago
by
Michael Love ♦

**25k**• written 5.3 years ago by bio2001 •**60**