Hi Yoong,
On Thu, Jan 16, 2014 at 4:23 PM, Yoong [guest] <guest@bioconductor.org> wrote:
>
> Hi there,
>
> I am currently making multiple comparisons using contrast in DESeq2. I am
> interested in differential expression for genes underlying germination
> mechanism due to high temperature. Here's my experimental design
> information:
>
> Genotypes: 4 different genotypes
> Timepoint: 3 different timepoints
> Temperature: Low and high temperatures
> 3 biological replicates for each condition.
>
> I have a few questions regarding contrast function in DESeq2 package. My
> questions are mainly based on the table (Recommended design formulae for
> various experiments) in your package (Dec 23rd, 2013, page 11).
This section of the vignette was introduced in the devel branch, DESeq2
v1.3, and these recommendations are paired with this development version, as
there are changes to the treatment of factors with 3 or more levels in
v1.3. I print the version number on the first page of the vignette so users
won't accidentally mismatch code with a different version of the software.
Please check your DESeq2 version by typing into R:

library(DESeq2)
sessionInfo()

Please always paste the output of sessionInfo() into emails to the
Bioconductor list so we can provide you with the appropriate answers.

> I understand the terms 'condition', 'factor level', and 'group' are being
> used vaguely for flexibility purpose. I just want to make sure I am
> interpreting the terms correctly based on my experimental design. Here are
> my questions:
>
In this table I am using the terms 'condition', 'group' and 'treatment'
just as hypothetical variables in colData(dds). They have no special
meaning.

As I assume you are using the release version of DESeq2, version 1.2.x, I
will provide my recommendations below based on this. Firstly, we recommend
you use the argument betaPrior=FALSE for version 1.2.x when you have
factors with 3 or more levels. So:

dds <- DESeq(dds, betaPrior=FALSE)

I will walk through this table, although I think not all the rows make
sense for your dataset.

>
> 1. >=3 level factor "condition": compare levels against another
> ~condition, or ~group + condition.
> Am I correct to assume that I will be comparing different timepoints for
> ONE genotype. For example, timepoints: 6hours, 12hours, and 24hours after
> imbibition for Genotype A? Alternatively, I can also compare ONE timepoint
> for four different genotypes. Am I right?
>
>
This row describes the following test: if you use the design
~ genotype + time + temp
then results(dds) called with no extra arguments will provide you with a
test of the null hypothesis that temperature has no effect on counts,
controlling for the differences across genotypes and times. So to answer
your question: no, it does not perform tests for only *one* genotype, or for
only *one* timepoint. It performs tests for a specific variable, controlling
for the differences in counts which can be accounted for by *all* the levels
of all the other variables.

You can also use the contrast argument to test whether the log fold change
of time point 'A' over 'B' is equal to zero, controlling for all differences
which can be accounted for over all temperatures and over all genotypes. Or
you can use the contrast argument to test whether the log fold change of
genotype 'A' over 'B' is equal to zero, controlling for all levels of time
and temperature. I suppose these are not the tests you are interested in
though, as they don't let you examine differences in the effect of
temperature for different genotypes or different time points.

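A minimal sketch of what the additive design estimates, using only base R's model.matrix() on a hypothetical sample table. The level names (Lo/Hi, 6h/12h/24h, A to D) are assumptions for illustration, and the coefficient names DESeq2 reports can differ by version:

```r
# Hypothetical sample table mirroring the design in this thread:
# 4 genotypes x 3 time points x 2 temperatures x 3 replicates = 72 samples.
coldata <- expand.grid(
  replicate = 1:3,
  temp      = factor(c("Lo", "Hi"), levels = c("Lo", "Hi")),
  time      = factor(c("6h", "12h", "24h"), levels = c("6h", "12h", "24h")),
  genotype  = factor(c("A", "B", "C", "D"))
)

# Additive design: one coefficient per non-reference level, no interactions.
mm_additive <- model.matrix(~ genotype + time + temp, data = coldata)
print(colnames(mm_additive))
```

There is a single temperature column, so this design has one temperature effect shared by every genotype and time point; it cannot express a temperature effect that differs between genotypes.
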
> 2. >=3 level factor "condition": compare significance of all levels
> ~condition, or ~group + condition.
> My interpretation is the same as above (#1). But, instead of comparing
> gene counts, I will be comparing p-adjusted values?
>
>
This row describes likelihood ratio tests, which are a different kind of
test than the default Wald tests reported by results(). There is no
difference, however, in the adjustment of p-values. Likelihood ratio tests
compare a "full" design formula against a "reduced" design formula. The
full formula is the one specified by design(dds). The reduced formula is
provided by the user when running DESeq(). The likelihood ratio test tests
whether the effects of the variable(s) which were removed from the full
design in creating the reduced design are equal to zero. Again, I suppose
this is not the test you are interested in, because this kind of test does
not let you examine differences at different time points or for different
genotypes.

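To make the full-versus-reduced idea concrete, here is a small base R sketch (not DESeq2 code) that lists which coefficients disappear when a variable is dropped from the design. The sample table and level names are hypothetical:

```r
# Hypothetical sample table: 4 genotypes x 3 times x 2 temperatures x 3 replicates.
coldata <- expand.grid(
  replicate = 1:3,
  temp      = factor(c("Lo", "Hi"), levels = c("Lo", "Hi")),
  time      = factor(c("6h", "12h", "24h"), levels = c("6h", "12h", "24h")),
  genotype  = factor(c("A", "B", "C", "D"))
)

full            <- model.matrix(~ genotype + time + temp, data = coldata)
reduced_no_temp <- model.matrix(~ genotype + time,        data = coldata)
reduced_no_time <- model.matrix(~ genotype + temp,        data = coldata)

# Coefficients present in the full design but absent from the reduced one:
# these are the effects a likelihood ratio test asks to be jointly zero.
tested_temp <- setdiff(colnames(full), colnames(reduced_no_temp))
tested_time <- setdiff(colnames(full), colnames(reduced_no_time))
print(tested_temp)
print(tested_time)
```

Dropping time removes two coefficients at once, which is why the likelihood ratio test is described as testing the significance of all levels of a factor together rather than one pairwise comparison.
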
> 3. 2 level factor "condition" but "group" has >= 3 levels.
> Is it correct to assume that 'group'= genotypes (Genotype A, B, C, and D).
> The level factor 'condition' is Low and High temperatures. So, for this
> comparison, I will be comparing all four different genotypes for two
> different levels of temperatures (Low versus High). Am I correct?
>
>
You can ignore this row; it describes the same test as row #1. In fact, I
will delete it, as it seems confusing.

> 4. Interactions between "group" and "treatment" ~group + treatment +
> group:treatment.
> For this, just as an example, I will be comparing Genotype A at timepoint
> #1 with genotype B at timepoint #2?
>
>
This row describes tests of interactions. This is most likely the kind of
test you are interested in running. I would recommend you use the design:

~ genotype + time + temp + genotype:temp + time:temp

And then call

resultsNames(dds)

in order to see all the interactions which are available for generating
tests. For example:

results(dds, name="genotypeA:tempHi")

...will provide you with the results of a test of whether the high
temperature vs the low temperature has a specific effect for genotype A,
over all time points.

And the call

results(dds, name="time2:tempHi")

...will provide you with the results of a test of whether the high
temperature vs the low temperature has a specific effect for time2 over
time0, over all genotypes.

Meanwhile, the following call:

results(dds, name="tempHi")

...will provide you with the results of a test of whether the high
temperature vs the low temperature has an effect overall (over all time
points and all genotypes).

I think it might be overkill to use third order interactions (whether there
is an effect of high temp over low temp specific for time1 and genotype B,
for example), but this is possible as well with the design formula

~ genotype + time + temp + genotype:temp + time:temp + genotype:time:temp

and then generating these results using the 'name' argument of results().

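As a rough preview of the interaction terms this design produces, the sketch below builds the model matrix with base R's model.matrix() on a hypothetical sample table. The level names are assumptions, and this shows base R's default treatment coding; the names returned by resultsNames(dds) depend on your factor levels and DESeq2 version, so always check them on your own object:

```r
# Hypothetical sample table: 4 genotypes x 3 times x 2 temperatures x 3 replicates.
coldata <- expand.grid(
  replicate = 1:3,
  temp      = factor(c("Lo", "Hi"), levels = c("Lo", "Hi")),
  time      = factor(c("6h", "12h", "24h"), levels = c("6h", "12h", "24h")),
  genotype  = factor(c("A", "B", "C", "D"))
)

mm_interaction <- model.matrix(
  ~ genotype + time + temp + genotype:temp + time:temp,
  data = coldata
)

# Keep only the interaction columns (those with ":" in the name).
interaction_terms <- grep(":", colnames(mm_interaction), value = TRUE)
print(interaction_terms)
```

In this base R coding the reference levels (here genotype A and the 6h time point) get no interaction column of their own, and the tempHi column is the temperature effect at those reference levels. Which names you can pass to the 'name' argument therefore depends on which level is the reference for each factor.
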
> 5. Time series: changes due to treatment after time 0.
> For time series, I will be comparing changes in Genotype A at timepoints
> #1,#2, and #3 due to High temperature? Am I correct?
>
>
This is the same as row 4; it is only phrased differently, as a pointer
for people looking for key words.

Mike

> I apologize for my long questions. Thank you so much for your time and
> input!
>
> Regards,
> Yoong
>
> -- output of sessionInfo():
>
> N/A.
>
> --
> Sent via the guest posting facility at bioconductor.org.
>