Inquiries on recommended design formulae for various experiments
Guest User ★ 13k
@guest-user-4897
Last seen 9.6 years ago
Hi there,

I am currently making multiple comparisons using contrast in DESeq2. I am interested in differential expression of genes underlying the germination mechanism in response to high temperature. Here is my experimental design information:

Genotypes: 4 different genotypes
Timepoint: 3 different timepoints
Temperature: Low and high temperatures
3 biological replicates for each condition.

I have a few questions regarding the contrast function in the DESeq2 package. My questions are mainly based on the table (Recommended design formulae for various experiments) in your package (Dec 23rd, 2013, page 11). I understand the terms 'condition', 'factor level', and 'group' are being used loosely for flexibility. I just want to make sure I am interpreting the terms correctly based on my experimental design. Here are my questions:

1. >=3 level factor 'condition': compare levels against another
   ~condition, or ~group + condition.
   Am I correct to assume that I will be comparing different timepoints for ONE genotype? For example, timepoints: 6 hours, 12 hours, and 24 hours after imbibition for Genotype A? Alternatively, I can also compare ONE timepoint for four different genotypes. Am I right?

2. >=3 level factor 'condition': compare significance of all levels
   ~condition, or ~group + condition.
   My interpretation is the same as above (#1). But, instead of comparing gene counts, I will be comparing p-adjusted values?

3. 2 level factor 'condition' but 'group' has >= 3 levels.
   Is it correct to assume that 'group' = genotypes (Genotype A, B, C, and D)? The two-level factor 'condition' is Low and High temperatures. So, for this comparison, I will be comparing all four different genotypes for two different levels of temperature (Low versus High). Am I correct?

4. Interactions between 'group' and 'treatment'
   ~group + treatment + group:treatment.
   For this, just as an example, I will be comparing Genotype A at timepoint #1 with genotype B at timepoint #2?

5. Time series: changes due to treatment after time 0.
   For time series, I will be comparing changes in Genotype A at timepoints #1, #2, and #3 due to High temperature? Am I correct?

I apologize for my long questions. Thank you so much for your time and input!

Regards,
Yoong

-- output of sessionInfo():

N/A.

--
Sent via the guest posting facility at bioconductor.org.
DESeq2 DESeq2 • 1.1k views
@mikelove
Last seen 55 minutes ago
United States
hi Yoong,

On Thu, Jan 16, 2014 at 4:23 PM, Yoong [guest] <guest@bioconductor.org> wrote:

> Hi there,
>
> I am currently making multiple comparisons using contrast in DESeq2. I am
> interested in differential expression for genes underlying germination
> mechanism due to high temperature. Here's my experimental design
> information:
>
> Genotypes: 4 different genotypes
> Timepoint: 3 different timepoints
> Temperature: Low and high temperatures
> 3 biological replicates for each condition.
>
> I have a few questions regarding contrast function in DESeq2 package. My
> questions are mainly based on the table (Recommended design formulae for
> various experiments) in your package (Dec 23rd, 2013, page 11).

This section of the vignette was introduced in the devel branch, DESeq2 v1.3, and these recommendations are paired with that development version, as there are changes to the treatment of factors with 3 or more levels in v1.3. I print the version number on the first page of the vignette so users won't accidentally mismatch code with a different version of the software. Please check your DESeq2 version by typing into R:

    library(DESeq2)
    sessionInfo()

Please always paste the output of sessionInfo() into emails to the Bioconductor list so we can provide you with the appropriate answers.

> I understand the terms 'condition', 'factor level', and 'group' are being
> used vaguely for flexibility purpose. I just want to make sure I am
> interpreting the terms correctly based on my experimental design. Here are
> my questions:

In this table I am using the terms 'condition', 'group' and 'treatment' just as hypothetical variables in colData(dds). They have no special meaning.

As I assume you are using the release version of DESeq2, version 1.2.x, I will provide my recommendations below based on this. Firstly, we recommend you use the argument betaPrior=FALSE for version 1.2.x when you have factors with 3 or more levels.
So:

    dds <- DESeq(dds, betaPrior=FALSE)

I will walk through this table, although not all the rows make sense for your dataset I think.

> 1. >=3 level factor 'condition': compare levels against another
> ~condition, or ~group + condition.
> Am I correct to assume that I will be comparing different timepoints for
> ONE genotype. For example, timepoints: 6hours, 12hours, and 24hours after
> imbibition for Genotype A? Alternatively, I can also compare ONE timepoint
> for four different genotypes. Am I right?

This row describes the following test. If you use the design

    ~ genotype + time + temp

then results(dds) called with no extra arguments will provide you with the test that the temperature has no effect on counts, controlling for the differences across genotypes and times.

So to answer your question: no, it does not perform tests for only *one* genotype, or for only *one* timepoint, but it performs tests for a specific variable, controlling for the differences in counts which can be accounted for by *all* the levels of all the other variables.

You can also use the contrast argument to test whether the log fold change of time point 'A' over 'B' is equal to zero, controlling for all differences which can be accounted for over all temperatures and over all genotypes. Or you can use the contrast argument to test whether the log fold change of genotype 'A' over 'B' is equal to zero, controlling for all levels of time and temperature.

I suppose these are not the tests you are interested in though, as this test doesn't let you examine differences in the effect of temperature for different genotypes or different time points.

> 2. >=3 level factor 'condition': compare significance of all levels
> ~condition, or ~group + condition.
> My interpretation is the same as above (#1). But, instead of comparing
> gene counts, I will be comparing p-adjusted values?
This row describes likelihood ratio tests, which are a different kind of test than the default Wald tests performed by results(). There is no difference however in the adjustment of p-values.

Likelihood ratio tests compare a "full" design formula against a "reduced" design formula. The full formula is the one specified by design(dds). The reduced formula is provided by the user when running DESeq(). The likelihood ratio test tests whether the effects of the variable(s) which were removed from the full design in creating the reduced design are equal to zero.

Again, I suppose this is not the test you are interested in, because this kind of test does not let you examine differences at different time points or for different genotypes.

> 3. 2 level factor 'condition' but 'group' has >= 3 levels.
> Is it correct to assume that 'group'= genotypes (Genotype A, B, C, and D).
> The level factor 'condition' is Low and High temperatures. So, for this
> comparison, I will be comparing all four different genotypes for two
> different levels of temperatures (Low versus High). Am I correct?

You can ignore this row, it is describing the same test as row #1. I will delete this in fact as it seems confusing.

> 4. Interactions between 'group' and 'treatment' ~group + treatment +
> group:treatment.
> For this, just as an example, I will be comparing Genotype A at timepoint
> #1 with genotype B at timepoint #2?

This row describes tests of interactions, this is most likely the kind of test you are interested in running. I would recommend you use the design:

    ~ genotype + time + temp + genotype:temp + time:temp

And then call

    resultsNames(dds)

in order to see all the interactions which are available for generating tests. For example:

    results(dds, name="genotypeA:tempHi")

...will provide you with the results of a test of whether the high temperature vs the low temperature has a specific effect for genotype A, over all time points.
And the call

    results(dds, name="time2:tempHi")

...will provide you with the results of a test of whether the high temperature vs the low temperature has a specific effect for time2 over time0, over all genotypes.

Meanwhile, the following call:

    results(dds, name="tempHi")

...will provide you with the results of a test of whether the high temperature vs the low temperature has an effect overall (over all time points and all genotypes).

I think it might be overkill to use third order interactions (whether there is an effect of high temp over low temp, specific for time1 and genotype B, for example), but this is possible as well with the design formula

    ~ genotype + time + temp + genotype:temp + time:temp + genotype:time:temp

and then generating these results using the 'name' argument of results().

> 5. Time series: changes due to treatment after time 0.
> For time series, I will be comparing changes in Genotype A at timepoints
> #1,#2, and #3 due to High temperature? Am I correct?

This is the same as row 4, it is only phrased differently as a pointer for people looking for key words.

Mike
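A note on the interaction design recommended above: one way to see which coefficients it contains, before fitting anything, is to build the model matrix with base R. This is only a sketch under assumptions. The sample table `coldata` and the level names ("A" to "D", "0"/"1"/"2", "Lo"/"Hi") are hypothetical stand-ins for the real colData(dds), and the column names come from base R's model.matrix(), so they may not match what resultsNames(dds) prints in a given DESeq2 version.

```r
# Hypothetical sample table matching the design described in the question:
# 4 genotypes x 3 time points x 2 temperatures x 3 replicates.
coldata <- expand.grid(
  replicate = 1:3,
  temp      = c("Lo", "Hi"),
  time      = c("0", "1", "2"),
  genotype  = c("A", "B", "C", "D"),
  stringsAsFactors = FALSE
)

# Set factor levels explicitly so the first level is the reference level.
coldata$genotype <- factor(coldata$genotype, levels = c("A", "B", "C", "D"))
coldata$time     <- factor(coldata$time,     levels = c("0", "1", "2"))
coldata$temp     <- factor(coldata$temp,     levels = c("Lo", "Hi"))

# The interaction design from the answer, expanded by base R
# (standard treatment coding, not DESeq2's own naming).
design_formula <- ~ genotype + time + temp + genotype:temp + time:temp
mm <- model.matrix(design_formula, data = coldata)

# One row per sample, one column per coefficient.
print(dim(mm))
print(colnames(mm))
```

With this standard coding, the reference genotype and the reference time point get no interaction column of their own: their temperature effect is the tempHi main effect, and each interaction column is the additional temperature effect for one non-reference level. Which level is the reference therefore changes what each named test means, so it is worth setting the factor levels deliberately.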