Question: DESeq2 test over all timepoints?
0
gravatar for Hickman, R.J. Richard
5.3 years ago by
Hickman, R.J. Richard50 wrote:
Dear Mike (and other DESeq2 developers/users), I came across this thread regarding the use of the GLM functions in DESeq2 with respect to time series. However, if I try and perform the analysis you describe then I get an error when estimating the dispersions? do you know what is the cause and/or what I could be doing wrong? dds <- DESeqDataSetFromMatrix( countData = countMatrix, colData = colData, design = ~ time + treatment + treatment:time) dds <- estimateSizeFactors(dds) dds <- estimateDispersions(dds) gene-wise dispersion estimates Error in fitDisp(ySEXP = counts(objectNZ), xSEXP = fit$modelMatrix, mu_hatSEXP = fit$mu_hat, : in call to fitDisp, the following arguments contain NA: mu_hatSEXP The matrix of counts and the colData are OK, i think.. This is the colData df used: treatment time s1 treated T1 s2 treated T1 s3 treated T1 s4 untreated T1 s5 untreated T1 s6 untreated T1 s7 treated T2 s8 treated T2 s9 treated T2 s10 untreated T2 s11 untreated T2 s12 untreated T2 Bests, Richard # Session info: > sessionInfo() R version 3.0.0 (2013-04-03) Platform: x86_64-apple-darwin10.8.0 (64-bit) locale: [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8 attached base packages: [1] grid parallel stats graphics grDevices utils datasets methods base other attached packages: [1] ggplot2_0.9.3.1 gplots_2.11.0 MASS_7.3-26 KernSmooth_2.23-10 [5] caTools_1.14 gdata_2.12.0 gtools_2.7.1 reshape_0.8.4 [9] plyr_1.8 pasilla_0.2.16 DESeq_1.12.0 locfit_1.5-9 [13] DEXSeq_1.6.0 parathyroidSE_0.99.5 DESeq2_1.0.9 RcppArmadillo_0.3.810.2 [17] Rcpp_0.10.3 lattice_0.20-15 Biobase_2.20.0 GenomicRanges_1.12.2 [21] IRanges_1.18.0 BiocGenerics_0.6.0 loaded via a namespace (and not attached): [1] annotate_1.38.0 AnnotationDbi_1.22.3 biomaRt_2.16.0 Biostrings_2.28.0 bitops_1.0-5 [6] colorspace_1.2-2 DBI_0.2-6 dichromat_2.0-0 digest_0.6.3 genefilter_1.42.0 [11] geneplotter_1.38.0 gtable_0.1.2 hwriter_1.3 labeling_0.1 munsell_0.4 [16] proto_0.3-10 RColorBrewer_1.0-5 RCurl_1.95-4.1 reshape2_1.2.2 Rsamtools_1.12.2 [21] RSQLite_0.11.3 scales_0.2.3 splines_3.0.0 statmod_1.4.17 stats4_3.0.0 [26] stringr_0.6.2 survival_2.37-4 tools_3.0.0 XML_3.95-0.2 xtable_1.7-1 [31] zlibbioc_1.6.0 >hi Charles, >>On Tue, Jul 9, 2013 at 3:59 PM, Charles Determan Jr wrote: >>Greetings, >>I have used the DESeq package previously and have been recently using >>DESeq2. I am particularly interested in repeated measures designs and was >>wondering about applications with DESeq2. I have read through the manual >>and tried searching the archives but couldn't find too much direction for >>testing over all timepoints instead of just one at a time (ANOVA- like >>approach). Reading the edgeR manual, it provides an example in section >>3.3.4 that tests whether a treatment has an effect at any time by taking >>multiple coefficients (i.e. lrt <- glmLRT(fit, coef=5:6)). I attempted >>something similar with DESeq2: >>res <- results(dds, name=resultsNames(dds)[5:6] >>but I got the warning message saying only the first element used: >>Warning message:In if (paste0("WaldPvalue_", name) %in% >>names(mcols(object))) { : >>the condition has length > 1 and only the first element will be used >I should clean up the code to provide a warning here, as the results() >function should only accept a character vector of length 1 for the >argument 'name'. >The proper way to test for the significance of multiple coefficients >at once is to use the nbinomLRT() function in DESeq2 and specify a >reduced formula. To test whether the treatment effect at all times is >different than at the baseline time, the reduced formula would remove >the interaction term between treatment and time, so: >design(dds) <- formula(~ time + treatment + treatment:time) >dds <- estimateSizeFactors(dds) >dds <- estimateDispersions(dds) >dds <- nbinomLRT(dds, reduced = formula(~ time + treatment)) >res <- results(dds) >If you presume that the treatment effect is the same at all times, you >can test whether the treatment effect is equal to 0 with: ># using the Wald test and coefficient shrinkage >design(dds) <- formula(~ time + treatment) >dds <- DESeq(dds) >res <- results(dds) ># or using the likelihood ratio test as in the previous example >design(dds) <- formula(~ time + treatment) >dds <- estimateSizeFactors(dds) >dds <- estimateDispersions(dds) >dds <- nbinomLRT(dds, reduced = formula(~ time)) >res <- results(dds) >The main difference here between the Wald and LRT tests is the >shrinkage of estimated log2 fold changes to 0 using the default >DESeq() function/Wald test. >I will add more examples to the vignette to better explain these cases >of testing multiple coefficients. >Mike
edger deseq deseq2 • 1.8k views
ADD COMMENTlink modified 5.3 years ago by Michael Love24k • written 5.3 years ago by Hickman, R.J. Richard50
Answer: DESeq2 test over all timepoints?
0
gravatar for Michael Love
5.3 years ago by
Michael Love24k
United States
Michael Love24k wrote:
Hi Richard, Can you upgrade to the release version of DESeq2 (v1.2)? If this is not possible try the latest version from v1.0. The version you are using is from the first months DESeq2 was released. Mike On Feb 12, 2014 7:27 AM, "Hickman, R.J. (Richard)" <r.j.hickman@uu.nl> wrote: > Dear Mike (and other DESeq2 developers/users), > > I came across this thread regarding the use of the GLM functions in DESeq2 > with respect to time series. However, if I try and perform the analysis you > describe then I get an error when estimating the dispersions— do you know > what is the cause and/or what I could be doing wrong? > > dds <- DESeqDataSetFromMatrix( countData = countMatrix, colData = colData, > design = ~ time + treatment + treatment:time) > dds <- estimateSizeFactors(dds) > dds <- estimateDispersions(dds) > gene-wise dispersion estimates > Error in fitDisp(ySEXP = counts(objectNZ), xSEXP = fit$modelMatrix, > mu_hatSEXP = fit$mu_hat, : > in call to fitDisp, the following arguments contain NA: mu_hatSEXP > > The matrix of counts and the colData are OK, i think.. > > This is the colData df used: > > treatment time > s1 treated T1 > s2 treated T1 > s3 treated T1 > s4 untreated T1 > s5 untreated T1 > s6 untreated T1 > s7 treated T2 > s8 treated T2 > s9 treated T2 > s10 untreated T2 > s11 untreated T2 > s12 untreated T2 > > Bests, > > Richard > > # Session info: > > > sessionInfo() > R version 3.0.0 (2013-04-03) > Platform: x86_64-apple-darwin10.8.0 (64-bit) > > locale: > [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8 > > attached base packages: > [1] grid parallel stats graphics grDevices utils datasets > methods base > > other attached packages: > [1] ggplot2_0.9.3.1 gplots_2.11.0 MASS_7.3-26 > KernSmooth_2.23-10 > [5] caTools_1.14 gdata_2.12.0 gtools_2.7.1 > reshape_0.8.4 > [9] plyr_1.8 pasilla_0.2.16 DESeq_1.12.0 > locfit_1.5-9 > [13] DEXSeq_1.6.0 parathyroidSE_0.99.5 DESeq2_1.0.9 > RcppArmadillo_0.3.810.2 > [17] Rcpp_0.10.3 lattice_0.20-15 Biobase_2.20.0 > GenomicRanges_1.12.2 > [21] IRanges_1.18.0 BiocGenerics_0.6.0 > > loaded via a namespace (and not attached): > [1] annotate_1.38.0 AnnotationDbi_1.22.3 biomaRt_2.16.0 > Biostrings_2.28.0 bitops_1.0-5 > [6] colorspace_1.2-2 DBI_0.2-6 dichromat_2.0-0 > digest_0.6.3 genefilter_1.42.0 > [11] geneplotter_1.38.0 gtable_0.1.2 hwriter_1.3 > labeling_0.1 munsell_0.4 > [16] proto_0.3-10 RColorBrewer_1.0-5 RCurl_1.95-4.1 > reshape2_1.2.2 Rsamtools_1.12.2 > [21] RSQLite_0.11.3 scales_0.2.3 splines_3.0.0 > statmod_1.4.17 stats4_3.0.0 > [26] stringr_0.6.2 survival_2.37-4 tools_3.0.0 > XML_3.95-0.2 xtable_1.7-1 > [31] zlibbioc_1.6.0 > > > >hi Charles, > > >>On Tue, Jul 9, 2013 at 3:59 PM, Charles Determan Jr wrote: > >>Greetings, > > >>I have used the DESeq package previously and have been recently using > >>DESeq2. I am particularly interested in repeated measures designs and was > >>wondering about applications with DESeq2. I have read through the manual > >>and tried searching the archives but couldn't find too much direction for > >>testing over all timepoints instead of just one at a time (ANOVA- like > >>approach). Reading the edgeR manual, it provides an example in section > >>3.3.4 that tests whether a treatment has an effect at any time by taking > >>multiple coefficients (i.e. lrt <- glmLRT(fit, coef=5:6)). I attempted > >>something similar with DESeq2: > > >>res <- results(dds, name=resultsNames(dds)[5:6] > > >>but I got the warning message saying only the first element used: > > >>Warning message:In if (paste0("WaldPvalue_", name) %in% > >>names(mcols(object))) { : > >>the condition has length > 1 and only the first element will be used > > >I should clean up the code to provide a warning here, as the results() > >function should only accept a character vector of length 1 for the > >argument 'name'. > > > >The proper way to test for the significance of multiple coefficients > >at once is to use the nbinomLRT() function in DESeq2 and specify a > >reduced formula. To test whether the treatment effect at all times is > >different than at the baseline time, the reduced formula would remove > >the interaction term between treatment and time, so: > > > >design(dds) <- formula(~ time + treatment + treatment:time) > >dds <- estimateSizeFactors(dds) > >dds <- estimateDispersions(dds) > >dds <- nbinomLRT(dds, reduced = formula(~ time + treatment)) > >res <- results(dds) > > > >If you presume that the treatment effect is the same at all times, you > >can test whether the treatment effect is equal to 0 with: > > > ># using the Wald test and coefficient shrinkage > >design(dds) <- formula(~ time + treatment) > >dds <- DESeq(dds) > >res <- results(dds) > > > ># or using the likelihood ratio test as in the previous example > >design(dds) <- formula(~ time + treatment) > >dds <- estimateSizeFactors(dds) > >dds <- estimateDispersions(dds) > >dds <- nbinomLRT(dds, reduced = formula(~ time)) > >res <- results(dds) > > > >The main difference here between the Wald and LRT tests is the > >shrinkage of estimated log2 fold changes to 0 using the default > >DESeq() function/Wald test. > > > >I will add more examples to the vignette to better explain these cases > >of testing multiple coefficients. > > > >Mike > > > [[alternative HTML version deleted]]
ADD COMMENTlink written 5.3 years ago by Michael Love24k
Hi Mike, That appears to have worked- thanks! Richard ________________________________ From: Michael Love [michaelisaiahlove@gmail.com] Sent: 12 February 2014 14:53 To: Hickman, R.J. (Richard) Cc: bioconductor@r-project.org Subject: Re: [BioC] DESeq2 test over all timepoints? Hi Richard, Can you upgrade to the release version of DESeq2 (v1.2)? If this is not possible try the latest version from v1.0. The version you are using is from the first months DESeq2 was released. Mike On Feb 12, 2014 7:27 AM, "Hickman, R.J. (Richard)" <r.j.hickman@uu.nl<mailto:r.j.hickman@uu.nl>> wrote: Dear Mike (and other DESeq2 developers/users), I came across this thread regarding the use of the GLM functions in DESeq2 with respect to time series. However, if I try and perform the analysis you describe then I get an error when estimating the dispersions� do you know what is the cause and/or what I could be doing wrong? dds <- DESeqDataSetFromMatrix( countData = countMatrix, colData = colData, design = ~ time + treatment + treatment:time) dds <- estimateSizeFactors(dds) dds <- estimateDispersions(dds) gene-wise dispersion estimates Error in fitDisp(ySEXP = counts(objectNZ), xSEXP = fit$modelMatrix, mu_hatSEXP = fit$mu_hat, : in call to fitDisp, the following arguments contain NA: mu_hatSEXP The matrix of counts and the colData are OK, i think.. This is the colData df used: treatment time s1 treated T1 s2 treated T1 s3 treated T1 s4 untreated T1 s5 untreated T1 s6 untreated T1 s7 treated T2 s8 treated T2 s9 treated T2 s10 untreated T2 s11 untreated T2 s12 untreated T2 Bests, Richard # Session info: > sessionInfo() R version 3.0.0 (2013-04-03) Platform: x86_64-apple-darwin10.8.0 (64-bit) locale: [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8 attached base packages: [1] grid parallel stats graphics grDevices utils datasets methods base other attached packages: [1] ggplot2_0.9.3.1 gplots_2.11.0 MASS_7.3-26 KernSmooth_2.23-10 [5] caTools_1.14 gdata_2.12.0 gtools_2.7.1 reshape_0.8.4 [9] plyr_1.8 pasilla_0.2.16 DESeq_1.12.0 locfit_1.5-9 [13] DEXSeq_1.6.0 parathyroidSE_0.99.5 DESeq2_1.0.9 RcppArmadillo_0.3.810.2 [17] Rcpp_0.10.3 lattice_0.20-15 Biobase_2.20.0 GenomicRanges_1.12.2 [21] IRanges_1.18.0 BiocGenerics_0.6.0 loaded via a namespace (and not attached): [1] annotate_1.38.0 AnnotationDbi_1.22.3 biomaRt_2.16.0 Biostrings_2.28.0 bitops_1.0-5 [6] colorspace_1.2-2 DBI_0.2-6 dichromat_2.0-0 digest_0.6.3 genefilter_1.42.0 [11] geneplotter_1.38.0 gtable_0.1.2 hwriter_1.3 labeling_0.1 munsell_0.4 [16] proto_0.3-10 RColorBrewer_1.0-5 RCurl_1.95-4.1 reshape2_1.2.2 Rsamtools_1.12.2 [21] RSQLite_0.11.3 scales_0.2.3 splines_3.0.0 statmod_1.4.17 stats4_3.0.0 [26] stringr_0.6.2 survival_2.37-4 tools_3.0.0 XML_3.95-0.2 xtable_1.7-1 [31] zlibbioc_1.6.0 >hi Charles, >>On Tue, Jul 9, 2013 at 3:59 PM, Charles Determan Jr wrote: >>Greetings, >>I have used the DESeq package previously and have been recently using >>DESeq2. I am particularly interested in repeated measures designs and was >>wondering about applications with DESeq2. I have read through the manual >>and tried searching the archives but couldn't find too much direction for >>testing over all timepoints instead of just one at a time (ANOVA- like >>approach). Reading the edgeR manual, it provides an example in section >>3.3.4 that tests whether a treatment has an effect at any time by taking >>multiple coefficients (i.e. lrt <- glmLRT(fit, coef=5:6)). I attempted >>something similar with DESeq2: >>res <- results(dds, name=resultsNames(dds)[5:6] >>but I got the warning message saying only the first element used: >>Warning message:In if (paste0("WaldPvalue_", name) %in% >>names(mcols(object))) { : >>the condition has length > 1 and only the first element will be used >I should clean up the code to provide a warning here, as the results() >function should only accept a character vector of length 1 for the >argument 'name'. >The proper way to test for the significance of multiple coefficients >at once is to use the nbinomLRT() function in DESeq2 and specify a >reduced formula. To test whether the treatment effect at all times is >different than at the baseline time, the reduced formula would remove >the interaction term between treatment and time, so: >design(dds) <- formula(~ time + treatment + treatment:time) >dds <- estimateSizeFactors(dds) >dds <- estimateDispersions(dds) >dds <- nbinomLRT(dds, reduced = formula(~ time + treatment)) >res <- results(dds) >If you presume that the treatment effect is the same at all times, you >can test whether the treatment effect is equal to 0 with: ># using the Wald test and coefficient shrinkage >design(dds) <- formula(~ time + treatment) >dds <- DESeq(dds) >res <- results(dds) ># or using the likelihood ratio test as in the previous example >design(dds) <- formula(~ time + treatment) >dds <- estimateSizeFactors(dds) >dds <- estimateDispersions(dds) >dds <- nbinomLRT(dds, reduced = formula(~ time)) >res <- results(dds) >The main difference here between the Wald and LRT tests is the >shrinkage of estimated log2 fold changes to 0 using the default >DESeq() function/Wald test. >I will add more examples to the vignette to better explain these cases >of testing multiple coefficients. >Mike [[alternative HTML version deleted]]
ADD REPLYlink written 5.3 years ago by Hickman, R.J. Richard50
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 297 users visited in the last hour