DESeq: multi-factors testing questions
1
0
Entering edit mode
Yanzhu Lin ▴ 120
@yanzhu-lin-6551
Last seen 8.2 years ago
Hi Mike, I would like to try DESeq2 package now. My project has three factors: A with 16 levels, B with 2 levels and C with 3 levels, in total I have 16 x 3 x 2 = 96 groups, for each group, we have 8 biological replicates in our original experiment design, which ends up with 96 x 8 =768 biosamples. I can't understand why we can't estimate any interaction terms if there are 8 replicates per group, which was mentioned in your previous email. Based on 8 biological replicates per group, I think we still have enough df to test each interaction term. Best, Yanzhu Jan 15, 2014 at 11:12 AM, Michael Love <michaelisaiahlove@gmail.com> wrote: > hi Yanzhu, > > Firstly, we recommend in general moving to DESeq2, where we spend most of > our development effort and which will be supported going foward. > > The DESeq2 workflow is very similar to DESeq and is described in detail in > the vignette. The constructor function from a count matrix and column data > is DESeqDataSetFromMatrix(), see the manual page for the required > arguments. For likelihood ratio tests, you can use the following code: > > dds <- DESeqDataSetFromMatrix(counts, columndata, design = ~ A) > dds <- DESeq(dds, test="LRT", reduced = ~ 1) > res <- results(dds) > res > > In order to give more detailed recommendations though, can you please tell > us more about the experimental setup and what questions you are trying to > answer? For instance, how are the samples distributed across groups A, B > and C?* I would suppose they are not 8 replicates per group, as this > would not allow you to estimate any interaction terms.* > > Mike > > > On Mon, Jan 13, 2014 at 2:15 PM, Yanzhu [guest] <guest@bioconductor.org> > wrote: > >> >> Dear Community, >> >> I have some questions about how the DESeq r package works for >> multi-factors expersiment. My experiment has three factors: A/B/C, and 8 >> replicates per condition. I would like the test the significance of the >> main effects of factor A, B and C, the significance of the two-way >> interaction terms: A:B, A:C and B:C, and the significance of the three-way >> interaction term: A:B:C. I want the table of pvalue for each term (main >> effects, two-way interaction terms and the three-way interaction term) like >> what ANOVA does for each gene. >> >> >> I know to test the significance of the three-way interaction term, we use >> the following coding: >> fitDeSeq1<-fitNbinomGLMs(cdsFull,count~A+B+C+A:B+A:C+B:C+A:B:C) >> fitDeSeq0<-fitNbinomGLMs(cdsFull,count~A+B+C+A:B+A:C+B:C) >> modDeSeq<-nbinomGLMTest(fitDeSeq1,fitDeSeq0) >> >> My Questions are: how can I test the significance of main effects and the >> two-way interaction terms? >> >> 1. To test the main effect of A, B and C >> >> (i) To test the main effect of A: >> fitDeSeq1<-fitNbinomGLMs(cdsFull,count~A) >> fitDeSeq0<-fitNbinomGLMs(cdsFull,count~1) >> modDeSeq<-nbinomGLMTest(fitDeSeq1,fitDeSeq0) >> >> (ii) To test the main effect of B: >> Do I need to use the following coding: >> fitDeSeq1<-fitNbinomGLMs(cdsFull,count~B) >> fitDeSeq0<-fitNbinomGLMs(cdsFull,count~1) >> modDeSeq<-nbinomGLMTest(fitDeSeq1,fitDeSeq0) >> >> OR: >> >> Do I need to use the following coding: >> fitDeSeq1<-fitNbinomGLMs(cdsFull,count~A+B) >> itDeSeq0<-fitNbinomGLMs(cdsFull,count~B) >> modDeSeq<-nbinomGLMTest(fitDeSeq1,fitDeSeq0) >> >> Which one is correct? >> >> (iii) To test the main effect of C: >> Do I need to use the following coding: >> fitDeSeq1<-fitNbinomGLMs(cdsFull,count~C) >> fitDeSeq0<-fitNbinomGLMs(cdsFull,count~1) >> modDeSeq<-nbinomGLMTest(fitDeSeq1,fitDeSeq0) >> >> OR: >> >> Do I need to use the following coding: >> fitDeSeq1<-fitNbinomGLMs(cdsFull,count~A+B+C) >> itDeSeq0<-fitNbinomGLMs(cdsFull,count~A+B) >> modDeSeq<-nbinomGLMTest(fitDeSeq1,fitDeSeq0) >> >> which one is correct? >> >> 2. To test the two-way interaction terms: A:B, A:C and B:C >> >> (i) To test the two-way interaction term: A:B >> Do I need to use the following coding: >> fitDeSeq1<-fitNbinomGLMs(cdsFull,count~A+B+C+A:B) >> fitDeSeq0<-fitNbinomGLMs(cdsFull,count~A+B+C) >> modDeSeq<-nbinomGLMTest(fitDeSeq1,fitDeSeq0) >> >> Is it correct? >> >> (ii) To test the two-way interaction term: A:C >> Do I need to use the following coding: >> fitDeSeq1<-fitNbinomGLMs(cdsFull,count~A+B+C+A:B+A:C) >> fitDeSeq0<-fitNbinomGLMs(cdsFull,count~A+B+C+A:B) >> modDeSeq<-nbinomGLMTest(fitDeSeq1,fitDeSeq0) >> >> OR: >> >> Do I need to use the following coding: >> fitDeSeq1<-fitNbinomGLMs(cdsFull,count~A+B+C+A:C) >> fitDeSeq0<-fitNbinomGLMs(cdsFull,count~A+B+C) >> modDeSeq<-nbinomGLMTest(fitDeSeq1,fitDeSeq0) >> >> Which one is correct? >> >> (iii) To test the two-way interaction term: B:C >> Do I need to use the following coding: >> fitDeSeq1<-fitNbinomGLMs(cdsFull,count~A+B+C+A:B+A:C+B:C) >> fitDeSeq0<-fitNbinomGLMs(cdsFull,count~A+B+C+A:B+A:C) >> modDeSeq<-nbinomGLMTest(fitDeSeq1,fitDeSeq0) >> >> OR: >> >> Do I need to use the following coding: >> fitDeSeq1<-fitNbinomGLMs(cdsFull,count~A+B+C+B:C) >> fitDeSeq0<-fitNbinomGLMs(cdsFull,count~A+B+C) >> modDeSeq<-nbinomGLMTest(fitDeSeq1,fitDeSeq0) >> >> Which one is correct? >> >> Thank you! >> >> >> >> >> >> >> -- output of sessionInfo(): >> >> sessionInfo() >> R version 3.0.1 (2013-05-16) >> Platform: x86_64-w64-mingw32/x64 (64-bit) >> >> locale: >> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United >> States.1252 LC_MONETARY=English_United States.1252 >> [4] LC_NUMERIC=C LC_TIME=English_United >> States.1252 >> >> attached base packages: >> [1] parallel stats graphics grDevices utils datasets methods >> base >> >> other attached packages: >> [1] DESeq_1.12.1 lattice_0.20-15 locfit_1.5-9.1 >> Biobase_2.20.1 BiocGenerics_0.6.0 edgeR_3.2.4 >> [7] limma_3.16.8 >> >> loaded via a namespace (and not attached): >> [1] annotate_1.38.0 AnnotationDbi_1.22.6 DBI_0.2-7 >> genefilter_1.42.0 geneplotter_1.38.0 >> [6] grid_3.0.1 IRanges_1.18.4 MASS_7.3-26 >> RColorBrewer_1.0-5 RSQLite_0.11.4 >> [11] splines_3.0.1 stats4_3.0.1 survival_2.37-4 >> tools_3.0.1 XML_3.98-1.1 >> [16] xtable_1.7-1 >> >> -- >> Sent via the guest posting facility at bioconductor.org. >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > [[alternative HTML version deleted]]
DESeq DESeq2 DESeq DESeq2 • 1.9k views
ADD COMMENT
1
Entering edit mode
@mikelove
Last seen 11 hours ago
United States
hi Yanzhu, Apologies, I misread your email earlier, and thought you had 8 samples total. You're right that you have many degrees of freedom here. In response to your previous questions: To paraphrase: to test the main effect of A, what's the different between using a full model ~ A and reduced model ~ 1 vs. using a full model ~ A + B and reduced model ~ B? The difference is the the second model "accounts for" differences due to B while testing the effect of A. I would say that generally the second is preferred. You should consult a local statistician to explain this difference though, as this is a subtle one which is not easy to explain over email threads. The same is true for testing C: ~ C and ~ 1 vs. ~ A + B + C and ~ A + B. The second model "accounts for" differences due to A and B while testing the effect of C. The same is also true for your questions about the interaction terms, e.g. whether you should include the A:B term when testing A:C or not. These are subtle design choices which are best explained by consulting someone locally. I can tell you that using DESeq2 with test="LRT" is analogous to standard ANOVA for each gene, and so the same design choices apply as would apply to standard ANOVA. Mike On Tue, Aug 12, 2014 at 10:03 AM, Yanzhu Lin <mlinyzh at="" gmail.com=""> wrote: > Hi Mike, > > I would like to try DESeq2 package now. > > My project has three factors: A with 16 levels, B with 2 levels and C with 3 > levels, in total I have 16 x 3 x 2 = 96 groups, for each group, we have 8 > biological replicates in our original experiment design, which ends up with > 96 x 8 =768 biosamples. > > I can't understand why we can't estimate any interaction terms if there are > 8 replicates per group, which was mentioned in your previous email. > > Based on 8 biological replicates per group, I think we still have enough df > to test each interaction term. > > > > Best, > > > Yanzhu > > > > > > Jan 15, 2014 at 11:12 AM, Michael Love <michaelisaiahlove at="" gmail.com=""> wrote: >> >> hi Yanzhu, >> >> Firstly, we recommend in general moving to DESeq2, where we spend most of >> our development effort and which will be supported going foward. >> >> The DESeq2 workflow is very similar to DESeq and is described in detail in >> the vignette. The constructor function from a count matrix and column data >> is DESeqDataSetFromMatrix(), see the manual page for the required arguments. >> For likelihood ratio tests, you can use the following code: >> >> dds <- DESeqDataSetFromMatrix(counts, columndata, design = ~ A) >> dds <- DESeq(dds, test="LRT", reduced = ~ 1) >> res <- results(dds) >> res >> >> In order to give more detailed recommendations though, can you please tell >> us more about the experimental setup and what questions you are trying to >> answer? For instance, how are the samples distributed across groups A, B and >> C? I would suppose they are not 8 replicates per group, as this would not >> allow you to estimate any interaction terms. >> >> Mike >> >> >> On Mon, Jan 13, 2014 at 2:15 PM, Yanzhu [guest] <guest at="" bioconductor.org=""> >> wrote: >>> >>> >>> Dear Community, >>> >>> I have some questions about how the DESeq r package works for >>> multi-factors expersiment. My experiment has three factors: A/B/C, and 8 >>> replicates per condition. I would like the test the significance of the main >>> effects of factor A, B and C, the significance of the two-way interaction >>> terms: A:B, A:C and B:C, and the significance of the three-way interaction >>> term: A:B:C. I want the table of pvalue for each term (main effects, two-way >>> interaction terms and the three-way interaction term) like what ANOVA does >>> for each gene. >>> >>> >>> I know to test the significance of the three-way interaction term, we use >>> the following coding: >>> fitDeSeq1<-fitNbinomGLMs(cdsFull,count~A+B+C+A:B+A:C+B:C+A:B:C) >>> fitDeSeq0<-fitNbinomGLMs(cdsFull,count~A+B+C+A:B+A:C+B:C) >>> modDeSeq<-nbinomGLMTest(fitDeSeq1,fitDeSeq0) >>> >>> My Questions are: how can I test the significance of main effects and the >>> two-way interaction terms? >>> >>> 1. To test the main effect of A, B and C >>> >>> (i) To test the main effect of A: >>> fitDeSeq1<-fitNbinomGLMs(cdsFull,count~A) >>> fitDeSeq0<-fitNbinomGLMs(cdsFull,count~1) >>> modDeSeq<-nbinomGLMTest(fitDeSeq1,fitDeSeq0) >>> >>> (ii) To test the main effect of B: >>> Do I need to use the following coding: >>> fitDeSeq1<-fitNbinomGLMs(cdsFull,count~B) >>> fitDeSeq0<-fitNbinomGLMs(cdsFull,count~1) >>> modDeSeq<-nbinomGLMTest(fitDeSeq1,fitDeSeq0) >>> >>> OR: >>> >>> Do I need to use the following coding: >>> fitDeSeq1<-fitNbinomGLMs(cdsFull,count~A+B) >>> itDeSeq0<-fitNbinomGLMs(cdsFull,count~B) >>> modDeSeq<-nbinomGLMTest(fitDeSeq1,fitDeSeq0) >>> >>> Which one is correct? >>> >>> (iii) To test the main effect of C: >>> Do I need to use the following coding: >>> fitDeSeq1<-fitNbinomGLMs(cdsFull,count~C) >>> fitDeSeq0<-fitNbinomGLMs(cdsFull,count~1) >>> modDeSeq<-nbinomGLMTest(fitDeSeq1,fitDeSeq0) >>> >>> OR: >>> >>> Do I need to use the following coding: >>> fitDeSeq1<-fitNbinomGLMs(cdsFull,count~A+B+C) >>> itDeSeq0<-fitNbinomGLMs(cdsFull,count~A+B) >>> modDeSeq<-nbinomGLMTest(fitDeSeq1,fitDeSeq0) >>> >>> which one is correct? >>> >>> 2. To test the two-way interaction terms: A:B, A:C and B:C >>> >>> (i) To test the two-way interaction term: A:B >>> Do I need to use the following coding: >>> fitDeSeq1<-fitNbinomGLMs(cdsFull,count~A+B+C+A:B) >>> fitDeSeq0<-fitNbinomGLMs(cdsFull,count~A+B+C) >>> modDeSeq<-nbinomGLMTest(fitDeSeq1,fitDeSeq0) >>> >>> Is it correct? >>> >>> (ii) To test the two-way interaction term: A:C >>> Do I need to use the following coding: >>> fitDeSeq1<-fitNbinomGLMs(cdsFull,count~A+B+C+A:B+A:C) >>> fitDeSeq0<-fitNbinomGLMs(cdsFull,count~A+B+C+A:B) >>> modDeSeq<-nbinomGLMTest(fitDeSeq1,fitDeSeq0) >>> >>> OR: >>> >>> Do I need to use the following coding: >>> fitDeSeq1<-fitNbinomGLMs(cdsFull,count~A+B+C+A:C) >>> fitDeSeq0<-fitNbinomGLMs(cdsFull,count~A+B+C) >>> modDeSeq<-nbinomGLMTest(fitDeSeq1,fitDeSeq0) >>> >>> Which one is correct? >>> >>> (iii) To test the two-way interaction term: B:C >>> Do I need to use the following coding: >>> fitDeSeq1<-fitNbinomGLMs(cdsFull,count~A+B+C+A:B+A:C+B:C) >>> fitDeSeq0<-fitNbinomGLMs(cdsFull,count~A+B+C+A:B+A:C) >>> modDeSeq<-nbinomGLMTest(fitDeSeq1,fitDeSeq0) >>> >>> OR: >>> >>> Do I need to use the following coding: >>> fitDeSeq1<-fitNbinomGLMs(cdsFull,count~A+B+C+B:C) >>> fitDeSeq0<-fitNbinomGLMs(cdsFull,count~A+B+C) >>> modDeSeq<-nbinomGLMTest(fitDeSeq1,fitDeSeq0) >>> >>> Which one is correct? >>> >>> Thank you! >>> >>> >>> >>> >>> >>> >>> -- output of sessionInfo(): >>> >>> sessionInfo() >>> R version 3.0.1 (2013-05-16) >>> Platform: x86_64-w64-mingw32/x64 (64-bit) >>> >>> locale: >>> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United >>> States.1252 LC_MONETARY=English_United States.1252 >>> [4] LC_NUMERIC=C LC_TIME=English_United >>> States.1252 >>> >>> attached base packages: >>> [1] parallel stats graphics grDevices utils datasets methods >>> base >>> >>> other attached packages: >>> [1] DESeq_1.12.1 lattice_0.20-15 locfit_1.5-9.1 >>> Biobase_2.20.1 BiocGenerics_0.6.0 edgeR_3.2.4 >>> [7] limma_3.16.8 >>> >>> loaded via a namespace (and not attached): >>> [1] annotate_1.38.0 AnnotationDbi_1.22.6 DBI_0.2-7 >>> genefilter_1.42.0 geneplotter_1.38.0 >>> [6] grid_3.0.1 IRanges_1.18.4 MASS_7.3-26 >>> RColorBrewer_1.0-5 RSQLite_0.11.4 >>> [11] splines_3.0.1 stats4_3.0.1 survival_2.37-4 >>> tools_3.0.1 XML_3.98-1.1 >>> [16] xtable_1.7-1 >>> >>> -- >>> Sent via the guest posting facility at bioconductor.org. >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> >
ADD COMMENT

Login before adding your answer.

Traffic: 605 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6