DESeq: Hypothesis testing in multifactor design

0

Entering edit mode

Yanzhu Lin ▴ 120

@yanzhu-lin-6551

Last seen 7.6 years ago

Hi Mike, I am using DESeq2 package for my project now, and I have some questions regarding to this pacakge. Please let me briefly introduce some background information about my project. I have three factors: A with 16 levels, B with 2 levels and C with 3 levels, in total 16*2*3=96 groups. There are 8 biosamples for each group, hence 96*8=768 biosamples in total. Due to some issues, we lost some replicates for some groups, which ends up 726 biosamples, hence it is unbalance design. Our purpose are to test the main effects of three factor: A, B and C, the two-way interaction: A:B, A:C and B:C, and the three-way interaction term: A:B:C. For example, I will compare full model: ~A+B+C with reduce model: ~A+B to test factor C, and so on for other two main effects. For testing three-way interaction A:B:C, I will compare full model: ~A+B+C+A:B+A:C+B:C+A:B:C and reduced model ~A+B+C+A:B+A:C+B:C. Then come my questions. I have different full models for testing main effects, two-way interaction and three-way interaction term. Will the dispersion estimation affected by my full model? Can I specify the full model when I use nbinomLRT()? In other words, can I use estimateDispersion() only once and fit nbinomLRT with different full models as below: dds<-DESeqDataSetFromMatrix(countData=countdata,colData=coldata,design =~A+B+C+A:B+A:C+B:C+A:B:C) ### normalization dds=estimateSizeFactors(dds) ### dispersion estimation: dds=estimateDispersions(dds) ###Test three-way interaction term. dds<-nbinomLRT(dds,reduced=~A+B+C+A:B+A:C+B:C) ###Test main effect of factor A: dds<-nbinomLRT(dds,full=~A+B+C, reduced=~B+C) ###Test main effect of factor B: dds<-nbinomLRT(dds,full=~A+B+C, reduced=~A+C) ###Test main effect of factor C: dds<-nbinomLRT(dds,full=~A+B+C, reduced=~A+B) Thanks, Yanzhu On Tue, Jun 10, 2014 at 4:07 PM, Michael Love <michaelisaiahlove at="" gmail.com=""> wrote: > hi Yanzhu, > > Note that we recommend users switch to using DESeq2, which also has > the likelihood ratio test you are using, and is faster and more > sensitive. > > The pipeline would look like: > > DESeq(dds, test="LRT", reduced=~ A+B+C+A:B+A:C+B:C) > > for your first example. > > For your question, the terms of the reduced model should be contained > within the full model. Still there are a number of models which > satisfy this requirement, e.g. for testing B:C, you could use > A+B+C+A:B+A:C+B:C and A+B+C+A:B+A:C as full and reduced respectively. > Or you could use A+B+C+B:C and A+B+C. The importance of these other > interaction terms depends on context, whether they are very > explanatory or not. > > Mike > > On Tue, Jun 10, 2014 at 11:21 AM, yanzhu [guest] <guest at="" bioconductor.org=""> > wrote: > > Dear Community, > > > > I have a question about the hypothesis testing of the two-way > interaction terms in a multifactor design which includes three factors: A, > B and C. > > > > When I tested the three-way interaction I used the full and reduced > models as below for nbinomGLMTest(): > > Full: count ~ A+B+C+A:B+A:C+B:C+A:B:C > > Reduced: count ~ A+B+C+A:B+A:C+B:C > > > > Now comes my question, when I want to test the effect of two-way > interaction terms, i.e., A:B, A:C or B:C, what should be my full and > reduced models? For example, when I want to the test the effect of A:B, > what should be my full and reduced models for nbinomGLMTest() using DESeq > pacakge? > > > > > > Best, > > > > > > > > Yanzhu > > > > > > -- output of sessionInfo(): > > > > sessionInfo() > > R version 3.1.0 (2014-04-10) > > Platform: x86_64-w64-mingw32/x64 (64-bit) > > > > locale: > > [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United > States.1252 LC_MONETARY=English_United States.1252 > > [4] LC_NUMERIC=C LC_TIME=English_United > States.1252 > > > > attached base packages: > > [1] parallel stats graphics grDevices utils datasets methods > base > > > > other attached packages: > > [1] DESeq_1.16.0 lattice_0.20-29 locfit_1.5-9.1 > Biobase_2.24.0 BiocGenerics_0.10.0 edgeR_3.6.1 limma_3.20.1 > > > > loaded via a namespace (and not attached): > > [1] annotate_1.42.0 AnnotationDbi_1.26.0 DBI_0.2-7 > genefilter_1.46.0 geneplotter_1.42.0 GenomeInfoDb_1.0.2 > > [7] grid_3.1.0 IRanges_1.22.6 MASS_7.3-31 > RColorBrewer_1.0-5 RSQLite_0.11.4 splines_3.1.0 > > [13] stats4_3.1.0 survival_2.37-7 tools_3.1.0 > XML_3.98-1.1 xtable_1.7-3 > > > > > > -- > > Sent via the guest posting facility at bioconductor.org. > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor at r-project.org > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]

DESeq2 DESeq2 • 2.1k views

ADD COMMENT • link updated 9.7 years ago by Michael Love 41k • written 9.7 years ago by Yanzhu Lin ▴ 120

0

Entering edit mode

Michael Love 41k

@mikelove

Last seen 19 hours ago

United States

Hi Yanzhu, On Aug 20, 2014 3:59 PM, "Yanzhu Lin" <mlinyzh at="" gmail.com=""> wrote: > > Hi Mike, > > I am using DESeq2 package for my project now, and I have some questions regarding to this pacakge. > > Please let me briefly introduce some background information about my project. I have three factors: A with 16 levels, B with 2 levels and C with 3 levels, in total 16*2*3=96 groups. There are 8 biosamples for each group, hence 96*8=768 biosamples in total. Due to some issues, we lost some replicates for some groups, which ends up 726 biosamples, hence it is unbalance design. > > Our purpose are to test the main effects of three factor: A, B and C, the two-way interaction: A:B, A:C and B:C, and the three-way interaction term: A:B:C. For example, I will compare full model: ~A+B+C with reduce model: ~A+B to test factor C, and so on for other two main effects. For testing three-way interaction A:B:C, I will compare full model: ~A+B+C+A:B+A:C+B:C+A:B:C and reduced model ~A+B+C+A:B+A:C+B:C. Then come my questions. > > > I have different full models for testing main effects, two-way interaction and three-way interaction term. Will the dispersion estimation affected by my full model? Can I specify the full model when I use nbinomLRT()? No, you should not change the design in between, i.e. don't use a different design for dispersion estimation and the full model in the LRT. Mike > In other words, can I use estimateDispersion() only once and fit nbinomLRT with different full models as below: > > > dds<-DESeqDataSetFromMatrix(countData=countdata,colData=coldata,design =~A+B+C+A:B+A:C+B:C+A:B:C) > > ### normalization > dds=estimateSizeFactors(dds) > > ### dispersion estimation: > dds=estimateDispersions(dds) > > ###Test three-way interaction term. > dds<-nbinomLRT(dds,reduced=~A+B+C+A:B+A:C+B:C) > ###Test main effect of factor A: > dds<-nbinomLRT(dds,full=~A+B+C, reduced=~B+C) > ###Test main effect of factor B: > dds<-nbinomLRT(dds,full=~A+B+C, reduced=~A+C) > > ###Test main effect of factor C: > dds<-nbinomLRT(dds,full=~A+B+C, reduced=~A+B) > > > Thanks, > > > Yanzhu > > > > > > On Tue, Jun 10, 2014 at 4:07 PM, Michael Love <michaelisaiahlove at="" gmail.com=""> wrote: >> >> hi Yanzhu, >> >> Note that we recommend users switch to using DESeq2, which also has >> the likelihood ratio test you are using, and is faster and more >> sensitive. >> >> The pipeline would look like: >> >> DESeq(dds, test="LRT", reduced=~ A+B+C+A:B+A:C+B:C) >> >> for your first example. >> >> For your question, the terms of the reduced model should be contained >> within the full model. Still there are a number of models which >> satisfy this requirement, e.g. for testing B:C, you could use >> A+B+C+A:B+A:C+B:C and A+B+C+A:B+A:C as full and reduced respectively. >> Or you could use A+B+C+B:C and A+B+C. The importance of these other >> interaction terms depends on context, whether they are very >> explanatory or not. >> >> Mike >> >> On Tue, Jun 10, 2014 at 11:21 AM, yanzhu [guest] <guest at="" bioconductor.org=""> wrote: >> > Dear Community, >> > >> > I have a question about the hypothesis testing of the two-way interaction terms in a multifactor design which includes three factors: A, B and C. >> > >> > When I tested the three-way interaction I used the full and reduced models as below for nbinomGLMTest(): >> > Full: count ~ A+B+C+A:B+A:C+B:C+A:B:C >> > Reduced: count ~ A+B+C+A:B+A:C+B:C >> > >> > Now comes my question, when I want to test the effect of two-way interaction terms, i.e., A:B, A:C or B:C, what should be my full and reduced models? For example, when I want to the test the effect of A:B, what should be my full and reduced models for nbinomGLMTest() using DESeq pacakge? >> > >> > >> > Best, >> > >> > >> > >> > Yanzhu >> > >> > >> > -- output of sessionInfo(): >> > >> > sessionInfo() >> > R version 3.1.0 (2014-04-10) >> > Platform: x86_64-w64-mingw32/x64 (64-bit) >> > >> > locale: >> > [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 >> > [4] LC_NUMERIC=C LC_TIME=English_United States.1252 >> > >> > attached base packages: >> > [1] parallel stats graphics grDevices utils datasets methods base >> > >> > other attached packages: >> > [1] DESeq_1.16.0 lattice_0.20-29 locfit_1.5-9.1 Biobase_2.24.0 BiocGenerics_0.10.0 edgeR_3.6.1 limma_3.20.1 >> > >> > loaded via a namespace (and not attached): >> > [1] annotate_1.42.0 AnnotationDbi_1.26.0 DBI_0.2-7 genefilter_1.46.0 geneplotter_1.42.0 GenomeInfoDb_1.0.2 >> > [7] grid_3.1.0 IRanges_1.22.6 MASS_7.3-31 RColorBrewer_1.0-5 RSQLite_0.11.4 splines_3.1.0 >> > [13] stats4_3.1.0 survival_2.37-7 tools_3.1.0 XML_3.98-1.1 xtable_1.7-3 >> > >> > >> > -- >> > Sent via the guest posting facility at bioconductor.org. >> > >> > _______________________________________________ >> > Bioconductor mailing list >> > Bioconductor at r-project.org >> > https://stat.ethz.ch/mailman/listinfo/bioconductor >> > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > > [[alternative HTML version deleted]]

ADD COMMENT • link 9.7 years ago Michael Love 41k

0

Entering edit mode

Hi Mike, So you mean I need to estimate the dispersions every time when I have a different full model for LRT? In other words, so I need to use estimateDispersions() once when I test the main effect, where the full model is ~A+B+C, and use estimateDispersion() once when I test the three-way interaction term, where the full model is ~A+B+C+A:B+A:C+B:C+A:B:C? One more question about estimateDispersion(), does it take a very long time to estimate the dispersions? I have 16649 features and 726 biosamples, and I run the estimateDispersion function around 9:30 am this morning, and it hasn't done yet. Any suggestion that I can speed up the estimateDispesions(). Your help will be greatly appreciate, thanks. Best, On Wed, Aug 20, 2014 at 12:44 PM, Michael Love <michaelisaiahlove at="" gmail.com=""> wrote: > Hi Yanzhu, > > On Aug 20, 2014 3:59 PM, "Yanzhu Lin" <mlinyzh at="" gmail.com=""> wrote: > > > > Hi Mike, > > > > I am using DESeq2 package for my project now, and I have some questions > regarding to this pacakge. > > > > Please let me briefly introduce some background information about my > project. I have three factors: A with 16 levels, B with 2 levels and C with > 3 levels, in total 16*2*3=96 groups. There are 8 biosamples for each group, > hence 96*8=768 biosamples in total. Due to some issues, we lost some > replicates for some groups, which ends up 726 biosamples, hence it is > unbalance design. > > > > Our purpose are to test the main effects of three factor: A, B and C, > the two-way interaction: A:B, A:C and B:C, and the three-way interaction > term: A:B:C. For example, I will compare full model: ~A+B+C with reduce > model: ~A+B to test factor C, and so on for other two main effects. For > testing three-way interaction A:B:C, I will compare full model: > ~A+B+C+A:B+A:C+B:C+A:B:C and reduced model ~A+B+C+A:B+A:C+B:C. Then come my > questions. > > > > > > I have different full models for testing main effects, two-way > interaction and three-way interaction term. Will the dispersion estimation > affected by my full model? Can I specify the full model when I use > nbinomLRT()? > > No, you should not change the design in between, i.e. don't use a > different design for dispersion estimation and the full model in the LRT. > > Mike > > > In other words, can I use estimateDispersion() only once and fit > nbinomLRT with different full models as below: > > > > > > > dds<-DESeqDataSetFromMatrix(countData=countdata,colData=coldata,desi gn=~A+B+C+A:B+A:C+B:C+A:B:C) > > > > ### normalization > > dds=estimateSizeFactors(dds) > > > > ### dispersion estimation: > > dds=estimateDispersions(dds) > > > > ###Test three-way interaction term. > > dds<-nbinomLRT(dds,reduced=~A+B+C+A:B+A:C+B:C) > > ###Test main effect of factor A: > > dds<-nbinomLRT(dds,full=~A+B+C, reduced=~B+C) > > ###Test main effect of factor B: > > dds<-nbinomLRT(dds,full=~A+B+C, reduced=~A+C) > > > > ###Test main effect of factor C: > > dds<-nbinomLRT(dds,full=~A+B+C, reduced=~A+B) > > > > > > Thanks, > > > > > > Yanzhu > > > > > > > > > > > > On Tue, Jun 10, 2014 at 4:07 PM, Michael Love < > michaelisaiahlove at gmail.com> wrote: > >> > >> hi Yanzhu, > >> > >> Note that we recommend users switch to using DESeq2, which also has > >> the likelihood ratio test you are using, and is faster and more > >> sensitive. > >> > >> The pipeline would look like: > >> > >> DESeq(dds, test="LRT", reduced=~ A+B+C+A:B+A:C+B:C) > >> > >> for your first example. > >> > >> For your question, the terms of the reduced model should be contained > >> within the full model. Still there are a number of models which > >> satisfy this requirement, e.g. for testing B:C, you could use > >> A+B+C+A:B+A:C+B:C and A+B+C+A:B+A:C as full and reduced respectively. > >> Or you could use A+B+C+B:C and A+B+C. The importance of these other > >> interaction terms depends on context, whether they are very > >> explanatory or not. > >> > >> Mike > >> > >> On Tue, Jun 10, 2014 at 11:21 AM, yanzhu [guest] < > guest at bioconductor.org> wrote: > >> > Dear Community, > >> > > >> > I have a question about the hypothesis testing of the two-way > interaction terms in a multifactor design which includes three factors: A, > B and C. > >> > > >> > When I tested the three-way interaction I used the full and reduced > models as below for nbinomGLMTest(): > >> > Full: count ~ A+B+C+A:B+A:C+B:C+A:B:C > >> > Reduced: count ~ A+B+C+A:B+A:C+B:C > >> > > >> > Now comes my question, when I want to test the effect of two- way > interaction terms, i.e., A:B, A:C or B:C, what should be my full and > reduced models? For example, when I want to the test the effect of A:B, > what should be my full and reduced models for nbinomGLMTest() using DESeq > pacakge? > >> > > >> > > >> > Best, > >> > > >> > > >> > > >> > Yanzhu > >> > > >> > > >> > -- output of sessionInfo(): > >> > > >> > sessionInfo() > >> > R version 3.1.0 (2014-04-10) > >> > Platform: x86_64-w64-mingw32/x64 (64-bit) > >> > > >> > locale: > >> > [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United > States.1252 LC_MONETARY=English_United States.1252 > >> > [4] LC_NUMERIC=C LC_TIME=English_United > States.1252 > >> > > >> > attached base packages: > >> > [1] parallel stats graphics grDevices utils datasets > methods base > >> > > >> > other attached packages: > >> > [1] DESeq_1.16.0 lattice_0.20-29 locfit_1.5-9.1 > Biobase_2.24.0 BiocGenerics_0.10.0 edgeR_3.6.1 limma_3.20.1 > >> > > >> > loaded via a namespace (and not attached): > >> > [1] annotate_1.42.0 AnnotationDbi_1.26.0 DBI_0.2-7 > genefilter_1.46.0 geneplotter_1.42.0 GenomeInfoDb_1.0.2 > >> > [7] grid_3.1.0 IRanges_1.22.6 MASS_7.3-31 > RColorBrewer_1.0-5 RSQLite_0.11.4 splines_3.1.0 > >> > [13] stats4_3.1.0 survival_2.37-7 tools_3.1.0 > XML_3.98-1.1 xtable_1.7-3 > >> > > >> > > >> > -- > >> > Sent via the guest posting facility at bioconductor.org. > >> > > >> > _______________________________________________ > >> > Bioconductor mailing list > >> > Bioconductor at r-project.org > >> > https://stat.ethz.ch/mailman/listinfo/bioconductor > >> > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > > > > > [[alternative HTML version deleted]]

ADD REPLY • link 9.7 years ago Yanzhu Lin ▴ 120

0

Entering edit mode

Hi Yanzhu, On Aug 20, 2014 7:39 PM, "Yanzhu Lin" <mlinyzh at="" gmail.com=""> wrote: > > Hi Mike, > > So you mean I need to estimate the dispersions every time when I have a different full model for LRT? Yes. > In other words, so I need to use estimateDispersions() once when I test the main effect, where the full model is ~A+B+C, and use estimateDispersion() once when I test the three-way interaction term, where the full model is ~A+B+C+A:B+A:C+B:C+A:B:C? > > > One more question about estimateDispersion(), does it take a very long time to estimate the dispersions? I have 16649 features and 726 biosamples, and I run the estimateDispersion function around 9:30 am this morning, and it hasn't done yet. Any suggestion that I can speed up the estimateDispesions(). > Yes, it takes time to estimate dispersion and to fit the GLM when you have so many samples and when you have ~100 coefficients to fit with all the interactions you are including. You could also try the voom transformation and linear modeling with limma Mike > Your help will be greatly appreciate, thanks. > > > Best, > > > > On Wed, Aug 20, 2014 at 12:44 PM, Michael Love < michaelisaiahlove at gmail.com> wrote: >> >> Hi Yanzhu, >> >> On Aug 20, 2014 3:59 PM, "Yanzhu Lin" <mlinyzh at="" gmail.com=""> wrote: >> > >> > Hi Mike, >> > >> > I am using DESeq2 package for my project now, and I have some questions regarding to this pacakge. >> > >> > Please let me briefly introduce some background information about my project. I have three factors: A with 16 levels, B with 2 levels and C with 3 levels, in total 16*2*3=96 groups. There are 8 biosamples for each group, hence 96*8=768 biosamples in total. Due to some issues, we lost some replicates for some groups, which ends up 726 biosamples, hence it is unbalance design. >> > >> > Our purpose are to test the main effects of three factor: A, B and C, the two-way interaction: A:B, A:C and B:C, and the three-way interaction term: A:B:C. For example, I will compare full model: ~A+B+C with reduce model: ~A+B to test factor C, and so on for other two main effects. For testing three-way interaction A:B:C, I will compare full model: ~A+B+C+A:B+A:C+B:C+A:B:C and reduced model ~A+B+C+A:B+A:C+B:C. Then come my questions. >> > >> > >> > I have different full models for testing main effects, two-way interaction and three-way interaction term. Will the dispersion estimation affected by my full model? Can I specify the full model when I use nbinomLRT()? >> >> No, you should not change the design in between, i.e. don't use a different design for dispersion estimation and the full model in the LRT. >> >> Mike >> >> > In other words, can I use estimateDispersion() only once and fit nbinomLRT with different full models as below: >> > >> > >> > dds<-DESeqDataSetFromMatrix(countData=countdata,colData=coldata,design =~A+B+C+A:B+A:C+B:C+A:B:C) >> > >> > ### normalization >> > dds=estimateSizeFactors(dds) >> > >> > ### dispersion estimation: >> > dds=estimateDispersions(dds) >> > >> > ###Test three-way interaction term. >> > dds<-nbinomLRT(dds,reduced=~A+B+C+A:B+A:C+B:C) >> > ###Test main effect of factor A: >> > dds<-nbinomLRT(dds,full=~A+B+C, reduced=~B+C) >> > ###Test main effect of factor B: >> > dds<-nbinomLRT(dds,full=~A+B+C, reduced=~A+C) >> > >> > ###Test main effect of factor C: >> > dds<-nbinomLRT(dds,full=~A+B+C, reduced=~A+B) >> > >> > >> > Thanks, >> > >> > >> > Yanzhu >> > >> > >> > >> > >> > >> > On Tue, Jun 10, 2014 at 4:07 PM, Michael Love < michaelisaiahlove at gmail.com> wrote: >> >> >> >> hi Yanzhu, >> >> >> >> Note that we recommend users switch to using DESeq2, which also has >> >> the likelihood ratio test you are using, and is faster and more >> >> sensitive. >> >> >> >> The pipeline would look like: >> >> >> >> DESeq(dds, test="LRT", reduced=~ A+B+C+A:B+A:C+B:C) >> >> >> >> for your first example. >> >> >> >> For your question, the terms of the reduced model should be contained >> >> within the full model. Still there are a number of models which >> >> satisfy this requirement, e.g. for testing B:C, you could use >> >> A+B+C+A:B+A:C+B:C and A+B+C+A:B+A:C as full and reduced respectively. >> >> Or you could use A+B+C+B:C and A+B+C. The importance of these other >> >> interaction terms depends on context, whether they are very >> >> explanatory or not. >> >> >> >> Mike >> >> >> >> On Tue, Jun 10, 2014 at 11:21 AM, yanzhu [guest] < guest at bioconductor.org> wrote: >> >> > Dear Community, >> >> > >> >> > I have a question about the hypothesis testing of the two-way interaction terms in a multifactor design which includes three factors: A, B and C. >> >> > >> >> > When I tested the three-way interaction I used the full and reduced models as below for nbinomGLMTest(): >> >> > Full: count ~ A+B+C+A:B+A:C+B:C+A:B:C >> >> > Reduced: count ~ A+B+C+A:B+A:C+B:C >> >> > >> >> > Now comes my question, when I want to test the effect of two- way interaction terms, i.e., A:B, A:C or B:C, what should be my full and reduced models? For example, when I want to the test the effect of A:B, what should be my full and reduced models for nbinomGLMTest() using DESeq pacakge? >> >> > >> >> > >> >> > Best, >> >> > >> >> > >> >> > >> >> > Yanzhu >> >> > >> >> > >> >> > -- output of sessionInfo(): >> >> > >> >> > sessionInfo() >> >> > R version 3.1.0 (2014-04-10) >> >> > Platform: x86_64-w64-mingw32/x64 (64-bit) >> >> > >> >> > locale: >> >> > [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 >> >> > [4] LC_NUMERIC=C LC_TIME=English_United States.1252 >> >> > >> >> > attached base packages: >> >> > [1] parallel stats graphics grDevices utils datasets methods base >> >> > >> >> > other attached packages: >> >> > [1] DESeq_1.16.0 lattice_0.20-29 locfit_1.5-9.1 Biobase_2.24.0 BiocGenerics_0.10.0 edgeR_3.6.1 limma_3.20.1 >> >> > >> >> > loaded via a namespace (and not attached): >> >> > [1] annotate_1.42.0 AnnotationDbi_1.26.0 DBI_0.2-7 genefilter_1.46.0 geneplotter_1.42.0 GenomeInfoDb_1.0.2 >> >> > [7] grid_3.1.0 IRanges_1.22.6 MASS_7.3-31 RColorBrewer_1.0-5 RSQLite_0.11.4 splines_3.1.0 >> >> > [13] stats4_3.1.0 survival_2.37-7 tools_3.1.0 XML_3.98-1.1 xtable_1.7-3 >> >> > >> >> > >> >> > -- >> >> > Sent via the guest posting facility at bioconductor.org. >> >> > >> >> > _______________________________________________ >> >> > Bioconductor mailing list >> >> > Bioconductor at r-project.org >> >> > https://stat.ethz.ch/mailman/listinfo/bioconductor >> >> > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >> > >> > > > [[alternative HTML version deleted]]

ADD REPLY • link 9.7 years ago Michael Love 41k

0

Entering edit mode

Hi Mike, Thank you so much for your prompt reply. I really appreciate it. Best, Yanzhu On Wed, Aug 20, 2014 at 3:54 PM, Michael Love <michaelisaiahlove at="" gmail.com=""> wrote: > Hi Yanzhu, > > On Aug 20, 2014 7:39 PM, "Yanzhu Lin" <mlinyzh at="" gmail.com=""> wrote: > > > > Hi Mike, > > > > So you mean I need to estimate the dispersions every time when I have a > different full model for LRT? > > Yes. > > > In other words, so I need to use estimateDispersions() once when I test > the main effect, where the full model is ~A+B+C, and use > estimateDispersion() once when I test the three-way interaction term, where > the full model is ~A+B+C+A:B+A:C+B:C+A:B:C? > > > > > > One more question about estimateDispersion(), does it take a very long > time to estimate the dispersions? I have 16649 features and 726 biosamples, > and I run the estimateDispersion function around 9:30 am this morning, and > it hasn't done yet. Any suggestion that I can speed up the > estimateDispesions(). > > > > Yes, it takes time to estimate dispersion and to fit the GLM when you have > so many samples and when you have ~100 coefficients to fit with all the > interactions you are including. > > You could also try the voom transformation and linear modeling with limma > > Mike > > > Your help will be greatly appreciate, thanks. > > > > > > Best, > > > > > > > > On Wed, Aug 20, 2014 at 12:44 PM, Michael Love < > michaelisaiahlove at gmail.com> wrote: > >> > >> Hi Yanzhu, > >> > >> On Aug 20, 2014 3:59 PM, "Yanzhu Lin" <mlinyzh at="" gmail.com=""> wrote: > >> > > >> > Hi Mike, > >> > > >> > I am using DESeq2 package for my project now, and I have some > questions regarding to this pacakge. > >> > > >> > Please let me briefly introduce some background information about my > project. I have three factors: A with 16 levels, B with 2 levels and C with > 3 levels, in total 16*2*3=96 groups. There are 8 biosamples for each group, > hence 96*8=768 biosamples in total. Due to some issues, we lost some > replicates for some groups, which ends up 726 biosamples, hence it is > unbalance design. > >> > > >> > Our purpose are to test the main effects of three factor: A, B and C, > the two-way interaction: A:B, A:C and B:C, and the three-way interaction > term: A:B:C. For example, I will compare full model: ~A+B+C with reduce > model: ~A+B to test factor C, and so on for other two main effects. For > testing three-way interaction A:B:C, I will compare full model: > ~A+B+C+A:B+A:C+B:C+A:B:C and reduced model ~A+B+C+A:B+A:C+B:C. Then come my > questions. > >> > > >> > > >> > I have different full models for testing main effects, two-way > interaction and three-way interaction term. Will the dispersion estimation > affected by my full model? Can I specify the full model when I use > nbinomLRT()? > >> > >> No, you should not change the design in between, i.e. don't use a > different design for dispersion estimation and the full model in the LRT. > >> > >> Mike > >> > >> > In other words, can I use estimateDispersion() only once and fit > nbinomLRT with different full models as below: > >> > > >> > > >> > > dds<-DESeqDataSetFromMatrix(countData=countdata,colData=coldata,desi gn=~A+B+C+A:B+A:C+B:C+A:B:C) > >> > > >> > ### normalization > >> > dds=estimateSizeFactors(dds) > >> > > >> > ### dispersion estimation: > >> > dds=estimateDispersions(dds) > >> > > >> > ###Test three-way interaction term. > >> > dds<-nbinomLRT(dds,reduced=~A+B+C+A:B+A:C+B:C) > >> > ###Test main effect of factor A: > >> > dds<-nbinomLRT(dds,full=~A+B+C, reduced=~B+C) > >> > ###Test main effect of factor B: > >> > dds<-nbinomLRT(dds,full=~A+B+C, reduced=~A+C) > >> > > >> > ###Test main effect of factor C: > >> > dds<-nbinomLRT(dds,full=~A+B+C, reduced=~A+B) > >> > > >> > > >> > Thanks, > >> > > >> > > >> > Yanzhu > >> > > >> > > >> > > >> > > >> > > >> > On Tue, Jun 10, 2014 at 4:07 PM, Michael Love < > michaelisaiahlove at gmail.com> wrote: > >> >> > >> >> hi Yanzhu, > >> >> > >> >> Note that we recommend users switch to using DESeq2, which also has > >> >> the likelihood ratio test you are using, and is faster and more > >> >> sensitive. > >> >> > >> >> The pipeline would look like: > >> >> > >> >> DESeq(dds, test="LRT", reduced=~ A+B+C+A:B+A:C+B:C) > >> >> > >> >> for your first example. > >> >> > >> >> For your question, the terms of the reduced model should be contained > >> >> within the full model. Still there are a number of models which > >> >> satisfy this requirement, e.g. for testing B:C, you could use > >> >> A+B+C+A:B+A:C+B:C and A+B+C+A:B+A:C as full and reduced respectively. > >> >> Or you could use A+B+C+B:C and A+B+C. The importance of these other > >> >> interaction terms depends on context, whether they are very > >> >> explanatory or not. > >> >> > >> >> Mike > >> >> > >> >> On Tue, Jun 10, 2014 at 11:21 AM, yanzhu [guest] < > guest at bioconductor.org> wrote: > >> >> > Dear Community, > >> >> > > >> >> > I have a question about the hypothesis testing of the two- way > interaction terms in a multifactor design which includes three factors: A, > B and C. > >> >> > > >> >> > When I tested the three-way interaction I used the full and > reduced models as below for nbinomGLMTest(): > >> >> > Full: count ~ A+B+C+A:B+A:C+B:C+A:B:C > >> >> > Reduced: count ~ A+B+C+A:B+A:C+B:C > >> >> > > >> >> > Now comes my question, when I want to test the effect of two-way > interaction terms, i.e., A:B, A:C or B:C, what should be my full and > reduced models? For example, when I want to the test the effect of A:B, > what should be my full and reduced models for nbinomGLMTest() using DESeq > pacakge? > >> >> > > >> >> > > >> >> > Best, > >> >> > > >> >> > > >> >> > > >> >> > Yanzhu > >> >> > > >> >> > > >> >> > -- output of sessionInfo(): > >> >> > > >> >> > sessionInfo() > >> >> > R version 3.1.0 (2014-04-10) > >> >> > Platform: x86_64-w64-mingw32/x64 (64-bit) > >> >> > > >> >> > locale: > >> >> > [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United > States.1252 LC_MONETARY=English_United States.1252 > >> >> > [4] LC_NUMERIC=C LC_TIME=English_United > States.1252 > >> >> > > >> >> > attached base packages: > >> >> > [1] parallel stats graphics grDevices utils datasets > methods base > >> >> > > >> >> > other attached packages: > >> >> > [1] DESeq_1.16.0 lattice_0.20-29 locfit_1.5-9.1 > Biobase_2.24.0 BiocGenerics_0.10.0 edgeR_3.6.1 limma_3.20.1 > >> >> > > >> >> > loaded via a namespace (and not attached): > >> >> > [1] annotate_1.42.0 AnnotationDbi_1.26.0 DBI_0.2-7 > genefilter_1.46.0 geneplotter_1.42.0 GenomeInfoDb_1.0.2 > >> >> > [7] grid_3.1.0 IRanges_1.22.6 MASS_7.3-31 > RColorBrewer_1.0-5 RSQLite_0.11.4 splines_3.1.0 > >> >> > [13] stats4_3.1.0 survival_2.37-7 tools_3.1.0 > XML_3.98-1.1 xtable_1.7-3 > >> >> > > >> >> > > >> >> > -- > >> >> > Sent via the guest posting facility at bioconductor.org. > >> >> > > >> >> > _______________________________________________ > >> >> > Bioconductor mailing list > >> >> > Bioconductor at r-project.org > >> >> > https://stat.ethz.ch/mailman/listinfo/bioconductor > >> >> > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > >> > > >> > > > > > > [[alternative HTML version deleted]]

ADD REPLY • link 9.7 years ago Yanzhu Lin ▴ 120

0

Entering edit mode

Hi Mike, It took quite a long long time to estimate the dispersion using estimateDispersions(), and it also reported an error message as below: dds<-DESeqDataSetFromMatrix(countData=countdata,colData=coldata,design =~A+B+C+A:B+A:C+B:C+A:B:C) dds=estimateSizeFactors(dds) dds=estimateDispersions(dds) *gene-wise dispersion estimatesError in simpleLoess(y, x, w, span, degree, parametric, drop.square, normalize, : NA/NaN/Inf in foreign function call (arg 1)* Do you have any suggestion about how to solve this problem? Thanks. Best, Yanzhu On Wed, Aug 20, 2014 at 3:54 PM, Michael Love <michaelisaiahlove at="" gmail.com=""> wrote: > Hi Yanzhu, > > On Aug 20, 2014 7:39 PM, "Yanzhu Lin" <mlinyzh at="" gmail.com=""> wrote: > > > > Hi Mike, > > > > So you mean I need to estimate the dispersions every time when I have a > different full model for LRT? > > Yes. > > > In other words, so I need to use estimateDispersions() once when I test > the main effect, where the full model is ~A+B+C, and use > estimateDispersion() once when I test the three-way interaction term, where > the full model is ~A+B+C+A:B+A:C+B:C+A:B:C? > > > > > > One more question about estimateDispersion(), does it take a very long > time to estimate the dispersions? I have 16649 features and 726 biosamples, > and I run the estimateDispersion function around 9:30 am this morning, and > it hasn't done yet. Any suggestion that I can speed up the > estimateDispesions(). > > > > Yes, it takes time to estimate dispersion and to fit the GLM when you have > so many samples and when you have ~100 coefficients to fit with all the > interactions you are including. > > You could also try the voom transformation and linear modeling with limma > > Mike > > > Your help will be greatly appreciate, thanks. > > > > > > Best, > > > > > > > > On Wed, Aug 20, 2014 at 12:44 PM, Michael Love < > michaelisaiahlove at gmail.com> wrote: > >> > >> Hi Yanzhu, > >> > >> On Aug 20, 2014 3:59 PM, "Yanzhu Lin" <mlinyzh at="" gmail.com=""> wrote: > >> > > >> > Hi Mike, > >> > > >> > I am using DESeq2 package for my project now, and I have some > questions regarding to this pacakge. > >> > > >> > Please let me briefly introduce some background information about my > project. I have three factors: A with 16 levels, B with 2 levels and C with > 3 levels, in total 16*2*3=96 groups. There are 8 biosamples for each group, > hence 96*8=768 biosamples in total. Due to some issues, we lost some > replicates for some groups, which ends up 726 biosamples, hence it is > unbalance design. > >> > > >> > Our purpose are to test the main effects of three factor: A, B and C, > the two-way interaction: A:B, A:C and B:C, and the three-way interaction > term: A:B:C. For example, I will compare full model: ~A+B+C with reduce > model: ~A+B to test factor C, and so on for other two main effects. For > testing three-way interaction A:B:C, I will compare full model: > ~A+B+C+A:B+A:C+B:C+A:B:C and reduced model ~A+B+C+A:B+A:C+B:C. Then come my > questions. > >> > > >> > > >> > I have different full models for testing main effects, two-way > interaction and three-way interaction term. Will the dispersion estimation > affected by my full model? Can I specify the full model when I use > nbinomLRT()? > >> > >> No, you should not change the design in between, i.e. don't use a > different design for dispersion estimation and the full model in the LRT. > >> > >> Mike > >> > >> > In other words, can I use estimateDispersion() only once and fit > nbinomLRT with different full models as below: > >> > > >> > > >> > > dds<-DESeqDataSetFromMatrix(countData=countdata,colData=coldata,desi gn=~A+B+C+A:B+A:C+B:C+A:B:C) > >> > > >> > ### normalization > >> > dds=estimateSizeFactors(dds) > >> > > >> > ### dispersion estimation: > >> > dds=estimateDispersions(dds) > >> > > >> > ###Test three-way interaction term. > >> > dds<-nbinomLRT(dds,reduced=~A+B+C+A:B+A:C+B:C) > >> > ###Test main effect of factor A: > >> > dds<-nbinomLRT(dds,full=~A+B+C, reduced=~B+C) > >> > ###Test main effect of factor B: > >> > dds<-nbinomLRT(dds,full=~A+B+C, reduced=~A+C) > >> > > >> > ###Test main effect of factor C: > >> > dds<-nbinomLRT(dds,full=~A+B+C, reduced=~A+B) > >> > > >> > > >> > Thanks, > >> > > >> > > >> > Yanzhu > >> > > >> > > >> > > >> > > >> > > >> > On Tue, Jun 10, 2014 at 4:07 PM, Michael Love < > michaelisaiahlove at gmail.com> wrote: > >> >> > >> >> hi Yanzhu, > >> >> > >> >> Note that we recommend users switch to using DESeq2, which also has > >> >> the likelihood ratio test you are using, and is faster and more > >> >> sensitive. > >> >> > >> >> The pipeline would look like: > >> >> > >> >> DESeq(dds, test="LRT", reduced=~ A+B+C+A:B+A:C+B:C) > >> >> > >> >> for your first example. > >> >> > >> >> For your question, the terms of the reduced model should be contained > >> >> within the full model. Still there are a number of models which > >> >> satisfy this requirement, e.g. for testing B:C, you could use > >> >> A+B+C+A:B+A:C+B:C and A+B+C+A:B+A:C as full and reduced respectively. > >> >> Or you could use A+B+C+B:C and A+B+C. The importance of these other > >> >> interaction terms depends on context, whether they are very > >> >> explanatory or not. > >> >> > >> >> Mike > >> >> > >> >> On Tue, Jun 10, 2014 at 11:21 AM, yanzhu [guest] < > guest at bioconductor.org> wrote: > >> >> > Dear Community, > >> >> > > >> >> > I have a question about the hypothesis testing of the two- way > interaction terms in a multifactor design which includes three factors: A, > B and C. > >> >> > > >> >> > When I tested the three-way interaction I used the full and > reduced models as below for nbinomGLMTest(): > >> >> > Full: count ~ A+B+C+A:B+A:C+B:C+A:B:C > >> >> > Reduced: count ~ A+B+C+A:B+A:C+B:C > >> >> > > >> >> > Now comes my question, when I want to test the effect of two-way > interaction terms, i.e., A:B, A:C or B:C, what should be my full and > reduced models? For example, when I want to the test the effect of A:B, > what should be my full and reduced models for nbinomGLMTest() using DESeq > pacakge? > >> >> > > >> >> > > >> >> > Best, > >> >> > > >> >> > > >> >> > > >> >> > Yanzhu > >> >> > > >> >> > > >> >> > -- output of sessionInfo(): > >> >> > > >> >> > sessionInfo() > >> >> > R version 3.1.0 (2014-04-10) > >> >> > Platform: x86_64-w64-mingw32/x64 (64-bit) > >> >> > > >> >> > locale: > >> >> > [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United > States.1252 LC_MONETARY=English_United States.1252 > >> >> > [4] LC_NUMERIC=C LC_TIME=English_United > States.1252 > >> >> > > >> >> > attached base packages: > >> >> > [1] parallel stats graphics grDevices utils datasets > methods base > >> >> > > >> >> > other attached packages: > >> >> > [1] DESeq_1.16.0 lattice_0.20-29 locfit_1.5-9.1 > Biobase_2.24.0 BiocGenerics_0.10.0 edgeR_3.6.1 limma_3.20.1 > >> >> > > >> >> > loaded via a namespace (and not attached): > >> >> > [1] annotate_1.42.0 AnnotationDbi_1.26.0 DBI_0.2-7 > genefilter_1.46.0 geneplotter_1.42.0 GenomeInfoDb_1.0.2 > >> >> > [7] grid_3.1.0 IRanges_1.22.6 MASS_7.3-31 > RColorBrewer_1.0-5 RSQLite_0.11.4 splines_3.1.0 > >> >> > [13] stats4_3.1.0 survival_2.37-7 tools_3.1.0 > XML_3.98-1.1 xtable_1.7-3 > >> >> > > >> >> > > >> >> > -- > >> >> > Sent via the guest posting facility at bioconductor.org. > >> >> > > >> >> > _______________________________________________ > >> >> > Bioconductor mailing list > >> >> > Bioconductor at r-project.org > >> >> > https://stat.ethz.ch/mailman/listinfo/bioconductor > >> >> > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > >> > > >> > > > > > > [[alternative HTML version deleted]]

ADD REPLY • link 9.7 years ago Yanzhu Lin ▴ 120

0

Entering edit mode

Hi Yanzhu, I have already restructured this part of the code in the development branch, which will be released in October. As I haven't seen this error before, i guess it has to do with the ~100 of coefficients and 700 samples, so I probably won't work on a fix for the release branch. You can either try the development branch of Bioconductor, or wait until the next release. Hope this helps, Mike On Aug 22, 2014 1:59 PM, "Yanzhu Lin" <mlinyzh at="" gmail.com=""> wrote: > Hi Mike, > > It took quite a long long time to estimate the dispersion using > estimateDispersions(), and it also reported an error message as below: > > > > dds<-DESeqDataSetFromMatrix(countData=countdata,colData=coldata,desi gn=~A+B+C+A:B+A:C+B:C+A:B:C) > dds=estimateSizeFactors(dds) > dds=estimateDispersions(dds) > > > > > *gene-wise dispersion estimatesError in simpleLoess(y, x, w, span, degree, > parametric, drop.square, normalize, : NA/NaN/Inf in foreign function > call (arg 1)* > Do you have any suggestion about how to solve this problem? Thanks. > > > Best, > > > Yanzhu > > On Wed, Aug 20, 2014 at 3:54 PM, Michael Love <michaelisaiahlove at="" gmail.com=""> > wrote: > >> Hi Yanzhu, >> >> On Aug 20, 2014 7:39 PM, "Yanzhu Lin" <mlinyzh at="" gmail.com=""> wrote: >> > >> > Hi Mike, >> > >> > So you mean I need to estimate the dispersions every time when I have a >> different full model for LRT? >> >> Yes. >> >> > In other words, so I need to use estimateDispersions() once when I test >> the main effect, where the full model is ~A+B+C, and use >> estimateDispersion() once when I test the three-way interaction term, where >> the full model is ~A+B+C+A:B+A:C+B:C+A:B:C? >> > >> > >> > One more question about estimateDispersion(), does it take a very long >> time to estimate the dispersions? I have 16649 features and 726 biosamples, >> and I run the estimateDispersion function around 9:30 am this morning, and >> it hasn't done yet. Any suggestion that I can speed up the >> estimateDispesions(). >> > >> >> Yes, it takes time to estimate dispersion and to fit the GLM when you >> have so many samples and when you have ~100 coefficients to fit with all >> the interactions you are including. >> >> You could also try the voom transformation and linear modeling with limma >> >> Mike >> >> > Your help will be greatly appreciate, thanks. >> > >> > >> > Best, >> > >> > >> > >> > On Wed, Aug 20, 2014 at 12:44 PM, Michael Love < >> michaelisaiahlove at gmail.com> wrote: >> >> >> >> Hi Yanzhu, >> >> >> >> On Aug 20, 2014 3:59 PM, "Yanzhu Lin" <mlinyzh at="" gmail.com=""> wrote: >> >> > >> >> > Hi Mike, >> >> > >> >> > I am using DESeq2 package for my project now, and I have some >> questions regarding to this pacakge. >> >> > >> >> > Please let me briefly introduce some background information about my >> project. I have three factors: A with 16 levels, B with 2 levels and C with >> 3 levels, in total 16*2*3=96 groups. There are 8 biosamples for each group, >> hence 96*8=768 biosamples in total. Due to some issues, we lost some >> replicates for some groups, which ends up 726 biosamples, hence it is >> unbalance design. >> >> > >> >> > Our purpose are to test the main effects of three factor: A, B and >> C, the two-way interaction: A:B, A:C and B:C, and the three-way interaction >> term: A:B:C. For example, I will compare full model: ~A+B+C with reduce >> model: ~A+B to test factor C, and so on for other two main effects. For >> testing three-way interaction A:B:C, I will compare full model: >> ~A+B+C+A:B+A:C+B:C+A:B:C and reduced model ~A+B+C+A:B+A:C+B:C. Then come my >> questions. >> >> > >> >> > >> >> > I have different full models for testing main effects, two-way >> interaction and three-way interaction term. Will the dispersion estimation >> affected by my full model? Can I specify the full model when I use >> nbinomLRT()? >> >> >> >> No, you should not change the design in between, i.e. don't use a >> different design for dispersion estimation and the full model in the LRT. >> >> >> >> Mike >> >> >> >> > In other words, can I use estimateDispersion() only once and fit >> nbinomLRT with different full models as below: >> >> > >> >> > >> >> > >> dds<-DESeqDataSetFromMatrix(countData=countdata,colData=coldata,des ign=~A+B+C+A:B+A:C+B:C+A:B:C) >> >> > >> >> > ### normalization >> >> > dds=estimateSizeFactors(dds) >> >> > >> >> > ### dispersion estimation: >> >> > dds=estimateDispersions(dds) >> >> > >> >> > ###Test three-way interaction term. >> >> > dds<-nbinomLRT(dds,reduced=~A+B+C+A:B+A:C+B:C) >> >> > ###Test main effect of factor A: >> >> > dds<-nbinomLRT(dds,full=~A+B+C, reduced=~B+C) >> >> > ###Test main effect of factor B: >> >> > dds<-nbinomLRT(dds,full=~A+B+C, reduced=~A+C) >> >> > >> >> > ###Test main effect of factor C: >> >> > dds<-nbinomLRT(dds,full=~A+B+C, reduced=~A+B) >> >> > >> >> > >> >> > Thanks, >> >> > >> >> > >> >> > Yanzhu >> >> > >> >> > >> >> > >> >> > >> >> > >> >> > On Tue, Jun 10, 2014 at 4:07 PM, Michael Love < >> michaelisaiahlove at gmail.com> wrote: >> >> >> >> >> >> hi Yanzhu, >> >> >> >> >> >> Note that we recommend users switch to using DESeq2, which also has >> >> >> the likelihood ratio test you are using, and is faster and more >> >> >> sensitive. >> >> >> >> >> >> The pipeline would look like: >> >> >> >> >> >> DESeq(dds, test="LRT", reduced=~ A+B+C+A:B+A:C+B:C) >> >> >> >> >> >> for your first example. >> >> >> >> >> >> For your question, the terms of the reduced model should be >> contained >> >> >> within the full model. Still there are a number of models which >> >> >> satisfy this requirement, e.g. for testing B:C, you could use >> >> >> A+B+C+A:B+A:C+B:C and A+B+C+A:B+A:C as full and reduced >> respectively. >> >> >> Or you could use A+B+C+B:C and A+B+C. The importance of these other >> >> >> interaction terms depends on context, whether they are very >> >> >> explanatory or not. >> >> >> >> >> >> Mike >> >> >> >> >> >> On Tue, Jun 10, 2014 at 11:21 AM, yanzhu [guest] < >> guest at bioconductor.org> wrote: >> >> >> > Dear Community, >> >> >> > >> >> >> > I have a question about the hypothesis testing of the two- way >> interaction terms in a multifactor design which includes three factors: A, >> B and C. >> >> >> > >> >> >> > When I tested the three-way interaction I used the full and >> reduced models as below for nbinomGLMTest(): >> >> >> > Full: count ~ A+B+C+A:B+A:C+B:C+A:B:C >> >> >> > Reduced: count ~ A+B+C+A:B+A:C+B:C >> >> >> > >> >> >> > Now comes my question, when I want to test the effect of two-way >> interaction terms, i.e., A:B, A:C or B:C, what should be my full and >> reduced models? For example, when I want to the test the effect of A:B, >> what should be my full and reduced models for nbinomGLMTest() using DESeq >> pacakge? >> >> >> > >> >> >> > >> >> >> > Best, >> >> >> > >> >> >> > >> >> >> > >> >> >> > Yanzhu >> >> >> > >> >> >> > >> >> >> > -- output of sessionInfo(): >> >> >> > >> >> >> > sessionInfo() >> >> >> > R version 3.1.0 (2014-04-10) >> >> >> > Platform: x86_64-w64-mingw32/x64 (64-bit) >> >> >> > >> >> >> > locale: >> >> >> > [1] LC_COLLATE=English_United States.1252 >> LC_CTYPE=English_United States.1252 LC_MONETARY=English_United >> States.1252 >> >> >> > [4] LC_NUMERIC=C LC_TIME=English_United >> States.1252 >> >> >> > >> >> >> > attached base packages: >> >> >> > [1] parallel stats graphics grDevices utils datasets >> methods base >> >> >> > >> >> >> > other attached packages: >> >> >> > [1] DESeq_1.16.0 lattice_0.20-29 locfit_1.5-9.1 >> Biobase_2.24.0 BiocGenerics_0.10.0 edgeR_3.6.1 limma_3.20.1 >> >> >> > >> >> >> > loaded via a namespace (and not attached): >> >> >> > [1] annotate_1.42.0 AnnotationDbi_1.26.0 DBI_0.2-7 >> genefilter_1.46.0 geneplotter_1.42.0 GenomeInfoDb_1.0.2 >> >> >> > [7] grid_3.1.0 IRanges_1.22.6 MASS_7.3-31 >> RColorBrewer_1.0-5 RSQLite_0.11.4 splines_3.1.0 >> >> >> > [13] stats4_3.1.0 survival_2.37-7 tools_3.1.0 >> XML_3.98-1.1 xtable_1.7-3 >> >> >> > >> >> >> > >> >> >> > -- >> >> >> > Sent via the guest posting facility at bioconductor.org. >> >> >> > >> >> >> > _______________________________________________ >> >> >> > Bioconductor mailing list >> >> >> > Bioconductor at r-project.org >> >> >> > https://stat.ethz.ch/mailman/listinfo/bioconductor >> >> >> > Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> > >> >> > >> > >> > >> > > [[alternative HTML version deleted]]

ADD REPLY • link 9.7 years ago Michael Love 41k

0

Entering edit mode

Also, as Michael previously suggested you might want to try kicking the tires on your data + analysis using limma::voom, especially for data of this size. Yes there are cases when one approach is better than the other, but here the cal to `voom` will be relatively quick (my bet is that it's *at most* 5 mins, but likely less than 1 ;-) and you can start analyzing your data quickly. Certainly try upgrading your R + bioconductor to use the latest DESeq2, if you like, but you can get your analysis going w/ limma::voom while DESeq2 is estimating its dispersions, you could then come around and see where the corner cases of your analysis are between the results of the voom analysis vs. deseq2. You could then report back here and tell us what you find as far what the pros and cons of each approach are after you compare the results, as these comparisons are always helpful/interesting for the greater community. HTH, -steve On Fri, Aug 22, 2014 at 7:23 AM, Michael Love <michaelisaiahlove at="" gmail.com=""> wrote: > Hi Yanzhu, > > I have already restructured this part of the code in the development > branch, which will be released in October. As I haven't seen this error > before, i guess it has to do with the ~100 of coefficients and 700 samples, > so I probably won't work on a fix for the release branch. > > You can either try the development branch of Bioconductor, or wait until > the next release. > > Hope this helps, > > Mike > On Aug 22, 2014 1:59 PM, "Yanzhu Lin" <mlinyzh at="" gmail.com=""> wrote: > >> Hi Mike, >> >> It took quite a long long time to estimate the dispersion using >> estimateDispersions(), and it also reported an error message as below: >> >> >> >> dds<-DESeqDataSetFromMatrix(countData=countdata,colData=coldata,des ign=~A+B+C+A:B+A:C+B:C+A:B:C) >> dds=estimateSizeFactors(dds) >> dds=estimateDispersions(dds) >> >> >> >> >> *gene-wise dispersion estimatesError in simpleLoess(y, x, w, span, degree, >> parametric, drop.square, normalize, : NA/NaN/Inf in foreign function >> call (arg 1)* >> Do you have any suggestion about how to solve this problem? Thanks. >> >> >> Best, >> >> >> Yanzhu >> >> On Wed, Aug 20, 2014 at 3:54 PM, Michael Love <michaelisaiahlove at="" gmail.com="">> > wrote: >> >>> Hi Yanzhu, >>> >>> On Aug 20, 2014 7:39 PM, "Yanzhu Lin" <mlinyzh at="" gmail.com=""> wrote: >>> > >>> > Hi Mike, >>> > >>> > So you mean I need to estimate the dispersions every time when I have a >>> different full model for LRT? >>> >>> Yes. >>> >>> > In other words, so I need to use estimateDispersions() once when I test >>> the main effect, where the full model is ~A+B+C, and use >>> estimateDispersion() once when I test the three-way interaction term, where >>> the full model is ~A+B+C+A:B+A:C+B:C+A:B:C? >>> > >>> > >>> > One more question about estimateDispersion(), does it take a very long >>> time to estimate the dispersions? I have 16649 features and 726 biosamples, >>> and I run the estimateDispersion function around 9:30 am this morning, and >>> it hasn't done yet. Any suggestion that I can speed up the >>> estimateDispesions(). >>> > >>> >>> Yes, it takes time to estimate dispersion and to fit the GLM when you >>> have so many samples and when you have ~100 coefficients to fit with all >>> the interactions you are including. >>> >>> You could also try the voom transformation and linear modeling with limma >>> >>> Mike >>> >>> > Your help will be greatly appreciate, thanks. >>> > >>> > >>> > Best, >>> > >>> > >>> > >>> > On Wed, Aug 20, 2014 at 12:44 PM, Michael Love < >>> michaelisaiahlove at gmail.com> wrote: >>> >> >>> >> Hi Yanzhu, >>> >> >>> >> On Aug 20, 2014 3:59 PM, "Yanzhu Lin" <mlinyzh at="" gmail.com=""> wrote: >>> >> > >>> >> > Hi Mike, >>> >> > >>> >> > I am using DESeq2 package for my project now, and I have some >>> questions regarding to this pacakge. >>> >> > >>> >> > Please let me briefly introduce some background information about my >>> project. I have three factors: A with 16 levels, B with 2 levels and C with >>> 3 levels, in total 16*2*3=96 groups. There are 8 biosamples for each group, >>> hence 96*8=768 biosamples in total. Due to some issues, we lost some >>> replicates for some groups, which ends up 726 biosamples, hence it is >>> unbalance design. >>> >> > >>> >> > Our purpose are to test the main effects of three factor: A, B and >>> C, the two-way interaction: A:B, A:C and B:C, and the three-way interaction >>> term: A:B:C. For example, I will compare full model: ~A+B+C with reduce >>> model: ~A+B to test factor C, and so on for other two main effects. For >>> testing three-way interaction A:B:C, I will compare full model: >>> ~A+B+C+A:B+A:C+B:C+A:B:C and reduced model ~A+B+C+A:B+A:C+B:C. Then come my >>> questions. >>> >> > >>> >> > >>> >> > I have different full models for testing main effects, two- way >>> interaction and three-way interaction term. Will the dispersion estimation >>> affected by my full model? Can I specify the full model when I use >>> nbinomLRT()? >>> >> >>> >> No, you should not change the design in between, i.e. don't use a >>> different design for dispersion estimation and the full model in the LRT. >>> >> >>> >> Mike >>> >> >>> >> > In other words, can I use estimateDispersion() only once and fit >>> nbinomLRT with different full models as below: >>> >> > >>> >> > >>> >> > >>> dds<-DESeqDataSetFromMatrix(countData=countdata,colData=coldata,de sign=~A+B+C+A:B+A:C+B:C+A:B:C) >>> >> > >>> >> > ### normalization >>> >> > dds=estimateSizeFactors(dds) >>> >> > >>> >> > ### dispersion estimation: >>> >> > dds=estimateDispersions(dds) >>> >> > >>> >> > ###Test three-way interaction term. >>> >> > dds<-nbinomLRT(dds,reduced=~A+B+C+A:B+A:C+B:C) >>> >> > ###Test main effect of factor A: >>> >> > dds<-nbinomLRT(dds,full=~A+B+C, reduced=~B+C) >>> >> > ###Test main effect of factor B: >>> >> > dds<-nbinomLRT(dds,full=~A+B+C, reduced=~A+C) >>> >> > >>> >> > ###Test main effect of factor C: >>> >> > dds<-nbinomLRT(dds,full=~A+B+C, reduced=~A+B) >>> >> > >>> >> > >>> >> > Thanks, >>> >> > >>> >> > >>> >> > Yanzhu >>> >> > >>> >> > >>> >> > >>> >> > >>> >> > >>> >> > On Tue, Jun 10, 2014 at 4:07 PM, Michael Love < >>> michaelisaiahlove at gmail.com> wrote: >>> >> >> >>> >> >> hi Yanzhu, >>> >> >> >>> >> >> Note that we recommend users switch to using DESeq2, which also has >>> >> >> the likelihood ratio test you are using, and is faster and more >>> >> >> sensitive. >>> >> >> >>> >> >> The pipeline would look like: >>> >> >> >>> >> >> DESeq(dds, test="LRT", reduced=~ A+B+C+A:B+A:C+B:C) >>> >> >> >>> >> >> for your first example. >>> >> >> >>> >> >> For your question, the terms of the reduced model should be >>> contained >>> >> >> within the full model. Still there are a number of models which >>> >> >> satisfy this requirement, e.g. for testing B:C, you could use >>> >> >> A+B+C+A:B+A:C+B:C and A+B+C+A:B+A:C as full and reduced >>> respectively. >>> >> >> Or you could use A+B+C+B:C and A+B+C. The importance of these other >>> >> >> interaction terms depends on context, whether they are very >>> >> >> explanatory or not. >>> >> >> >>> >> >> Mike >>> >> >> >>> >> >> On Tue, Jun 10, 2014 at 11:21 AM, yanzhu [guest] < >>> guest at bioconductor.org> wrote: >>> >> >> > Dear Community, >>> >> >> > >>> >> >> > I have a question about the hypothesis testing of the two- way >>> interaction terms in a multifactor design which includes three factors: A, >>> B and C. >>> >> >> > >>> >> >> > When I tested the three-way interaction I used the full and >>> reduced models as below for nbinomGLMTest(): >>> >> >> > Full: count ~ A+B+C+A:B+A:C+B:C+A:B:C >>> >> >> > Reduced: count ~ A+B+C+A:B+A:C+B:C >>> >> >> > >>> >> >> > Now comes my question, when I want to test the effect of two-way >>> interaction terms, i.e., A:B, A:C or B:C, what should be my full and >>> reduced models? For example, when I want to the test the effect of A:B, >>> what should be my full and reduced models for nbinomGLMTest() using DESeq >>> pacakge? >>> >> >> > >>> >> >> > >>> >> >> > Best, >>> >> >> > >>> >> >> > >>> >> >> > >>> >> >> > Yanzhu >>> >> >> > >>> >> >> > >>> >> >> > -- output of sessionInfo(): >>> >> >> > >>> >> >> > sessionInfo() >>> >> >> > R version 3.1.0 (2014-04-10) >>> >> >> > Platform: x86_64-w64-mingw32/x64 (64-bit) >>> >> >> > >>> >> >> > locale: >>> >> >> > [1] LC_COLLATE=English_United States.1252 >>> LC_CTYPE=English_United States.1252 LC_MONETARY=English_United >>> States.1252 >>> >> >> > [4] LC_NUMERIC=C LC_TIME=English_United >>> States.1252 >>> >> >> > >>> >> >> > attached base packages: >>> >> >> > [1] parallel stats graphics grDevices utils datasets >>> methods base >>> >> >> > >>> >> >> > other attached packages: >>> >> >> > [1] DESeq_1.16.0 lattice_0.20-29 locfit_1.5-9.1 >>> Biobase_2.24.0 BiocGenerics_0.10.0 edgeR_3.6.1 limma_3.20.1 >>> >> >> > >>> >> >> > loaded via a namespace (and not attached): >>> >> >> > [1] annotate_1.42.0 AnnotationDbi_1.26.0 DBI_0.2-7 >>> genefilter_1.46.0 geneplotter_1.42.0 GenomeInfoDb_1.0.2 >>> >> >> > [7] grid_3.1.0 IRanges_1.22.6 MASS_7.3-31 >>> RColorBrewer_1.0-5 RSQLite_0.11.4 splines_3.1.0 >>> >> >> > [13] stats4_3.1.0 survival_2.37-7 tools_3.1.0 >>> XML_3.98-1.1 xtable_1.7-3 >>> >> >> > >>> >> >> > >>> >> >> > -- >>> >> >> > Sent via the guest posting facility at bioconductor.org. >>> >> >> > >>> >> >> > _______________________________________________ >>> >> >> > Bioconductor mailing list >>> >> >> > Bioconductor at r-project.org >>> >> >> > https://stat.ethz.ch/mailman/listinfo/bioconductor >>> >> >> > Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >> > >>> >> > >>> > >>> > >>> >> >> > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- Steve Lianoglou Computational Biologist Genentech

ADD REPLY • link 9.7 years ago Steve Lianoglou ★ 13k

Login before adding your answer.