Multifactor analysis with deseq

0

Entering edit mode

Michael Muratet ▴ 420

@michael-muratet-3076

Last seen 11.4 years ago

Dear Dr Anders I have a few questions about applying deseq to a multifactor analysis that I am doing. I have three factors: three levels of genotype, three levels of "cell culture" and two time points. I have three replicates in each cell. The comparison(s) of most interest are changes in expression with genotype and days. I am also interested in knowing the effects of cell culture, but it's a technical issue. I have used HTSeq-count with the ENSEMBL v54 H.sapiens gtf to get counts from reads aligned with tophat. I have 54 libraries total. The first few rows of the conditions file look like: day culture genotype SL27450 7 33D6p63 NT SL27451 7 33D6p63 NT SL27452 7 33D6p63 NT SL27456 14 33D6p63 NT SL27457 14 33D6p63 NT SL27458 14 33D6p63 NT such that each set of factors is repeated three times. After normalization and dispersion estimation I did: fit0 <- fitNbinomGLMs(cds, count~genotype+day+culture) fit1 <- fitNbinomGLMs(cds, count~day+culture) then pvalsGLM = nbinomGLMTest( fit1, fit0 ) The result was entirely NaN or NAs. I'll put some output and the environment at the end of this email. I'm not sure where to start looking for the problem. At the moment, I am wondering how best to implement the pairwise comparison alluded to in the "fix me" in the manual I have. I am wondering: Does the formula in the fitNbinomGLMs() function work the same way as lm(), e.g., do I have to explicitly include interactions or will it calculate them for me? Is there a way to use relevel() to set the reference level? I have a wild-type genotype that would be natural for comparing two mutant genotypes. I don't see any references on the list, but has anybody tried to use normalized values from a CountDataSet outside of deseq? Lastly, I saw on the list while researching this issue that deseq2 is out. I'll give it try. Here is str() output from the fits (which seems OK to me) > str(fit0) 'data.frame': 37435 obs. of 8 variables: $ (Intercept) : num 9.01 1.8 7.69 7.06 6.63 ... $ genotypeNT : num -0.0666 0.0223 -0.08 -0.1554 -0.1967 ... $ genotypeWT : num -0.1665 -0.6294 -0.0716 -0.0908 -0.0754 ... $ day7 : num -0.0849 -0.4425 -0.0596 -0.1809 -0.2374 ... $ culture33D6p63: num -0.153 -1.917 0.119 0.715 0.518 ... $ culture34D6p62: num -0.2925 -0.3422 0.0181 0.242 0.0233 ... $ deviance : num 46.1 44.6 51.8 21.1 60.2 ... $ converged : logi TRUE TRUE TRUE TRUE TRUE TRUE ... - attr(*, "df.residual")= num 48 > str(fit1) 'data.frame': 37435 obs. of 6 variables: $ (Intercept) : num 8.94 1.6 7.64 6.99 6.54 ... $ day7 : num -0.0826 -0.4417 -0.0589 -0.1793 -0.2361 ... $ culture33D6p63: num -0.163 -1.899 0.114 0.708 0.507 ... $ culture34D6p62: num -0.3033 -0.3009 0.0161 0.2411 0.0191 ... $ deviance : num 49.7 46.9 52.7 23.9 64.1 ... $ converged : logi TRUE TRUE TRUE TRUE TRUE TRUE ... - attr(*, "df.residual")= num 50 > sessionInfo() R version 2.15.0 (2012-03-30) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.utf-8 LC_NUMERIC=C [3] LC_TIME=en_US.utf-8 LC_COLLATE=en_US.utf-8 [5] LC_MONETARY=en_US.utf-8 LC_MESSAGES=en_US.utf-8 [7] LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] grid splines stats graphics grDevices utils datasets [8] methods base other attached packages: [1] gplots_2.11.0 MASS_7.3-23 KernSmooth_2.23-9 caTools_1.14 [5] gdata_2.12.0 gtools_2.7.0 RColorBrewer_1.0-5 multcomp_1.2-17 [9] survival_2.37-4 mvtnorm_0.9-9994 heatmap.plus_1.3 DESeq_1.10.1 [13] lattice_0.20-6 locfit_1.5-8 Biobase_2.18.0 BiocGenerics_0.4.0 loaded via a namespace (and not attached): [1] annotate_1.36.0 AnnotationDbi_1.20.6 bitops_1.0-5 [4] DBI_0.2-5 genefilter_1.40.0 geneplotter_1.36.0 [7] IRanges_1.16.6 parallel_2.15.0 RSQLite_0.11.2 [10] stats4_2.15.0 tools_2.15.0 XML_3.95-0.2 [13] xtable_1.7-1 Michael Muratet, Ph.D. Senior Scientist HudsonAlpha Institute for Biotechnology mmuratet at hudsonalpha.org (256) 327-0473 (p) (256) 327-0966 (f) Room 4005 601 Genome Way Huntsville, Alabama 35806

Normalization DESeq DESeq2 Normalization DESeq DESeq2 • 2.9k views

ADD COMMENT • link updated 12.9 years ago by Simon Anders ★ 3.8k • written 12.9 years ago by Michael Muratet ▴ 420

0

Entering edit mode

Simon Anders ★ 3.8k

@simon-anders-3855

Last seen 5.5 years ago

Zentrum für Molekularbiologie, Universi…

Hi Michel On 22/03/13 20:52, Michael Muratet wrote: > fit0 <- fitNbinomGLMs(cds, count~genotype+day+culture) > fit1 <- fitNbinomGLMs(cds, count~day+culture) > then > pvalsGLM = nbinomGLMTest( fit1, fit0 ) > > The result was entirely NaN or NAs. This is easy. :-) It is simply because you supplied the fits in the wrong oder. The fit with 'genotype' is the full model and should come as first argument to nbinomGLMTest, and the reduced model without 'genotype' is second. So, just swap fit0 and fit1. > Does the formula in the fitNbinomGLMs() function work the same way as > lm(), e.g., do I have to explicitly include interactions or will it > calculate them for me? Yes, both fitNbinomGLMs and lm internally call model.matrix, which, as you know, adds an intercept automatically > Is there a way to use relevel() to set the reference level? I have a > wild-type genotype that would be natural for comparing two mutant > genotypes. Yes, you can use 'relevel' in the same way as you would with 'lm'. And to answer your question about the 'fixme'. There are two possibilities: You can compare a single genotype with wildtype by simply subsetting the CountDataSet to exclude the samples for the othe rgenotype before doing the fits. Or you can use our new DESeq2, which now offers a Wald test instead of a likelihood ratio test and can so give you p values for both genotype coefficient in one go. > I don't see any references on the list, but has anybody tried to use > normalized values from a CountDataSet outside of deseq? Depends on what you want to do with them. > Lastly, I saw on the list while researching this issue that deseq2 is > out. I'll give it try. Sure, do so. And tell us what you think. Simon

ADD COMMENT • link 12.9 years ago Simon Anders ★ 3.8k

0

Entering edit mode

On 22/03/13 21:06, Simon Anders wrote: >> Does the formula in the fitNbinomGLMs() function work the same way as >> lm(), e.g., do I have to explicitly include interactions or will it >> calculate them for me? > > Yes, both fitNbinomGLMs and lm internally call model.matrix, which, as > you know, adds an intercept automatically Typed too quickly. No, interactions have to be added explicitly (with '*' instead of '+'). To test for interaction effects, compare a full model with main effects and interaction (count ~ culture + genotype * day ) against one with only main effects (count ~ culture + genotype + day ). Simon

ADD REPLY • link 12.9 years ago Simon Anders ★ 3.8k

0

Entering edit mode

Dear List, I am trying to generate an html report for my DEXSeq output, but only for certain genes of interest. I keep getting an error that I do not understand! ======== Code: > ecs ExonCountSet (storageMode: environment) assayData: 622936 features, 3 samples element names: counts protocolData: none phenoData sampleNames: 230 231 232 varLabels: sizeFactor Name ... countfiles (9 total) varMetadata: labelDescription featureData featureNames: ENSG00000000003:E001 ENSG00000000003:E002 ... ENSG00000267801:E001 (622936 total) fvarLabels: geneID exonID ... log2fold(Mut/Cont) (14 total) fvarMetadata: labelDescription experimentData: use 'experimentData(object)' Annotation: > head(goi) Genes Gene.ID Chrom 1 ABCB7 ENSG00000131269 X 2 FTMT ENSG00000181867 5 3 ALAS2 ENSG00000158578 X 4 TP53 ENSG00000141510 17 5 P21 (CDKN1A) ENSG00000124762 6 6 PUMA (BBC3) ENSG00000105327 19 > dim(goi) 18 3 > DEXSeqHTML(ecs,geneIDs=goi$Gene.ID,color=c("red","blue"),file="goi_M C.html") Error in DEXSeqHTML(ecs, geneIDs = goi$Gene.ID, color = c("red", "blue"), : The geneIDs provided are not in the ecs object > DEXSeqHTML(ecs,geneIDs=goi$Gene.ID,color=c("red","blue"),file="goi_M C.html",FDR=1) Error in DEXSeqHTML(ecs, geneIDs = goi$Gene.ID, color = c("red", "blue"), : The geneIDs provided are not in the ecs object ======== I can't figure out what the problem is? To me it appears that all the ids in the goi object are present in the ecs object > grep(goi$Gene.ID[1],fData(ecs)$geneID) Any help much appreciated. Many Thanks, Natasha sessionInfo() R version 2.15.2 (2012-10-26) Platform: x86_64-pc-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8 [7] LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] parallel stats graphics grDevices utils datasets methods [8] base other attached packages: [1] WriteXLS_2.3.0 gdata_2.12.0 DEXSeq_1.4.0 Biobase_2.18.0 [5] BiocGenerics_0.4.0 loaded via a namespace (and not attached): [1] biomaRt_2.14.0 gtools_2.7.0 hwriter_1.3 plyr_1.7.1 RCurl_1.95-3 [6] statmod_1.4.16 stringr_0.6.1 tools_2.15.2 XML_3.95-0.1

ADD REPLY • link 12.9 years ago Natasha ▴ 440

0

Entering edit mode

Dear Natasha, Thanks for your report. Sounds strange, just to make sure. What is the result of the code below? gns <- as.character(goi$Gene.ID) all( gns %in% levels( geneIDs( ecs ) ) ) Bests, Alejandro > Dear List, > > I am trying to generate an html report for my DEXSeq output, but only for certain genes of interest. > I keep getting an error that I do not understand! > ======== > Code: >> ecs > ExonCountSet (storageMode: environment) > assayData: 622936 features, 3 samples > element names: counts > protocolData: none > phenoData > sampleNames: 230 231 232 > varLabels: sizeFactor Name ... countfiles (9 total) > varMetadata: labelDescription > featureData > featureNames: ENSG00000000003:E001 ENSG00000000003:E002 ... > ENSG00000267801:E001 (622936 total) > fvarLabels: geneID exonID ... log2fold(Mut/Cont) (14 total) > fvarMetadata: labelDescription > experimentData: use 'experimentData(object)' > Annotation: > >> head(goi) > Genes Gene.ID Chrom > 1 ABCB7 ENSG00000131269 X > 2 FTMT ENSG00000181867 5 > 3 ALAS2 ENSG00000158578 X > 4 TP53 ENSG00000141510 17 > 5 P21 (CDKN1A) ENSG00000124762 6 > 6 PUMA (BBC3) ENSG00000105327 19 > >> dim(goi) > 18 3 > >> DEXSeqHTML(ecs,geneIDs=goi$Gene.ID,color=c("red","blue"),file="goi_ MC.html") > Error in DEXSeqHTML(ecs, geneIDs = goi$Gene.ID, color = c("red", "blue"), : > The geneIDs provided are not in the ecs object > >> DEXSeqHTML(ecs,geneIDs=goi$Gene.ID,color=c("red","blue"),file="goi_ MC.html",FDR=1) > Error in DEXSeqHTML(ecs, geneIDs = goi$Gene.ID, color = c("red", "blue"), : > The geneIDs provided are not in the ecs object > ======== > > I can't figure out what the problem is? To me it appears that all the ids in the goi object are present in the ecs object > >> grep(goi$Gene.ID[1],fData(ecs)$geneID) > Any help much appreciated. > > > Many Thanks, > Natasha > > sessionInfo() > R version 2.15.2 (2012-10-26) > Platform: x86_64-pc-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 > [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8 > [7] LC_PAPER=C LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] parallel stats graphics grDevices utils datasets methods > [8] base > > other attached packages: > [1] WriteXLS_2.3.0 gdata_2.12.0 DEXSeq_1.4.0 Biobase_2.18.0 > [5] BiocGenerics_0.4.0 > > loaded via a namespace (and not attached): > [1] biomaRt_2.14.0 gtools_2.7.0 hwriter_1.3 plyr_1.7.1 RCurl_1.95-3 > [6] statmod_1.4.16 stringr_0.6.1 tools_2.15.2 XML_3.95-0.1 > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD REPLY • link 12.9 years ago Alejandro Reyes ★ 1.9k

0

Entering edit mode

Simon Anders ★ 3.8k

@simon-anders-3855

Last seen 5.5 years ago

Zentrum für Molekularbiologie, Universi…

Hi [Please keep conversations on the list.] On 25/03/13 19:30, Michael Muratet wrote: > I thought I would use them in R to manually calculate the GLM where I > could have access to the model. Or, maybe there is a way to get that > in deseq2? What kind of "access" do you need? You can specify a formula (or even a model matrix, if you prefer), and you get all coefficients ande deviances back. Of course, you can call glm.fit or statmod::glmnb.fit directly. Then, however, you should _not_ use the normalized counts but rather specify the log size factors as offsets. > I tried to install deseq2 and discovered I need R 2.16. CRAN only > goes up to 15.3, but they have beta r003, is this what you used? No, I used R 2.15. See the explanations on the Bioc web page on how R and Bioc versions are tied together, and how devel packages are handled. Simon

ADD COMMENT • link 12.9 years ago Simon Anders ★ 3.8k

0

Entering edit mode

On Mar 25, 2013, at 1:36 PM, Simon Anders wrote: > Hi > > [Please keep conversations on the list.] Sorry, upgraded my OS X and didn't realize there is now a separate button for 'reply all'. Progress. > >> I tried to install deseq2 and discovered I need R 2.16. CRAN only >> goes up to 15.3, but they have beta r003, is this what you used? > > No, I used R 2.15. See the explanations on the Bioc web page on how R and Bioc versions are tied together, and how devel packages are handled. > Found the BioC devel version install web page. It said install R.003, so I did and now have DESeq2. I notice that the user manual dated yesterday has over 10K pages of output starting in section 3. Maybe it was intentional, but I thought I would bring it to your attention. Thanks Mike Michael Muratet, Ph.D. Senior Scientist HudsonAlpha Institute for Biotechnology mmuratet at hudsonalpha.org (256) 327-0473 (p) (256) 327-0966 (f) Room 4005 601 Genome Way Huntsville, Alabama 35806

ADD REPLY • link 12.9 years ago Michael Muratet ▴ 420

0

Entering edit mode

Thanks for the heads up Michael. I will look into it. best, Mike On Mon, Mar 25, 2013 at 10:31 PM, Michael Muratet <mmuratet@hudsonalpha.org>wrote: > > On Mar 25, 2013, at 1:36 PM, Simon Anders wrote: > > > Hi > > > > [Please keep conversations on the list.] > Sorry, upgraded my OS X and didn't realize there is now a separate button > for 'reply all'. Progress. > > > >> I tried to install deseq2 and discovered I need R 2.16. CRAN only > >> goes up to 15.3, but they have beta r003, is this what you used? > > > > No, I used R 2.15. See the explanations on the Bioc web page on how R > and Bioc versions are tied together, and how devel packages are handled. > > > > Found the BioC devel version install web page. It said install R.003, so I > did and now have DESeq2. I notice that the user manual dated yesterday has > over 10K pages of output starting in section 3. Maybe it was intentional, > but I thought I would bring it to your attention. > > Thanks > > Mike > > Michael Muratet, Ph.D. > Senior Scientist > HudsonAlpha Institute for Biotechnology > mmuratet@hudsonalpha.org > (256) 327-0473 (p) > (256) 327-0966 (f) > > Room 4005 > 601 Genome Way > Huntsville, Alabama 35806 > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]

ADD REPLY • link 12.9 years ago Michael Love 43k

0

Entering edit mode

I fixed this bug in version 0.99.23. (I was filling a NumericMatrix by row which didn't like the upgrade from Rcpp 0.10.2 to 0.10.3.) Mike On Mon, Mar 25, 2013 at 11:00 PM, Michael Love <michaelisaiahlove@gmail.com>wrote: > Thanks for the heads up Michael. I will look into it. > > best, > > Mike > > > On Mon, Mar 25, 2013 at 10:31 PM, Michael Muratet < > mmuratet@hudsonalpha.org> wrote: > >> >> On Mar 25, 2013, at 1:36 PM, Simon Anders wrote: >> >> > Hi >> > >> > [Please keep conversations on the list.] >> Sorry, upgraded my OS X and didn't realize there is now a separate button >> for 'reply all'. Progress. >> > >> >> I tried to install deseq2 and discovered I need R 2.16. CRAN only >> >> goes up to 15.3, but they have beta r003, is this what you used? >> > >> > No, I used R 2.15. See the explanations on the Bioc web page on how R >> and Bioc versions are tied together, and how devel packages are handled. >> > >> >> Found the BioC devel version install web page. It said install R.003, so >> I did and now have DESeq2. I notice that the user manual dated yesterday >> has over 10K pages of output starting in section 3. Maybe it was >> intentional, but I thought I would bring it to your attention. >> >> Thanks >> >> Mike >> >> Michael Muratet, Ph.D. >> Senior Scientist >> HudsonAlpha Institute for Biotechnology >> mmuratet@hudsonalpha.org >> (256) 327-0473 (p) >> (256) 327-0966 (f) >> >> Room 4005 >> 601 Genome Way >> Huntsville, Alabama 35806 >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > [[alternative HTML version deleted]]

ADD REPLY • link 12.9 years ago Michael Love 43k

0

Entering edit mode

Howdy I just ran DESeq(des) for the first time and got thousands of lines of output, which explains the length of the vignette. Looking for the parameter to turn it off.... ;-) Cheers Mike Michael Muratet, Ph.D. Senior Scientist HudsonAlpha Institute for Biotechnology mmuratet at hudsonalpha.org (256) 327-0473 (p) (256) 327-0966 (f) Room 4005 601 Genome Way Huntsville, Alabama 35806

ADD REPLY • link 12.9 years ago Michael Muratet ▴ 420

0

Entering edit mode

Hi Mike This is the fix I mentioned in my previous email. I pushed version 0.99.23 last night and it takes ~1 day to appear. Mike On Mar 26, 2013 5:51 PM, "Michael Muratet" <mmuratet@hudsonalpha.org> wrote: > Howdy > > I just ran DESeq(des) for the first time and got thousands of lines of > output, which explains the length of the vignette. Looking for the > parameter to turn it off.... ;-) > > Cheers > > Mike > > Michael Muratet, Ph.D. > Senior Scientist > HudsonAlpha Institute for Biotechnology > mmuratet@hudsonalpha.org > (256) 327-0473 (p) > (256) 327-0966 (f) > > Room 4005 > 601 Genome Way > Huntsville, Alabama 35806 > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]

ADD REPLY • link 12.9 years ago Michael Love 43k

Login before adding your answer.