Interpreting DESeq2 results

0

Entering edit mode

Michael Muratet ▴ 420

@michael-muratet-3076

Last seen 11.3 years ago

On Mar 28, 2013, at 4:19 PM, Michael Love wrote: > Hi Michael, > > The baseMean column is not on the log scale; it is the mean of normalized counts for a gene. The intercept from the GLM is labelled intercept in mcols(dse). > > Mike > Hello again Here's a snippet of output for the "Intercept" term. > head(res.mm9[["Intercept"]]) DataFrame with 6 rows and 4 columns baseMean log2FoldChange pvalue FDR <numeric> <numeric> <numeric> <numeric> ENSMUSG00000000001 4160.27257 12.107650 0.000000e+00 0.000000e+00 ENSMUSG00000000028 127.54781 7.001754 0.000000e+00 0.000000e+00 and here's a snippet for two level factor > head(res.mm9[["day14"]]) DataFrame with 6 rows and 4 columns baseMean log2FoldChange pvalue FDR <numeric> <numeric> <numeric> <numeric> ENSMUSG00000000001 4160.27257 -0.06449054 4.578027e-02 1.042132e-01 ENSMUSG00000000028 127.54781 -0.05709357 3.020473e-01 4.500798e-01 I'm still unclear about how to write down the coefficients for the model. The link function is log2(mean), correct? So is the "log2FoldChange" the particular value of beta for that coefficient? Would I write something like y_tilde(ENSMUSG00000000001) = 12.11 - 0.06449 + other terms? Thanks Mike > On Mar 28, 2013 5:00 PM, "Michael Muratet" <mmuratet at="" hudsonalpha.org=""> wrote: > Greetings > > I have an experiment: > > > design(dse) > ~ factor1 + factor2 + factor3 > > where factor1 has two levels, factor2 has three levels and factor3 has three levels. I extract a gene of interest from the results for each term (I've changed the indices to reflect the condition): > > > lapply(resultsNames(dse),function(u) results(dse,u)["gene_A",]) > [["Intercept"]] > baseMean log2FoldChange pvalue FDR > gene_A 1596.548 10.77485 3.309439e-216 7.025442e-216 > [["factor1_level2"]] > baseMean log2FoldChange pvalue FDR > gene_A 1596.548 0.3386776 0.1307309 0.3587438 > [["factor2_level2"]] > baseMean log2FoldChange pvalue FDR > gene_A 1596.548 -0.6882543 0.0613569 0.1007896 > [["factor2_level3"]] > baseMean log2FoldChange pvalue FDR > gene_A 1596.548 0.2393368 0.513216 0.6589575 > [["factor3_level2"]] > baseMean log2FoldChange pvalue FDR > gene_A 1596.548 0.1584153 0.6423634 0.8503163 > [["factor3_level3]] > baseMean log2FoldChange pvalue FDR > gene_A 1596.548 -1.627898 1.823141e-06 0.001409384 > > I want to be sure I understand the output format. Is it true that the coefficients (the vector beta) from the fit are the baseMean value scaled by the log2FoldChange? Is the true intercept value 1596.548*2^10.77485=2797274.13? > > mcols() tells me that the baseMean term is calculated over "all rows". The baseMean is different for different genes although it is the same for each gene across all the conditions, I'm not seeing how the rows are selected. > > Thanks > > Mike > > Michael Muratet, Ph.D. > Senior Scientist > HudsonAlpha Institute for Biotechnology > mmuratet at hudsonalpha.org > (256) 327-0473 (p) > (256) 327-0966 (f) > > Room 4005 > 601 Genome Way > Huntsville, Alabama 35806 > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor Michael Muratet, Ph.D. Senior Scientist HudsonAlpha Institute for Biotechnology mmuratet at hudsonalpha.org (256) 327-0473 (p) (256) 327-0966 (f) Room 4005 601 Genome Way Huntsville, Alabama 35806

• 2.8k views

ADD COMMENT • link updated 12.7 years ago by Michael Love 43k • written 12.7 years ago by Michael Muratet ▴ 420

0

Entering edit mode

Michael Love 43k

@mikelove

Last seen 4 days ago

United States

Hi Michael, Yes, log2FoldChange is the value of the coefficient/beta. The model and coefficients are described in Section B in the vignette. log2(mu) = beta_intercept + beta_day14 * x_day14 + ... log2(mu) = 12.11 - 0.06449 * x_1 + ... But as I suggested earlier, it is easier to access all the coefficients for the model through mcols(res.mm9). You should see columns "Intercept" and "day14" Mike On Wed, Apr 10, 2013 at 9:25 PM, Michael Muratet <mmuratet@hudsonalpha.org>wrote: > > On Mar 28, 2013, at 4:19 PM, Michael Love wrote: > > > Hi Michael, > > > > The baseMean column is not on the log scale; it is the mean of > normalized counts for a gene. The intercept from the GLM is labelled > intercept in mcols(dse). > > > > Mike > > > Hello again > > Here's a snippet of output for the "Intercept" term. > > > head(res.mm9[["Intercept"]]) > DataFrame with 6 rows and 4 columns > baseMean log2FoldChange pvalue FDR > <numeric> <numeric> <numeric> <numeric> > ENSMUSG00000000001 4160.27257 12.107650 0.000000e+00 0.000000e+00 > ENSMUSG00000000028 127.54781 7.001754 0.000000e+00 0.000000e+00 > > and here's a snippet for two level factor > > > head(res.mm9[["day14"]]) > DataFrame with 6 rows and 4 columns > baseMean log2FoldChange pvalue FDR > <numeric> <numeric> <numeric> <numeric> > ENSMUSG00000000001 4160.27257 -0.06449054 4.578027e-02 1.042132e-01 > ENSMUSG00000000028 127.54781 -0.05709357 3.020473e-01 4.500798e-01 > > I'm still unclear about how to write down the coefficients for the model. > The link function is log2(mean), correct? So is the "log2FoldChange" the > particular value of beta for that coefficient? > > Would I write something like > > y_tilde(ENSMUSG00000000001) = 12.11 - 0.06449 + other terms? > > Thanks > > Mike > > > On Mar 28, 2013 5:00 PM, "Michael Muratet" <mmuratet@hudsonalpha.org> > wrote: > > Greetings > > > > I have an experiment: > > > > > design(dse) > > ~ factor1 + factor2 + factor3 > > > > where factor1 has two levels, factor2 has three levels and factor3 has > three levels. I extract a gene of interest from the results for each term > (I've changed the indices to reflect the condition): > > > > > lapply(resultsNames(dse),function(u) results(dse,u)["gene_A",]) > > [["Intercept"]] > > baseMean log2FoldChange pvalue FDR > > gene_A 1596.548 10.77485 3.309439e-216 7.025442e-216 > > [["factor1_level2"]] > > baseMean log2FoldChange pvalue FDR > > gene_A 1596.548 0.3386776 0.1307309 0.3587438 > > [["factor2_level2"]] > > baseMean log2FoldChange pvalue FDR > > gene_A 1596.548 -0.6882543 0.0613569 0.1007896 > > [["factor2_level3"]] > > baseMean log2FoldChange pvalue FDR > > gene_A 1596.548 0.2393368 0.513216 0.6589575 > > [["factor3_level2"]] > > baseMean log2FoldChange pvalue FDR > > gene_A 1596.548 0.1584153 0.6423634 0.8503163 > > [["factor3_level3]] > > baseMean log2FoldChange pvalue FDR > > gene_A 1596.548 -1.627898 1.823141e-06 0.001409384 > > > > I want to be sure I understand the output format. Is it true that the > coefficients (the vector beta) from the fit are the baseMean value scaled > by the log2FoldChange? Is the true intercept value > 1596.548*2^10.77485=2797274.13? > > > > mcols() tells me that the baseMean term is calculated over "all rows". > The baseMean is different for different genes although it is the same for > each gene across all the conditions, I'm not seeing how the rows are > selected. > > > > Thanks > > > > Mike > > > > Michael Muratet, Ph.D. > > Senior Scientist > > HudsonAlpha Institute for Biotechnology > > mmuratet@hudsonalpha.org > > (256) 327-0473 (p) > > (256) 327-0966 (f) > > > > Room 4005 > > 601 Genome Way > > Huntsville, Alabama 35806 > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor@r-project.org > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > > Michael Muratet, Ph.D. > Senior Scientist > HudsonAlpha Institute for Biotechnology > mmuratet@hudsonalpha.org > (256) 327-0473 (p) > (256) 327-0966 (f) > > Room 4005 > 601 Genome Way > Huntsville, Alabama 35806 > > > > > > [[alternative HTML version deleted]]

ADD COMMENT • link 12.7 years ago Michael Love 43k

0

Entering edit mode

On Apr 10, 2013, at 2:49 PM, Michael Love wrote: > Hi Michael, > > Yes, log2FoldChange is the value of the coefficient/beta. The model and coefficients are described in Section B in the vignette. > > log2(mu) = beta_intercept + beta_day14 * x_day14 + ... > log2(mu) = 12.11 - 0.06449 * x_1 + ... > > But as I suggested earlier, it is easier to access all the coefficients for the model through mcols(res.mm9). You should see columns "Intercept" and "day14" > > Mike Hi Mike Well, I think I'm getting this sorted out. You have to apply mcols() to a DESeqSummaryExperiment object to get the coefficients. Application to results(dse) only gets you the names and for the one condition. Thanks Mike > > > On Wed, Apr 10, 2013 at 9:25 PM, Michael Muratet <mmuratet at="" hudsonalpha.org=""> wrote: > > On Mar 28, 2013, at 4:19 PM, Michael Love wrote: > > > Hi Michael, > > > > The baseMean column is not on the log scale; it is the mean of normalized counts for a gene. The intercept from the GLM is labelled intercept in mcols(dse). > > > > Mike > > > Hello again > > Here's a snippet of output for the "Intercept" term. > > > head(res.mm9[["Intercept"]]) > DataFrame with 6 rows and 4 columns > baseMean log2FoldChange pvalue FDR > <numeric> <numeric> <numeric> <numeric> > ENSMUSG00000000001 4160.27257 12.107650 0.000000e+00 0.000000e+00 > ENSMUSG00000000028 127.54781 7.001754 0.000000e+00 0.000000e+00 > > and here's a snippet for two level factor > > > head(res.mm9[["day14"]]) > DataFrame with 6 rows and 4 columns > baseMean log2FoldChange pvalue FDR > <numeric> <numeric> <numeric> <numeric> > ENSMUSG00000000001 4160.27257 -0.06449054 4.578027e-02 1.042132e-01 > ENSMUSG00000000028 127.54781 -0.05709357 3.020473e-01 4.500798e-01 > > I'm still unclear about how to write down the coefficients for the model. The link function is log2(mean), correct? So is the "log2FoldChange" the particular value of beta for that coefficient? > > Would I write something like > > y_tilde(ENSMUSG00000000001) = 12.11 - 0.06449 + other terms? > > Thanks > > Mike > > > On Mar 28, 2013 5:00 PM, "Michael Muratet" <mmuratet at="" hudsonalpha.org=""> wrote: > > Greetings > > > > I have an experiment: > > > > > design(dse) > > ~ factor1 + factor2 + factor3 > > > > where factor1 has two levels, factor2 has three levels and factor3 has three levels. I extract a gene of interest from the results for each term (I've changed the indices to reflect the condition): > > > > > lapply(resultsNames(dse),function(u) results(dse,u)["gene_A",]) > > [["Intercept"]] > > baseMean log2FoldChange pvalue FDR > > gene_A 1596.548 10.77485 3.309439e-216 7.025442e-216 > > [["factor1_level2"]] > > baseMean log2FoldChange pvalue FDR > > gene_A 1596.548 0.3386776 0.1307309 0.3587438 > > [["factor2_level2"]] > > baseMean log2FoldChange pvalue FDR > > gene_A 1596.548 -0.6882543 0.0613569 0.1007896 > > [["factor2_level3"]] > > baseMean log2FoldChange pvalue FDR > > gene_A 1596.548 0.2393368 0.513216 0.6589575 > > [["factor3_level2"]] > > baseMean log2FoldChange pvalue FDR > > gene_A 1596.548 0.1584153 0.6423634 0.8503163 > > [["factor3_level3]] > > baseMean log2FoldChange pvalue FDR > > gene_A 1596.548 -1.627898 1.823141e-06 0.001409384 > > > > I want to be sure I understand the output format. Is it true that the coefficients (the vector beta) from the fit are the baseMean value scaled by the log2FoldChange? Is the true intercept value 1596.548*2^10.77485=2797274.13? > > > > mcols() tells me that the baseMean term is calculated over "all rows". The baseMean is different for different genes although it is the same for each gene across all the conditions, I'm not seeing how the rows are selected. > > > > Thanks > > > > Mike > > > > Michael Muratet, Ph.D. > > Senior Scientist > > HudsonAlpha Institute for Biotechnology > > mmuratet at hudsonalpha.org > > (256) 327-0473 (p) > > (256) 327-0966 (f) > > > > Room 4005 > > 601 Genome Way > > Huntsville, Alabama 35806 > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor at r-project.org > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > > Michael Muratet, Ph.D. > Senior Scientist > HudsonAlpha Institute for Biotechnology > mmuratet at hudsonalpha.org > (256) 327-0473 (p) > (256) 327-0966 (f) > > Room 4005 > 601 Genome Way > Huntsville, Alabama 35806 > > > > > > Michael Muratet, Ph.D. Senior Scientist HudsonAlpha Institute for Biotechnology mmuratet at hudsonalpha.org (256) 327-0473 (p) (256) 327-0966 (f) Room 4005 601 Genome Way Huntsville, Alabama 35806

ADD REPLY • link 12.7 years ago Michael Muratet ▴ 420

0

Entering edit mode

Hi guys, I tried to run the Filter by Genomic Region example on VariantAnnotation package's filterVcf vignette page 3, but I kept getting the error message: > library(VariantAnnotation) > file.gz <- "somaticVcfBeta-HCC1187-H-200-37-ASM-T1-N1.vcf.gz" > stopifnot(file.exists(file.gz)) > file.gz.tbi <- paste(file.gz, ".tbi", sep="") > file.gz.tbi [1] "somaticVcfBeta-HCC1187-H-200-37-ASM-T1-N1.vcf.gz.tbi" > file.exists(file.gz.tbi) [1] FALSE > if(!(file.exists(file.gz.tbi)) + indexTabix(file.gz, format="vcf") Error: unexpected symbol in: "if(!(file.exists(file.gz.tbi)) indexTabix" What did I do wrong? Thanks a lot for the help! Ying > sessionInfo() R version 3.0.0 (2013-04-03) Platform: x86_64-w64-mingw32/x64 (64-bit) locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C [5] LC_TIME=English_United States.1252 attached base packages: [1] parallel stats graphics grDevices utils datasets methods base other attached packages: [1] VariantAnnotation_1.6.1 Rsamtools_1.12.0 Biostrings_2.28.0 [4] GenomicRanges_1.12.1 IRanges_1.18.0 BiocGenerics_0.6.0 loaded via a namespace (and not attached): [1] AnnotationDbi_1.22.1 Biobase_2.20.0 biomaRt_2.16.0 [4] bitops_1.0-5 BSgenome_1.28.0 DBI_0.2-5 [7] GenomicFeatures_1.12.0 RCurl_1.95-4.1 RSQLite_0.11.2 [10] rtracklayer_1.20.0 stats4_3.0.0 tools_3.0.0 [13] XML_3.96-1.1 zlibbioc_1.6.0 > [[alternative HTML version deleted]]

ADD REPLY • link 12.7 years ago ying chen ▴ 340

0

Entering edit mode

On 04/11/2013 11:06 AM, ying chen wrote: > > > Hi guys, > > > > I tried to run > the Filter by Genomic Region example on VariantAnnotation package's filterVcf > vignette page 3, but I kept getting the error message: > > > >> > library(VariantAnnotation) > >> file.gz <- "somaticVcfBeta-HCC1187-H-200-37-ASM-T1-N1.vcf.gz" > >> stopifnot(file.exists(file.gz)) > >> file.gz.tbi <- paste(file.gz, ".tbi", sep="") > >> > file.gz.tbi > > [1] > "somaticVcfBeta-HCC1187-H-200-37-ASM-T1-N1.vcf.gz.tbi" > >> > file.exists(file.gz.tbi) > > [1] FALSE > >> if(!(file.exists(file.gz.tbi)) > > + > indexTabix(file.gz, format="vcf") > > Error: unexpected symbol in: > > "if(!(file.exists(file.gz.tbi)) > > indexTabix" > > > > What did I do > wrong? A simple typo in the vignette > "if(!(file.exists(file.gz.tbi)) ^^^^^ if(!file.exists(file.gz.tbi)) Martin > > > > Thanks a lot > for the help! > > > > Ying > > > >> sessionInfo() > > R version 3.0.0 (2013-04-03) > > Platform: x86_64-w64-mingw32/x64 (64-bit) > > > > locale: > > [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 > > [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C > > [5] LC_TIME=English_United States.1252 > > > > attached base packages: > > [1] parallel > stats graphics grDevices utils datasets > methods base > > > > other attached packages: > > [1] VariantAnnotation_1.6.1 Rsamtools_1.12.0 Biostrings_2.28.0 > > [4] GenomicRanges_1.12.1 > IRanges_1.18.0 > BiocGenerics_0.6.0 > > > > loaded via a namespace (and not attached): > > [1] > AnnotationDbi_1.22.1 > Biobase_2.20.0 > biomaRt_2.16.0 > > [4] bitops_1.0-5 BSgenome_1.28.0 DBI_0.2-5 > > [7] > GenomicFeatures_1.12.0 RCurl_1.95-4.1 > RSQLite_0.11.2 > > [10] rtracklayer_1.20.0 > stats4_3.0.0 tools_3.0.0 > > [13] XML_3.96-1.1 > zlibbioc_1.6.0 > >> > > > > > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > -- Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793

ADD REPLY • link 12.7 years ago Martin Morgan 25k

Login before adding your answer.