Question: multiple level factors contrast in nbinomLRT, DESeq2.
1
4.8 years ago by
shao80
Germany
shao80 wrote:
Dear list, I am not sure the correct way to interpret the nbinomLRT results in multiple level factors condition. Here is a toy example. I would to find DE genes after controlling batch effect in the experiments, in which there are multiple levels. ## make data library(DESeq2)dds <- makeExampleDESeqDataSet(m = 18)colData <- data.frame(row.names = rownames(colData(dds)), sample = colData(dds)$sample, condition = rep(LETTERS[1:6], each = 3), batch = factor(rep(c(1, 1, 2), 6)))dds <- DESeqDataSetFromMatrix(counts(dds), colData = colData, design = ~ batch + condition)dds <- estimateSizeFactors(dds)dds <- estimateDispersions(dds)## LRTestdds <- nbinomLRT(dds, reduced = formula(~ batch ))resultsNames(dds)[1] "Intercept" "batch_2_vs_1" "condition_B_vs_A" "condition_C_vs_A" "condition_D_vs_A" "condition_E_vs_A" "condition_F_vs_A"res.1 <- results(dds, name = "condition_B_vs_A")res.2 <- results(dds, contrast = list("condition_B_vs_A", "batch_2_vs_1") Questions:1.) I would like to get the DE genes between B and A, while controlling for the batch effect. Should I take "res.1" or "res.2", or both are wrong? and what is the correct way to do it? 2.) Why "res.3 <- results(dds, contrast = c("condition", "A", "b"))" gave error: "Error in normalizeSingleBracketSubscript(j, x) : subscript contains invalid names" 3.) In order to get DE genes between conditions not directly listed in the "resultsNames", is the following codes correct? should "batch_2_vs_1" be included int the contrast?##for example, DE between C and F, controlling for batch effect;res.4 <- results(dds, contrast = list("condition_C_vs_A", "condition_F_vs_A") Best, Chunxuan [[alternative HTML version deleted]] • 3.0k views ADD COMMENTlink modified 4.8 years ago by Michael Love24k • written 4.8 years ago by shao80 Answer: multiple level factors contrast in nbinomLRT, DESeq2. 0 4.8 years ago by shao80 Germany shao80 wrote: Dear list, I am not sure the correct way to interpret the nbinomLRT results in multiple level factors condition. Here is a toy example. I would to find DE genes after controlling batch effect in the experiments, in which there are multiple levels. ## make data library(DESeq2) dds <- makeExampleDESeqDataSet(m = 18) colData <- data.frame(row.names = rownames(colData(dds)), sample = colData(dds)$sample, condition = rep(LETTERS[1:6], each = 3), batch = factor(rep(c(1, 1, 2), 6))) dds <- DESeqDataSetFromMatrix(counts(dds), colData = colData, design = ~ batch + condition) dds <- estimateSizeFactors(dds) dds <- estimateDispersions(dds) ## LRTest dds <- nbinomLRT(dds, reduced = formula(~ batch )) resultsNames(dds) [1] "Intercept" "batch_2_vs_1" "condition_B_vs_A" "condition_C_vs_A" "condition_D_vs_A" "condition_E_vs_A" "condition_F_vs_A" res.1 <- results(dds, name = "condition_B_vs_A") res.2 <- results(dds, contrast = list("condition_B_vs_A", "batch_2_vs_1") Questions: 1.) I would like to get the DE genes between B and A, while controlling for the batch effect. Should I take "res.1" or "res.2", or both are wrong? and what is the correct way to do it? 2.) Why "res.3 <- results(dds, contrast = c("condition", "A", "b"))" gave error: "Error in normalizeSingleBracketSubscript(j, x) : subscript contains invalid names" 3.) In order to get DE genes between conditions not directly listed in the "resultsNames", is the following codes correct? should "batch_2_vs_1" be included int the contrast?##for example, DE between C and F, controlling for batch effect;res.4 <- results(dds, contrast = list("condition_C_vs_A", "condition_F_vs_A") > sessionInfo()R version 3.1.0 (2014-04-10) Platform: x86_64-apple-darwin10.8.0 (64-bit) locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] parallel graphics grDevices utils datasets stats grid methods base other attached packages: [1] DESeq2_1.4.5 RcppArmadillo_0.4.320.0 Rcpp_0.11.2 GenomicRanges_1.16.3 GenomeInfoDb_1.0.2 IRanges_1.22.10 BiocGenerics_0.10.0 [8] magrittr_1.0.1 dplyr_0.2 RColorBrewer_1.0-5 reshape2_1.4 ggplot2_1.0.0 loaded via a namespace (and not attached): [1] annotate_1.42.1 AnnotationDbi_1.26.0 assertthat_0.1 Biobase_2.24.0 colorspace_1.2-4 DBI_0.2-7 digest_0.6.4 genefilter_1.46.1 [9] geneplotter_1.42.0 gtable_0.1.2 lattice_0.20-29 locfit_1.5-9.1 MASS_7.3-33 munsell_0.4.2 plyr_1.8.1 proto_0.3-10 [17] R.methodsS3_1.6.1 R.oo_1.18.0 RSQLite_0.11.4 scales_0.2.4 splines_3.1.0 stats4_3.1.0 stringr_0.6.2 survival_2.37-7 [25] tools_3.1.0 XML_3.98-1.1 xtable_1.7-3 XVector_0.4.0 Best, Chunxuan PS, sorry for the previous one, I forgot the session, and have no idea why the format is a mess. [[alternative HTML version deleted]]
Hi On 20/08/14 12:43, sh. chunxuan wrote: > I am not sure the correct way to interpret the nbinomLRT results in > multiple level factors condition. Is there a reason why you decided to use 'nbinomLRT' rather than 'nbinomWaldTest', which is suggested in the vignette for standard use cases? > Here is a toy example. I would to find DE genes after controlling > batch effect in the experiments, in which there are multiple levels. In your case, the LRT allows you to find genes which are affected by the experimental condition in _some_ way, i.e., for which you can reject the null hypothesis "This gene is expressed at the same level in _all_ six conditions; i.e., it is not affected by the experimental condition at all." However, you want to perform specific contrasts: > 1.) I would like to get the DE genes between B and A, while > controlling for the batch effect. Should I take "res.1" or "res.2", or > both are wrong? and what is the correct way to do it? This is much easier with a Wald test. You just run 'nbinomWaldTest' and then ask for results( dds, contrast = c( "condition", "B", "A" ) ) Simon
Answer: multiple level factors contrast in nbinomLRT, DESeq2.
0
4.8 years ago by
Michael Love24k
United States
Michael Love24k wrote:
Hi Chunxuan, On Aug 20, 2014 12:38 PM, "sh. chunxuan" <hibergo at="" outlook.com=""> wrote: > > > > > Dear list, > I am not sure the correct way to interpret the nbinomLRT results in multiple level factors condition. > Here is a toy example. I would to find DE genes after controlling batch effect in the experiments, in which there are multiple levels. > ## make data library(DESeq2)dds <- makeExampleDESeqDataSet(m = 18)colData <- data.frame(row.names = rownames(colData(dds)), sample = colData(dds)$sample, condition = rep(LETTERS[1:6], each = 3), batch = factor(rep(c(1, 1, 2), 6)))dds <- DESeqDataSetFromMatrix(counts(dds), colData = colData, design = ~ batch + condition)dds <- estimateSizeFactors(dds)dds <- estimateDispersions(dds)## LRTestdds <- nbinomLRT(dds, reduced = formula(~ batch ))resultsNames(dds)[1] "Intercept" "batch_2_vs_1" "condition_B_vs_A" "condition_C_vs_A" "condition_D_vs_A" "condition_E_vs_A" "condition_F_vs_A"res.1 <- results(dds, name = "condition_B_vs_A")res.2 <- results(dds, contrast = list("condition_B_vs_A", "batch_2_vs_1") > Questions:1.) I would like to get the DE genes between B and A, while controlling for the batch effect. You should use the default Wald test for such a comparison, not the LRT. See the vignette for an explanation of the difference between the two tests. So simply use DESeq() and then use: results(dds, contrast = c("condition", "B", "A")) > Should I take "res.1" or "res.2", or both are wrong? and what is the correct way to do it? > 2.) Why "res.3 <- results(dds, contrast = c("condition", "A", "b"))" gave error: "Error in normalizeSingleBracketSubscript(j, x) : subscript contains invalid names" This code gives an error because lowercase "b" is not a level. (It should be giving a more understandable error message, not sure why it is not doing so here.) > 3.) In order to get DE genes between conditions not directly listed in the "resultsNames", is the following codes correct? should "batch_2_vs_1" be included int the contrast?##for example, DE between C and F, controlling for batch effect;res.4 <- results(dds, contrast = list("condition_C_vs_A", "condition_F_vs_A") Check the help for ?results and the examples there. You should simply use: contrast = c("condition", "C", "F") Mike > Best, Chunxuan > > > > > > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor [[alternative HTML version deleted]] ADD COMMENTlink written 4.8 years ago by Michael Love24k Hi Mike, Simon, Thanks very much for the quick answers. I moved from DESeq, in which like hood ratio test is used for model comparison, so the the idea just sticked to my mind. Later I saw on the other vignette there is a description of the similar situation specifying "which means that we want to test for the effect of treatment (the last factor), controlling for the effect of patients (the first factor)". Thanks again and best, Chunxuan Date: Wed, 20 Aug 2014 14:43:18 +0200 Subject: Re: [BioC] multiple level factors contrast in nbinomLRT, DESeq2. From: michaelisaiahlove@gmail.com To: hibergo at outlook.com CC: bioconductor at r-project.org Hi Chunxuan, On Aug 20, 2014 12:38 PM, "sh. chunxuan" <hibergo at="" outlook.com=""> wrote: > > > > > Dear list, > I am not sure the correct way to interpret the nbinomLRT results in multiple level factors condition. > Here is a toy example. I would to find DE genes after controlling batch effect in the experiments, in which there are multiple levels. > ## make data library(DESeq2)dds <- makeExampleDESeqDataSet(m = 18)colData <- data.frame(row.names = rownames(colData(dds)), sample = colData(dds)$sample, condition = rep(LETTERS[1:6], each = 3), batch = factor(rep(c(1, 1, 2), 6)))dds <- DESeqDataSetFromMatrix(counts(dds), colData = colData, design = ~ batch + condition)dds <- estimateSizeFactors(dds)dds <- estimateDispersions(dds)## LRTestdds <- nbinomLRT(dds, reduced = formula(~ batch ))resultsNames(dds)[1] "Intercept" "batch_2_vs_1" "condition_B_vs_A" "condition_C_vs_A" "condition_D_vs_A" "condition_E_vs_A" "condition_F_vs_A"res.1 <- results(dds, name = "condition_B_vs_A")res.2 <- results(dds, contrast = list("condition_B_vs_A", "batch_2_vs_1") > Questions:1.) I would like to get the DE genes between B and A, while controlling for the batch effect. You should use the default Wald test for such a comparison, not the LRT. See the vignette for an explanation of the difference between the two tests. So simply use DESeq() and then use: results(dds, contrast = c("condition", "B", "A")) > Should I take "res.1" or "res.2", or both are wrong? and what is the correct way to do it? > 2.) Why "res.3 <- results(dds, contrast = c("condition", "A", "b"))" gave error: "Error in normalizeSingleBracketSubscript(j, x) : subscript contains invalid names" This code gives an error because lowercase "b" is not a level. (It should be giving a more understandable error message, not sure why it is not doing so here.) > 3.) In order to get DE genes between conditions not directly listed in the "resultsNames", is the following codes correct? should "batch_2_vs_1" be included int the contrast?##for example, DE between C and F, controlling for batch effect;res.4 <- results(dds, contrast = list("condition_C_vs_A", "condition_F_vs_A") Check the help for ?results and the examples there. You should simply use: contrast = c("condition", "C", "F") Mike > Best, Chunxuan > > > > > > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor [[alternative HTML version deleted]]