Question: DESeq2 - input data questions
4.9 years ago by Ming Yi
rnaseq deseq deseq2 edger
modified 4.9 years ago by Michael Love • written 4.9 years ago by Ming Yi
Answer: DESeq2 - input data questions
4.9 years ago by Michael Love, United States
Michael Love wrote:
Hi,

On Tue, Feb 11, 2014 at 7:11 AM, Ming Yi <yi02@hotmail.com> wrote:
> Hi, Mike:
>
> Thanks a lot for the prompt response and input, which is very helpful
>
> Since some of the genes seem a bit interesting to us, and of course we love to keep.
>
> However, when I try:
>
>> resType <- results(dds, "Type_Tumor_vs_Normal",cooksCutoff=FALSE);
> Error in results(dds, "Type_Tumor_vs_Normal", cooksCutoff = FALSE) :
>   unused argument (cooksCutoff = FALSE)

What version of DESeq2 are you using?
Is the cooksCutoff not defined in the documentation when you fire ?results ?

-steve

--
Steve Lianoglou
Computational Biologist
Genentech
Yes, Steve is on it. I assumed you were using the current release version of Bioconductor (2.13) with DESeq2 v1.2.

In v1.0, the cooksCutoff argument was in DESeq().

> Also from your experience, if Cook's filtering is taken out, validation
> rate much worse? In reality, some genes might have large variation than
> others such as cancer-related genes. What do you think?

Cook's filtering is just a heuristic, so it's hard to give general advice. The point is to help identify cases when individual samples have too much influence on the log fold changes. I would recommend plotting the counts of genes with large Cook's distance:

    # get the genes with highest max(Cook's distance for each sample)
    cooks <- mcols(dds)$maxCooks
    idx <- order(-cooks)
    # plot the normalized counts for the top gene by max Cook's distance
    plot( counts(dds,normalized=TRUE)[ idx[1], ], main=paste("Max Cook's:", cooks[idx[1]]) )

You can decide for yourself where to set the filter by setting cooksCutoff = x.

Note that large variance alone will not lead to filtering; the filtering comes in when the variance for a majority of samples is small, but a minority of samples have extreme counts which have large influence on the log fold changes.

For your experiment, if the subject variable is explaining a lot of the variance I would make sure to include it in the design, to help isolate the true condition effect.

Mike
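The distinction between large variance and large influence can be seen in a small base R sketch. This is only an analogy and not DESeq2's procedure: it fits an ordinary two-group linear model with lm() to invented log-scale values and uses stats::cooks.distance(). The data, the six-versus-six layout, and the helper name max_cooks are all made up for illustration.

```r
# Hypothetical data: 6 normal and 6 tumor samples on a log2-like scale.
group <- factor(rep(c("Normal", "Tumor"), each = 6),
                levels = c("Normal", "Tumor"))

# Gene A: every sample is noisy (large variance, no single outlier).
gene_a <- c(2, 8, 4, 9, 3, 7,   5, 11, 6, 12, 4, 10)

# Gene B: tight values, except one tumor sample with an extreme value.
gene_b <- c(5.0, 5.1, 4.9, 5.2, 5.0, 4.8,   5.1, 5.0, 4.9, 5.2, 5.0, 12.0)

# Hypothetical helper: largest per-sample Cook's distance for one gene.
max_cooks <- function(y, group) {
  fit <- lm(y ~ group)        # simple two-group model (not a DESeq2 GLM)
  max(cooks.distance(fit))
}

cooks_a <- max_cooks(gene_a, group)
cooks_b <- max_cooks(gene_b, group)

# Gene A has the larger overall variance, but gene B has the larger
# maximum Cook's distance because one sample drives its group difference.
print(c(var_a = var(gene_a), var_b = var(gene_b)))
print(c(cooks_a = cooks_a, cooks_b = cooks_b))
```

In this toy example the uniformly noisy gene is not the one flagged; the gene with a single extreme sample is, which matches the behaviour described above.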
Ming Yi wrote:

Hi, Mike and Steve:

Thanks a lot for the advice. Below is my sessionInfo. Is my version, DESeq2_1.0.19, OK?

> show(sessionInfo());
R version 3.0.1 (2013-05-16)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] DESeq2_1.0.19           RcppArmadillo_0.4.000.2 Rcpp_0.11.0
[4] lattice_0.20-24         Biobase_2.22.0          GenomicRanges_1.12.5
[7] IRanges_1.18.4          BiocGenerics_0.8.0

loaded via a namespace (and not attached):
 [1] annotate_1.38.0      AnnotationDbi_1.22.6 DBI_0.2-7
 [4] genefilter_1.42.0    grid_3.0.1           locfit_1.5-9.1
 [7] RColorBrewer_1.0-5   RSQLite_0.11.4       splines_3.0.1
[10] stats4_3.0.1         survival_2.37-7      XML_3.98-1.1
[13] xtable_1.7-1
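Because the answer depends on the installed version, it can help to compare version strings directly rather than scanning the full sessionInfo() output. The sketch below encodes only what this thread states (cooksCutoff is an argument of DESeq() in v1.0 and of results() in v1.2). The helper name cooks_cutoff_location is hypothetical, and the cut at 1.2.0 is an assumption, since the thread does not cover versions in between.

```r
# Hypothetical helper: given a DESeq2 version string, report which function
# takes cooksCutoff according to this thread (v1.0: DESeq(); v1.2: results()).
# The 1.2.0 threshold is an assumption; intermediate versions are not covered.
cooks_cutoff_location <- function(version) {
  v <- package_version(version)
  if (v >= package_version("1.2.0")) "results()" else "DESeq()"
}

# The version reported in the sessionInfo() output of this thread:
print(cooks_cutoff_location("1.0.19"))

# In a live session with DESeq2 installed, pass the installed version:
# cooks_cutoff_location(as.character(utils::packageVersion("DESeq2")))
```

Comparing package_version objects avoids the pitfalls of comparing version strings as text or as decimal numbers.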
Hi Ming,

We recommend using the release version. But if you continue to use v1.0, you can turn off Cook's-based filtering with the cooksCutoff argument to DESeq().

It's a good idea to always read the R help, e.g., by typing ?DESeq in your R session. That way you can make sure you get the help for the software version you are actually using. If you check the help page, you should see information on this argument.

Mike
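An "unused argument" error means the function, as installed, has no parameter of that name. Alongside the help page, one can inspect a function's formal arguments directly. The following is a minimal sketch: accepts_arg is a hypothetical helper, and old_results and new_results are stand-in functions (not DESeq2 code) so the example runs without Bioconductor.

```r
# Hypothetical helper: TRUE if `fun` has a formal argument named `arg`.
# A function that only takes `...` reports FALSE even if it forwards the
# argument elsewhere, so treat FALSE as a prompt to check the help page.
accepts_arg <- function(fun, arg) {
  arg %in% names(formals(fun))
}

# Stand-in for a results() signature without cooksCutoff (not DESeq2 code).
old_results <- function(object, name) NULL
# Stand-in for a signature that does take cooksCutoff (not DESeq2 code).
new_results <- function(object, name, cooksCutoff) NULL

print(accepts_arg(old_results, "cooksCutoff"))  # FALSE
print(accepts_arg(new_results, "cooksCutoff"))  # TRUE

# With DESeq2 loaded, the same check would be:
# accepts_arg(DESeq2::results, "cooksCutoff")
# accepts_arg(DESeq2::DESeq, "cooksCutoff")
```

This checks the installed code itself, so it reflects the version actually in use, in the same spirit as reading ?DESeq or ?results in the running session.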