Question: Cook´s distance on DESeq2
gravatar for David
3.7 years ago by
David860 wrote:


I´m using the DESseq package and trying the cook´s distance to filter genes.

boxplot(log10(assays(dds)[["cooks"]]), range=0, las=2,main="Cook's distance")


My sample "control_1" seems to be quite different from the others ??




Cooks distance

deseq2 • 1.1k views
ADD COMMENTlink modified 3.7 years ago • written 3.7 years ago by David860
Answer: Cook´s distance on DESeq2
gravatar for Wolfgang Huber
3.7 years ago by
EMBL European Molecular Biology Laboratory
Wolfgang Huber13k wrote:

You could explore your data further to figure out what is going on:

  • heatmap of VST-transformed data matrix
  • PCA plot
  • heatmap of sample-sample correlation matrix
  • pairwise scatterplots

Is the number of sequenced or aligned reads very different for that library? Etc., ...

ADD COMMENTlink written 3.7 years ago by Wolfgang Huber13k
Answer: Cook´s distance on DESeq2
gravatar for Michael Love
3.7 years ago by
Michael Love25k
United States
Michael Love25k wrote:
Can you provide your sessionInfo() as well?
ADD COMMENTlink written 3.7 years ago by Michael Love25k
Answer: Cook´s distance on DESeq2
gravatar for David
3.7 years ago by
David860 wrote:


I haven´t seen anything strange with that sample in my PCA, neither my clustering. (see PCA attached and hclust )(see images below).

In the meantime disease_2 has become control_6. Otherwise don´t understand the cook´s values for control 1? How should i interpret it ?






> sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.3 LTS

 [1] LC_CTYPE=en_US.UTF-8          LC_NUMERIC=C                 
 [3] LC_TIME=en_US.UTF-8           LC_COLLATE=en_US.UTF-8       
 [5] LC_MONETARY=ca_AD.UTF-8       LC_MESSAGES=en_US.UTF-8      
 [7] LC_PAPER=ca_AD.UTF-8          LC_NAME=ca_AD.UTF-8          
 [9] LC_ADDRESS=ca_AD.UTF-8        LC_TELEPHONE=ca_AD.UTF-8     

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods  
[9] base     

other attached packages:
 [1] psych_1.5.8                randomForest_4.6-12       
 [3] genefilter_1.52.1          ggdendro_0.1-17           
 [5] pheatmap_1.0.8             vsn_3.38.0                
 [7] gridExtra_2.0.0            xlsx_0.5.7                
 [9] xlsxjars_0.6.1             rJava_0.9-8               
[11] RColorBrewer_1.1-2         DESeq2_1.10.1             
[13] RcppArmadillo_0.6.500.4.0  Rcpp_0.12.3               
[15] SummarizedExperiment_1.0.2 Biobase_2.30.0            
[17] GenomicRanges_1.22.4       GenomeInfoDb_1.6.3        
[19] IRanges_2.4.6              S4Vectors_0.8.11          
[21] BiocGenerics_0.16.1        gplots_2.17.0             
[23] reshape2_1.4.1             klaR_0.6-12               
[25] MASS_7.3-45                caret_6.0-64              
[27] ggplot2_2.0.0              lattice_0.20-33           

loaded via a namespace (and not attached):
 [1] splines_3.2.3         foreach_1.4.3         gtools_3.5.0         
 [4] Formula_1.2-1         affy_1.48.0           latticeExtra_0.6-26  
 [7] RSQLite_1.0.0         limma_3.26.7          quantreg_5.19        
[10] digest_0.6.9          XVector_0.10.0        minqa_1.2.4          
[13] colorspace_1.2-6      preprocessCore_1.32.0 Matrix_1.2-3         
[16] plyr_1.8.3            XML_3.98-1.3          SparseM_1.7          
[19] zlibbioc_1.16.0       xtable_1.8-0          scales_0.3.0         
[22] gdata_2.17.0          affyio_1.40.0         BiocParallel_1.4.3   
[25] lme4_1.1-10           MatrixModels_0.4-1    combinat_0.0-8       
[28] annotate_1.48.0       mgcv_1.8-11           car_2.1-1            
[31] nnet_7.3-12           mnormt_1.5-3          pbkrtest_0.4-6       
[34] survival_2.38-3       magrittr_1.5          nlme_3.1-124         
[37] foreign_0.8-66        class_7.3-14          BiocInstaller_1.20.1
[40] tools_3.2.3           stringr_1.0.0         munsell_0.4.2        
[43] locfit_1.5-9.1        cluster_2.0.3         AnnotationDbi_1.32.3
[46] lambda.r_1.1.7        compiler_3.2.3        e1071_1.6-7          
[49] caTools_1.17.1        futile.logger_1.4.1   grid_3.2.3           
[52] nloptr_1.0.4          iterators_1.0.8       labeling_0.3         
[55] bitops_1.0-6          gtable_0.1.2          codetools_0.2-14     
[58] DBI_0.3.1             Hmisc_3.17-1          futile.options_1.0.0
[61] KernSmooth_2.23-15    stringi_1.0-1         geneplotter_1.48.0   
[64] rpart_4.1-10          acepack_1.3-3.3   

ADD COMMENTlink written 3.7 years ago by David860

hi David, 

Is this a standard RNA-seq dataset? Can you show a couple more stats:

rmean <- rowMeans(counts(dds, normalized=TRUE))
quantile(rmean[rmean > 0], 0:10/10)
ADD REPLYlink written 3.7 years ago by Michael Love25k
Answer: Cook´s distance on DESeq2
gravatar for David
3.7 years ago by
David860 wrote:

Hi Michael,

I must have done something wrong. I have regenerated all my data cleaning my code and things look much better now !!!!!

I have one question anyway. What the differences between my disease pattients and controls mean when looking at the cook´s distance ?? Do that mean that the sample effect is much higher in my disease samples ??



ADD COMMENTlink written 3.7 years ago by David860

You can't interpret (and DESeq2 does not filter on) the Cook's distances for groups with a single sample. This is because the definition of Cook's distance is the distance the LFC for the group would move if the sample were removed. 

So I wouldn't worry about the Cook's distances here. Everything looks ok.

ADD REPLYlink written 3.7 years ago by Michael Love25k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 201 users visited in the last hour