Cook´s distance on DESeq2
4
0
Entering edit mode
David ▴ 860
@david-3335
Last seen 3.2 years ago

Hi,

I´m using the DESseq package and trying the cook´s distance to filter genes.

boxplot(log10(assays(dds)[["cooks"]]), range=0, las=2,main="Cook's distance")

 

My sample "control_1" seems to be quite different from the others ??

thanks

 

 

Cooks distance

deseq2 • 1.5k views
ADD COMMENT
0
Entering edit mode
@wolfgang-huber-3550
Last seen 6 weeks ago
EMBL European Molecular Biology Laborat…

You could explore your data further to figure out what is going on:

  • heatmap of VST-transformed data matrix
  • PCA plot
  • heatmap of sample-sample correlation matrix
  • pairwise scatterplots

Is the number of sequenced or aligned reads very different for that library? Etc., ...

ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 16 hours ago
United States
Can you provide your sessionInfo() as well?
ADD COMMENT
0
Entering edit mode
David ▴ 860
@david-3335
Last seen 3.2 years ago

Hi,

I haven´t seen anything strange with that sample in my PCA, neither my clustering. (see PCA attached and hclust )(see images below).

In the meantime disease_2 has become control_6. Otherwise don´t understand the cook´s values for control 1? How should i interpret it ?

 

PCA

 

 

 

> sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.3 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8          LC_NUMERIC=C                 
 [3] LC_TIME=en_US.UTF-8           LC_COLLATE=en_US.UTF-8       
 [5] LC_MONETARY=ca_AD.UTF-8       LC_MESSAGES=en_US.UTF-8      
 [7] LC_PAPER=ca_AD.UTF-8          LC_NAME=ca_AD.UTF-8          
 [9] LC_ADDRESS=ca_AD.UTF-8        LC_TELEPHONE=ca_AD.UTF-8     
[11] LC_MEASUREMENT=ca_AD.UTF-8    LC_IDENTIFICATION=ca_AD.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods  
[9] base     

other attached packages:
 [1] psych_1.5.8                randomForest_4.6-12       
 [3] genefilter_1.52.1          ggdendro_0.1-17           
 [5] pheatmap_1.0.8             vsn_3.38.0                
 [7] gridExtra_2.0.0            xlsx_0.5.7                
 [9] xlsxjars_0.6.1             rJava_0.9-8               
[11] RColorBrewer_1.1-2         DESeq2_1.10.1             
[13] RcppArmadillo_0.6.500.4.0  Rcpp_0.12.3               
[15] SummarizedExperiment_1.0.2 Biobase_2.30.0            
[17] GenomicRanges_1.22.4       GenomeInfoDb_1.6.3        
[19] IRanges_2.4.6              S4Vectors_0.8.11          
[21] BiocGenerics_0.16.1        gplots_2.17.0             
[23] reshape2_1.4.1             klaR_0.6-12               
[25] MASS_7.3-45                caret_6.0-64              
[27] ggplot2_2.0.0              lattice_0.20-33           

loaded via a namespace (and not attached):
 [1] splines_3.2.3         foreach_1.4.3         gtools_3.5.0         
 [4] Formula_1.2-1         affy_1.48.0           latticeExtra_0.6-26  
 [7] RSQLite_1.0.0         limma_3.26.7          quantreg_5.19        
[10] digest_0.6.9          XVector_0.10.0        minqa_1.2.4          
[13] colorspace_1.2-6      preprocessCore_1.32.0 Matrix_1.2-3         
[16] plyr_1.8.3            XML_3.98-1.3          SparseM_1.7          
[19] zlibbioc_1.16.0       xtable_1.8-0          scales_0.3.0         
[22] gdata_2.17.0          affyio_1.40.0         BiocParallel_1.4.3   
[25] lme4_1.1-10           MatrixModels_0.4-1    combinat_0.0-8       
[28] annotate_1.48.0       mgcv_1.8-11           car_2.1-1            
[31] nnet_7.3-12           mnormt_1.5-3          pbkrtest_0.4-6       
[34] survival_2.38-3       magrittr_1.5          nlme_3.1-124         
[37] foreign_0.8-66        class_7.3-14          BiocInstaller_1.20.1
[40] tools_3.2.3           stringr_1.0.0         munsell_0.4.2        
[43] locfit_1.5-9.1        cluster_2.0.3         AnnotationDbi_1.32.3
[46] lambda.r_1.1.7        compiler_3.2.3        e1071_1.6-7          
[49] caTools_1.17.1        futile.logger_1.4.1   grid_3.2.3           
[52] nloptr_1.0.4          iterators_1.0.8       labeling_0.3         
[55] bitops_1.0-6          gtable_0.1.2          codetools_0.2-14     
[58] DBI_0.3.1             Hmisc_3.17-1          futile.options_1.0.0
[61] KernSmooth_2.23-15    stringi_1.0-1         geneplotter_1.48.0   
[64] rpart_4.1-10          acepack_1.3-3.3   

ADD COMMENT
0
Entering edit mode

hi David, 

Is this a standard RNA-seq dataset? Can you show a couple more stats:

colSums(counts(dds))
dim(dds)
rmean <- rowMeans(counts(dds, normalized=TRUE))
quantile(rmean[rmean > 0], 0:10/10)
ADD REPLY
0
Entering edit mode
David ▴ 860
@david-3335
Last seen 3.2 years ago

Hi Michael,

I must have done something wrong. I have regenerated all my data cleaning my code and things look much better now !!!!!

I have one question anyway. What the differences between my disease pattients and controls mean when looking at the cook´s distance ?? Do that mean that the sample effect is much higher in my disease samples ??

 

 

ADD COMMENT
0
Entering edit mode

You can't interpret (and DESeq2 does not filter on) the Cook's distances for groups with a single sample. This is because the definition of Cook's distance is the distance the LFC for the group would move if the sample were removed. 

So I wouldn't worry about the Cook's distances here. Everything looks ok.

ADD REPLY

Login before adding your answer.

Traffic: 222 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6