Deseq2 high outlier number
1
0
Entering edit mode
Davide • 0
@a00f114c
Last seen 3 days ago
Italy

Hi, I'm working with DESeq2 package and i'm stuck in the following situation, My data is from 3 sample (which i call case) grown in 2 different media


DataFrame with 6 rows and 2 columns
              case condition
         <integer>  <factor>
5114_hc       5114       hc 
5265_hc       5265       hc 
5304_hc       5304       hc 
5114_wIR      5114       wIR
5265_wIR      5265       wIR
5304_wIR      5304       wIR

I used at the beginning the following command for the analysis


dds <- DESeqDataSetFromMatrix(
  countData = as.matrix(counts),
  colData   = coldata,
  design    = ~ condition
)
dds$condition = relevel(dds$condition, ref = "hc")


# Filtering ####
keep <- rowSums(counts(dds) >= 5) >= 2
table(keep)

dds_filt <- dds[keep,]
dds_filt = DESeq(dds_filt)
res <- results(dds_filt)
res_05 <- results(dds_filt, alpha = 0.05)

in the summary results i found out that i have not big difference in gene expression and that i have a certain number of outliers

> summary(res)

out of 17543 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)       : 8, 0.046%
LFC < 0 (down)     : 3, 0.017%
outliers [1]       : 383, 2.2%
low counts [2]     : 4996, 28%
(mean count < 37)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

from running vst() and plotting PCA i see that the points group together by case, i wonder if that can influence the DE analysis and i tried the same code before but changing the design formula accounting for the case identity

dds <- DESeqDataSetFromMatrix(
  countData = round(as.matrix(counts)),
  colData   = coldata,
  design    = ~ case + condition
)

with this design i ended up with this result

> summary(res)

out of 17543 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)       : 58, 0.33%
LFC < 0 (down)     : 78, 0.44%
outliers [1]       : 0, 0%
low counts [2]     : 6122, 35%
(mean count < 63)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

So apparently i have no more outlier, the questions here is: it's a sign that this design explain better my data? If I understand correctly in this way i still testing primarily for condition right?

I've read the vignette but i still have this doubts on if it's the optimal way to manage the things. Maybe before adding the case to design based on observation of PCA results i need to check the PlotCounts of those outliers? in that case how can i identify them from the res?

If more detail are needed let me know, Thanks

> sessionInfo()
R version 4.0.5 (2021-03-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Linux Mint 20.3

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] EnhancedVolcano_1.8.0       ggsci_2.9                   purrr_0.3.4                 tibble_3.1.5               
 [5] tidyverse_1.3.1             readr_1.4.0                 edgeR_3.32.1                ggrepel_0.8.2              
 [9] rjson_0.2.20                GenomicFeatures_1.42.3      AnnotationDbi_1.52.0        tximport_1.18.0            
[13] muscData_1.4.0              fgsea_1.16.0                forcats_0.5.1               ExperimentHub_1.16.1       
[17] AnnotationHub_2.22.1        BiocFileCache_1.14.0        dbplyr_2.1.1.9000           singscore_1.10.0           
[21] infercnv_1.6.0              clustree_0.4.4              ggraph_2.0.5                scuttle_1.0.4              
[25] panc8.SeuratData_3.0.2      SeuratData_0.2.1            copykat_1.0.5               FSA_0.9.1                  
[29] ggpubr_0.4.0                SeuratWrappers_0.3.0        monocle3_1.0.0              dplyr_1.0.7                
[33] GSA_1.03.1                  AUCell_1.12.0               WriteXLS_6.3.0              data.table_1.13.4          
[37] tidyr_1.1.4                 SingleR_1.4.1               celldex_1.0.0               FNN_1.1.3                  
[41] class_7.3-19                harmony_0.1.0               Rcpp_1.0.7                  readxl_1.3.1               
[45] R.utils_2.10.1              R.oo_1.24.0                 R.methodsS3_1.8.1           DropletUtils_1.10.3        
[49] SingleCellExperiment_1.12.0 SeuratDisk_0.0.0.9019       SeuratObject_4.0.2          Seurat_4.0.5               
[53] reticulate_1.24-9000        biomaRt_2.46.3              ggalluvial_0.12.3           reshape2_1.4.4             
[57] msigdbr_7.2.1               stringr_1.4.0               RColorBrewer_1.1-2          limma_3.46.0               
[61] knitr_1.30                  plyr_1.8.6                  DESeq2_1.30.1               SummarizedExperiment_1.20.0
[65] Biobase_2.50.0              MatrixGenerics_1.2.1        matrixStats_0.58.0          GenomicRanges_1.42.0       
[69] GenomeInfoDb_1.26.7         IRanges_2.24.1              S4Vectors_0.28.1            BiocGenerics_0.36.1        
[73] GSVA_1.38.2                 ggplot2_3.3.5              

loaded via a namespace (and not attached):
  [1] rsvd_1.0.3                    ica_1.0-2                     Rsamtools_2.6.0              
  [4] foreach_1.5.1                 lmtest_0.9-38                 crayon_1.4.2                 
  [7] spatstat.core_2.3-1           MASS_7.3-54                   rhdf5filters_1.2.0           
 [10] nlme_3.1-153                  backports_1.3.0               reprex_2.0.1                 
 [13] rlang_0.4.12                  argparse_2.1.3                XVector_0.30.0               
 [16] ROCR_1.0-11                   irlba_2.3.3                   extrafontdb_1.0              
 [19] extrafont_0.18                BiocParallel_1.24.1           bit64_4.0.5                  
 [22] glue_1.4.2                    sctransform_0.3.3             vipor_0.4.5                  
 [25] spatstat.sparse_2.0-0         spatstat.geom_2.3-0           haven_2.4.3                  
 [28] tidyselect_1.1.0              fitdistrplus_1.1-3            XML_3.99-0.6                 
 [31] zoo_1.8-8                     proj4_1.0-11                  GenomicAlignments_1.26.0     
 [34] xtable_1.8-4                  magrittr_2.0.1                cli_3.1.0                    
 [37] zlibbioc_1.36.0               rstudioapi_0.13               miniUI_0.1.1.1               
 [40] rjags_4-12                    rpart_4.1-15                  fastmatch_1.1-0              
 [43] lambda.r_1.2.4                maps_3.4.0                    shiny_1.5.0                  
 [46] BiocSingular_1.6.0            xfun_0.19                     askpass_1.1                  
 [49] cluster_2.1.2                 caTools_1.18.0                tidygraph_1.2.0              
 [52] interactiveDisplayBase_1.28.0 ape_5.6-1                     listenv_0.8.0                
 [55] Biostrings_2.58.0             png_0.1-7                     reshape_0.8.8                
 [58] future_1.21.0                 withr_2.4.2                   bitops_1.0-6                 
 [61] ggforce_0.3.3                 cellranger_1.1.0              GSEABase_1.52.1              
 [64] dqrng_0.2.1                   coda_0.19-4                   pillar_1.6.4                 
 [67] gplots_3.1.1                  cachem_1.0.4                  multcomp_1.4-18              
 [70] fs_1.5.0                      hdf5r_1.3.4                   DelayedMatrixStats_1.12.3    
 [73] vctrs_0.3.8                   ellipsis_0.3.2                generics_0.1.0               
 [76] tools_4.0.5                   beeswarm_0.4.0                munsell_0.5.0                
 [79] tweenr_1.0.2                  DelayedArray_0.16.3           fastmap_1.0.1                
 [82] compiler_4.0.5                abind_1.4-5                   httpuv_1.5.5                 
 [85] rtracklayer_1.50.0            plotly_4.9.2.1                GenomeInfoDbData_1.2.4       
 [88] gridExtra_2.3                 lattice_0.20-45               deldir_1.0-6                 
 [91] utf8_1.1.4                    later_1.1.0.1                 jsonlite_1.7.2               
 [94] scales_1.1.1                  graph_1.68.0                  pbapply_1.4-3                
 [97] carData_3.0-5                 sparseMatrixStats_1.2.1       genefilter_1.72.1            
[100] lazyeval_0.2.2                promises_1.1.1                car_3.0-12                   
[103] doParallel_1.0.16             goftest_1.2-2                 spatstat.utils_2.2-0         
[106] ash_1.0-15                    sandwich_3.0-1                cowplot_1.1.0                
[109] Rtsne_0.15                    uwot_0.1.9                    igraph_1.2.6                 
[112] HDF5Array_1.18.1              survival_3.2-11               yaml_2.2.1                   
[115] htmltools_0.5.0               memoise_2.0.0                 modeltools_0.2-23            
[118] locfit_1.5-9.4                graphlayouts_0.8.0            viridisLite_0.4.0            
[121] digest_0.6.27                 assertthat_0.2.1              mime_0.9                     
[124] rappdirs_0.3.1                Rttf2pt1_1.3.10               futile.options_1.0.1         
[127] RSQLite_2.2.7                 future.apply_1.6.0            remotes_2.3.0                
[130] blob_1.2.1                    futile.logger_1.4.3           splines_4.0.5                
[133] Rhdf5lib_1.12.1               RCurl_1.98-1.3                broom_0.7.10                 
[136] hms_1.0.0                     modelr_0.1.8                  rhdf5_2.34.0                 
[139] colorspace_2.0-0              BiocManager_1.30.10           ggbeeswarm_0.6.0             
[142] libcoin_1.0-9                 ggrastr_1.0.1                 coin_1.4-2                   
[145] RANN_2.6.1                    mvtnorm_1.1-3                 fansi_0.4.1                  
[148] parallelly_1.21.0             R6_2.5.0                      grid_4.0.5                   
[151] ggridges_0.5.2                lifecycle_1.0.0               formatR_1.9                  
[154] curl_4.3                      ggsignif_0.6.3                leiden_0.3.6                 
[157] fastcluster_1.2.3             Matrix_1.3-4                  RcppAnnoy_0.0.19             
[160] TH.data_1.1-0                 iterators_1.0.13              htmlwidgets_1.5.3            
[163] beachmat_2.6.4                polyclip_1.10-0               rvest_1.0.2                  
[166] mgcv_1.8-36                   globals_0.14.0                openssl_1.4.3                
[169] patchwork_1.1.0               lubridate_1.8.0               codetools_0.2-18             
[172] gtools_3.8.2                  prettyunits_1.1.1             gtable_0.3.0                 
[175] DBI_1.1.1                     tensor_1.5                    httr_1.4.2                   
[178] KernSmooth_2.23-20            stringi_1.5.3                 progress_1.2.2               
[181] farver_2.0.3                  annotate_1.68.0               viridis_0.6.2                
[184] xml2_1.3.2                    BiocNeighbors_1.8.2           ggalt_0.4.0                  
[187] geneplotter_1.68.0            scattermore_0.7               BiocVersion_3.12.0           
[190] bit_4.0.4                     spatstat.data_2.1-0           pkgconfig_2.0.3              
[193] rstatix_0.7.0
DESeq2 • 91 views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 6 hours ago
United States

With this design, you can put the case in the design ~case + condition -- but first convert to a factor. You don't want to model it as an integer as it is currently coded. See vignette on multi-factor designs.

ADD COMMENT

Login before adding your answer.

Traffic: 285 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6