Hi, I'm working with DESeq2 package and i'm stuck in the following situation, My data is from 3 sample (which i call case) grown in 2 different media
DataFrame with 6 rows and 2 columns
case condition
<integer> <factor>
5114_hc 5114 hc
5265_hc 5265 hc
5304_hc 5304 hc
5114_wIR 5114 wIR
5265_wIR 5265 wIR
5304_wIR 5304 wIR
I used at the beginning the following command for the analysis
dds <- DESeqDataSetFromMatrix(
countData = as.matrix(counts),
colData = coldata,
design = ~ condition
)
dds$condition = relevel(dds$condition, ref = "hc")
# Filtering ####
keep <- rowSums(counts(dds) >= 5) >= 2
table(keep)
dds_filt <- dds[keep,]
dds_filt = DESeq(dds_filt)
res <- results(dds_filt)
res_05 <- results(dds_filt, alpha = 0.05)
in the summary results i found out that i have not big difference in gene expression and that i have a certain number of outliers
> summary(res)
out of 17543 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up) : 8, 0.046%
LFC < 0 (down) : 3, 0.017%
outliers [1] : 383, 2.2%
low counts [2] : 4996, 28%
(mean count < 37)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
from running vst() and plotting PCA i see that the points group together by case, i wonder if that can influence the DE analysis and i tried the same code before but changing the design formula accounting for the case identity
dds <- DESeqDataSetFromMatrix(
countData = round(as.matrix(counts)),
colData = coldata,
design = ~ case + condition
)
with this design i ended up with this result
> summary(res)
out of 17543 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up) : 58, 0.33%
LFC < 0 (down) : 78, 0.44%
outliers [1] : 0, 0%
low counts [2] : 6122, 35%
(mean count < 63)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
So apparently i have no more outlier, the questions here is: it's a sign that this design explain better my data? If I understand correctly in this way i still testing primarily for condition right?
I've read the vignette but i still have this doubts on if it's the optimal way to manage the things. Maybe before adding the case to design based on observation of PCA results i need to check the PlotCounts of those outliers? in that case how can i identify them from the res?
If more detail are needed let me know, Thanks
> sessionInfo()
R version 4.0.5 (2021-03-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Linux Mint 20.3
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] EnhancedVolcano_1.8.0 ggsci_2.9 purrr_0.3.4 tibble_3.1.5
[5] tidyverse_1.3.1 readr_1.4.0 edgeR_3.32.1 ggrepel_0.8.2
[9] rjson_0.2.20 GenomicFeatures_1.42.3 AnnotationDbi_1.52.0 tximport_1.18.0
[13] muscData_1.4.0 fgsea_1.16.0 forcats_0.5.1 ExperimentHub_1.16.1
[17] AnnotationHub_2.22.1 BiocFileCache_1.14.0 dbplyr_2.1.1.9000 singscore_1.10.0
[21] infercnv_1.6.0 clustree_0.4.4 ggraph_2.0.5 scuttle_1.0.4
[25] panc8.SeuratData_3.0.2 SeuratData_0.2.1 copykat_1.0.5 FSA_0.9.1
[29] ggpubr_0.4.0 SeuratWrappers_0.3.0 monocle3_1.0.0 dplyr_1.0.7
[33] GSA_1.03.1 AUCell_1.12.0 WriteXLS_6.3.0 data.table_1.13.4
[37] tidyr_1.1.4 SingleR_1.4.1 celldex_1.0.0 FNN_1.1.3
[41] class_7.3-19 harmony_0.1.0 Rcpp_1.0.7 readxl_1.3.1
[45] R.utils_2.10.1 R.oo_1.24.0 R.methodsS3_1.8.1 DropletUtils_1.10.3
[49] SingleCellExperiment_1.12.0 SeuratDisk_0.0.0.9019 SeuratObject_4.0.2 Seurat_4.0.5
[53] reticulate_1.24-9000 biomaRt_2.46.3 ggalluvial_0.12.3 reshape2_1.4.4
[57] msigdbr_7.2.1 stringr_1.4.0 RColorBrewer_1.1-2 limma_3.46.0
[61] knitr_1.30 plyr_1.8.6 DESeq2_1.30.1 SummarizedExperiment_1.20.0
[65] Biobase_2.50.0 MatrixGenerics_1.2.1 matrixStats_0.58.0 GenomicRanges_1.42.0
[69] GenomeInfoDb_1.26.7 IRanges_2.24.1 S4Vectors_0.28.1 BiocGenerics_0.36.1
[73] GSVA_1.38.2 ggplot2_3.3.5
loaded via a namespace (and not attached):
[1] rsvd_1.0.3 ica_1.0-2 Rsamtools_2.6.0
[4] foreach_1.5.1 lmtest_0.9-38 crayon_1.4.2
[7] spatstat.core_2.3-1 MASS_7.3-54 rhdf5filters_1.2.0
[10] nlme_3.1-153 backports_1.3.0 reprex_2.0.1
[13] rlang_0.4.12 argparse_2.1.3 XVector_0.30.0
[16] ROCR_1.0-11 irlba_2.3.3 extrafontdb_1.0
[19] extrafont_0.18 BiocParallel_1.24.1 bit64_4.0.5
[22] glue_1.4.2 sctransform_0.3.3 vipor_0.4.5
[25] spatstat.sparse_2.0-0 spatstat.geom_2.3-0 haven_2.4.3
[28] tidyselect_1.1.0 fitdistrplus_1.1-3 XML_3.99-0.6
[31] zoo_1.8-8 proj4_1.0-11 GenomicAlignments_1.26.0
[34] xtable_1.8-4 magrittr_2.0.1 cli_3.1.0
[37] zlibbioc_1.36.0 rstudioapi_0.13 miniUI_0.1.1.1
[40] rjags_4-12 rpart_4.1-15 fastmatch_1.1-0
[43] lambda.r_1.2.4 maps_3.4.0 shiny_1.5.0
[46] BiocSingular_1.6.0 xfun_0.19 askpass_1.1
[49] cluster_2.1.2 caTools_1.18.0 tidygraph_1.2.0
[52] interactiveDisplayBase_1.28.0 ape_5.6-1 listenv_0.8.0
[55] Biostrings_2.58.0 png_0.1-7 reshape_0.8.8
[58] future_1.21.0 withr_2.4.2 bitops_1.0-6
[61] ggforce_0.3.3 cellranger_1.1.0 GSEABase_1.52.1
[64] dqrng_0.2.1 coda_0.19-4 pillar_1.6.4
[67] gplots_3.1.1 cachem_1.0.4 multcomp_1.4-18
[70] fs_1.5.0 hdf5r_1.3.4 DelayedMatrixStats_1.12.3
[73] vctrs_0.3.8 ellipsis_0.3.2 generics_0.1.0
[76] tools_4.0.5 beeswarm_0.4.0 munsell_0.5.0
[79] tweenr_1.0.2 DelayedArray_0.16.3 fastmap_1.0.1
[82] compiler_4.0.5 abind_1.4-5 httpuv_1.5.5
[85] rtracklayer_1.50.0 plotly_4.9.2.1 GenomeInfoDbData_1.2.4
[88] gridExtra_2.3 lattice_0.20-45 deldir_1.0-6
[91] utf8_1.1.4 later_1.1.0.1 jsonlite_1.7.2
[94] scales_1.1.1 graph_1.68.0 pbapply_1.4-3
[97] carData_3.0-5 sparseMatrixStats_1.2.1 genefilter_1.72.1
[100] lazyeval_0.2.2 promises_1.1.1 car_3.0-12
[103] doParallel_1.0.16 goftest_1.2-2 spatstat.utils_2.2-0
[106] ash_1.0-15 sandwich_3.0-1 cowplot_1.1.0
[109] Rtsne_0.15 uwot_0.1.9 igraph_1.2.6
[112] HDF5Array_1.18.1 survival_3.2-11 yaml_2.2.1
[115] htmltools_0.5.0 memoise_2.0.0 modeltools_0.2-23
[118] locfit_1.5-9.4 graphlayouts_0.8.0 viridisLite_0.4.0
[121] digest_0.6.27 assertthat_0.2.1 mime_0.9
[124] rappdirs_0.3.1 Rttf2pt1_1.3.10 futile.options_1.0.1
[127] RSQLite_2.2.7 future.apply_1.6.0 remotes_2.3.0
[130] blob_1.2.1 futile.logger_1.4.3 splines_4.0.5
[133] Rhdf5lib_1.12.1 RCurl_1.98-1.3 broom_0.7.10
[136] hms_1.0.0 modelr_0.1.8 rhdf5_2.34.0
[139] colorspace_2.0-0 BiocManager_1.30.10 ggbeeswarm_0.6.0
[142] libcoin_1.0-9 ggrastr_1.0.1 coin_1.4-2
[145] RANN_2.6.1 mvtnorm_1.1-3 fansi_0.4.1
[148] parallelly_1.21.0 R6_2.5.0 grid_4.0.5
[151] ggridges_0.5.2 lifecycle_1.0.0 formatR_1.9
[154] curl_4.3 ggsignif_0.6.3 leiden_0.3.6
[157] fastcluster_1.2.3 Matrix_1.3-4 RcppAnnoy_0.0.19
[160] TH.data_1.1-0 iterators_1.0.13 htmlwidgets_1.5.3
[163] beachmat_2.6.4 polyclip_1.10-0 rvest_1.0.2
[166] mgcv_1.8-36 globals_0.14.0 openssl_1.4.3
[169] patchwork_1.1.0 lubridate_1.8.0 codetools_0.2-18
[172] gtools_3.8.2 prettyunits_1.1.1 gtable_0.3.0
[175] DBI_1.1.1 tensor_1.5 httr_1.4.2
[178] KernSmooth_2.23-20 stringi_1.5.3 progress_1.2.2
[181] farver_2.0.3 annotate_1.68.0 viridis_0.6.2
[184] xml2_1.3.2 BiocNeighbors_1.8.2 ggalt_0.4.0
[187] geneplotter_1.68.0 scattermore_0.7 BiocVersion_3.12.0
[190] bit_4.0.4 spatstat.data_2.1-0 pkgconfig_2.0.3
[193] rstatix_0.7.0