Hi!
I understand that when rows do not converge in beta while running DESeq2, they are labeled in mcols(object)$betaConv as "FALSE". However, I've noticed that some elements of mcols(object)$betaConv are <NA>. What does that mean exactly? Is it ok to drop these genes from the results step of DESeq2 (as recommended in this post - DESeq2 Error: rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest ).
Also, I'm performing exploratory data analyses on VST transformed count matrices by running PCA and making heatmaps (see code below). Does it matter if I use the full VST matrix for this purpose, or the trimmed-down VST matrix (one in which the genes which did not converge in beta or have NA's in mcols(object)$betaConv are dropped)? Will I get different results in these two scenarios (see Scenario 1 and Scenario 2 in the code below)?
Thank you!
library(DESeq2)
# run DE analysis
dds=DESeq(dds)
ddsClean <- dds[which(mcols(dds)$betaConv),]
#############################
## SCENARIO 1 (data exploration using the full vst count matrix, extracted from the dds object)
#############################
# make transformed count data, using variance stabilizing transformation (VST)
vsd = vst(dds, blind=FALSE)
#run PCA
pca = prcomp(t(assay(vsd)))
# make heatmap of VST-transformed count matrix
pheatmap(assay(vsd), cluster_rows=TRUE, cluster_cols=TRUE, annotation_col=annotation_col, scale='row')
#############################
## SCENARIO 2 (data exploration using the trimmed-down vst count matrix, extracted from the ddsClean object)
#############################
# make transformed count data, using variance stabilizing transformation (VST)
vsd = vst(ddsClean, blind=FALSE)
#run PCA
pca = prcomp(t(assay(vsd)))
# make heatmap of VST-transformed count matrix
pheatmap(assay(vsd), cluster_rows=TRUE, cluster_cols=TRUE, annotation_col=annotation_col, scale='row')
sessionInfo( )
> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS/LAPACK: /pbtech_mounts/homes027/nai2008/anaconda3/envs/R_4.0/lib/libopenblasp-r0.3.10.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] sva_3.38.0 BiocParallel_1.24.0
[3] genefilter_1.72.0 mgcv_1.8-33
[5] nlme_3.1-151 tximeta_1.8.4
[7] ensembldb_2.14.0 AnnotationFilter_1.14.0
[9] GenomicFeatures_1.42.1 AnnotationDbi_1.52.0
[11] AnnotationHub_2.22.0 BiocFileCache_1.14.0
[13] dbplyr_2.0.0 RColorBrewer_1.1-2
[15] pheatmap_1.0.12 DESeq2_1.30.0
[17] SummarizedExperiment_1.20.0 Biobase_2.50.0
[19] MatrixGenerics_1.2.0 matrixStats_0.58.0
[21] GenomicRanges_1.42.0 GenomeInfoDb_1.26.0
[23] IRanges_2.24.0 S4Vectors_0.28.0
[25] BiocGenerics_0.36.0
loaded via a namespace (and not attached):
[1] ProtGenerics_1.22.0 bitops_1.0-6
[3] bit64_4.0.5 progress_1.2.2
[5] httr_1.4.2 tools_4.0.3
[7] R6_2.5.0 DBI_1.1.1
[9] lazyeval_0.2.2 colorspace_2.0-0
[11] withr_2.4.1 tidyselect_1.1.0
[13] prettyunits_1.1.1 bit_4.0.4
[15] curl_4.3 compiler_4.0.3
[17] xml2_1.3.2 DelayedArray_0.16.0
[19] rtracklayer_1.50.0 scales_1.1.1
[21] readr_1.4.0 askpass_1.1
[23] rappdirs_0.3.1 stringr_1.4.0
[25] digest_0.6.27 Rsamtools_2.6.0
[27] XVector_0.30.0 pkgconfig_2.0.3
[29] htmltools_0.5.1.1 limma_3.46.0
[31] fastmap_1.1.0 rlang_0.4.10
[33] rstudioapi_0.13 RSQLite_2.2.3
[35] shiny_1.6.0 farver_2.0.3
[37] generics_0.1.0 jsonlite_1.7.2
[39] dplyr_1.0.4 RCurl_1.98-1.2
[41] magrittr_2.0.1 GenomeInfoDbData_1.2.4
[43] Matrix_1.3-2 Rcpp_1.0.6
[45] munsell_0.5.0 lifecycle_0.2.0
[47] edgeR_3.32.0 stringi_1.5.3
[49] yaml_2.2.1 zlibbioc_1.36.0
[51] grid_4.0.3 blob_1.2.1
[53] promises_1.1.1 crayon_1.4.1
[55] lattice_0.20-41 Biostrings_2.58.0
[57] splines_4.0.3 annotate_1.68.0
[59] hms_1.0.0 locfit_1.5-9.4
[61] pillar_1.4.7 geneplotter_1.68.0
[63] biomaRt_2.46.1 XML_3.99-0.5
[65] glue_1.4.2 BiocVersion_3.12.0
[67] BiocManager_1.30.10 vctrs_0.3.6
[69] httpuv_1.5.5 gtable_0.3.0
[71] openssl_1.4.3 purrr_0.3.4
[73] assertthat_0.2.1 cachem_1.0.3
[75] ggplot2_3.3.3 mime_0.9
[77] xtable_1.8-4 later_1.1.0.1
[79] survival_3.2-7 tibble_3.0.6
[81] GenomicAlignments_1.26.0 memoise_2.0.0
[83] tximport_1.18.0 ellipsis_0.3.1
[85] interactiveDisplayBase_1.28.0
Thank you for your reply!
So, I have 5060 genes with NA's in mcols(object)$betaConv. Yes, most of them (5035/5060) have all zero counts. The remaining genes have rowSums of the counts <=1. I see, so NA's in mcols(object)$betaConv means that either the counts are zero in all samples for that gene, or the counts are very very low, correct?
They are actually rounded to zero in the
dds
which requires integer values.