DESeq2: NA's in mcols(object)$betaConv
1
0
Entering edit mode
@nikolay-ivanov-23079
Last seen 2.7 years ago
USA/New York City/Weill Cornell Medicine

Hi!

I understand that when rows do not converge in beta while running DESeq2, they are labeled in mcols(object)$betaConv as "FALSE". However, I've noticed that some elements of mcols(object)$betaConv are <NA>. What does that mean exactly? Is it ok to drop these genes from the results step of DESeq2 (as recommended in this post - DESeq2 Error: rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest ).

Also, I'm performing exploratory data analyses on VST transformed count matrices by running PCA and making heatmaps (see code below). Does it matter if I use the full VST matrix for this purpose, or the trimmed-down VST matrix (one in which the genes which did not converge in beta or have NA's in mcols(object)$betaConv are dropped)? Will I get different results in these two scenarios (see Scenario 1 and Scenario 2 in the code below)?

Thank you!

library(DESeq2)

# run DE analysis
dds=DESeq(dds)

ddsClean <- dds[which(mcols(dds)$betaConv),]

#############################
## SCENARIO 1 (data exploration using the full vst count matrix, extracted from the dds object)
#############################

# make transformed count data, using variance stabilizing transformation (VST) 
vsd = vst(dds, blind=FALSE)

#run PCA 
pca = prcomp(t(assay(vsd)))

# make heatmap of VST-transformed count matrix
pheatmap(assay(vsd), cluster_rows=TRUE, cluster_cols=TRUE, annotation_col=annotation_col, scale='row')

#############################
## SCENARIO 2 (data exploration using the trimmed-down vst count matrix, extracted from the ddsClean object)
#############################

# make transformed count data, using variance stabilizing transformation (VST) 
vsd = vst(ddsClean, blind=FALSE)

#run PCA 
pca = prcomp(t(assay(vsd)))

# make heatmap of VST-transformed count matrix
pheatmap(assay(vsd), cluster_rows=TRUE, cluster_cols=TRUE, annotation_col=annotation_col, scale='row')

sessionInfo( )

> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /pbtech_mounts/homes027/nai2008/anaconda3/envs/R_4.0/lib/libopenblasp-r0.3.10.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] sva_3.38.0                  BiocParallel_1.24.0
 [3] genefilter_1.72.0           mgcv_1.8-33
 [5] nlme_3.1-151                tximeta_1.8.4
 [7] ensembldb_2.14.0            AnnotationFilter_1.14.0
 [9] GenomicFeatures_1.42.1      AnnotationDbi_1.52.0
[11] AnnotationHub_2.22.0        BiocFileCache_1.14.0
[13] dbplyr_2.0.0                RColorBrewer_1.1-2
[15] pheatmap_1.0.12             DESeq2_1.30.0
[17] SummarizedExperiment_1.20.0 Biobase_2.50.0
[19] MatrixGenerics_1.2.0        matrixStats_0.58.0
[21] GenomicRanges_1.42.0        GenomeInfoDb_1.26.0
[23] IRanges_2.24.0              S4Vectors_0.28.0
[25] BiocGenerics_0.36.0

loaded via a namespace (and not attached):
 [1] ProtGenerics_1.22.0           bitops_1.0-6
 [3] bit64_4.0.5                   progress_1.2.2
 [5] httr_1.4.2                    tools_4.0.3
 [7] R6_2.5.0                      DBI_1.1.1
 [9] lazyeval_0.2.2                colorspace_2.0-0
[11] withr_2.4.1                   tidyselect_1.1.0
[13] prettyunits_1.1.1             bit_4.0.4
[15] curl_4.3                      compiler_4.0.3
[17] xml2_1.3.2                    DelayedArray_0.16.0
[19] rtracklayer_1.50.0            scales_1.1.1
[21] readr_1.4.0                   askpass_1.1
[23] rappdirs_0.3.1                stringr_1.4.0
[25] digest_0.6.27                 Rsamtools_2.6.0
[27] XVector_0.30.0                pkgconfig_2.0.3
[29] htmltools_0.5.1.1             limma_3.46.0
[31] fastmap_1.1.0                 rlang_0.4.10
[33] rstudioapi_0.13               RSQLite_2.2.3
[35] shiny_1.6.0                   farver_2.0.3
[37] generics_0.1.0                jsonlite_1.7.2
[39] dplyr_1.0.4                   RCurl_1.98-1.2
[41] magrittr_2.0.1                GenomeInfoDbData_1.2.4
[43] Matrix_1.3-2                  Rcpp_1.0.6
[45] munsell_0.5.0                 lifecycle_0.2.0
[47] edgeR_3.32.0                  stringi_1.5.3
[49] yaml_2.2.1                    zlibbioc_1.36.0
[51] grid_4.0.3                    blob_1.2.1
[53] promises_1.1.1                crayon_1.4.1
[55] lattice_0.20-41               Biostrings_2.58.0
[57] splines_4.0.3                 annotate_1.68.0
[59] hms_1.0.0                     locfit_1.5-9.4
[61] pillar_1.4.7                  geneplotter_1.68.0
[63] biomaRt_2.46.1                XML_3.99-0.5
[65] glue_1.4.2                    BiocVersion_3.12.0
[67] BiocManager_1.30.10           vctrs_0.3.6
[69] httpuv_1.5.5                  gtable_0.3.0
[71] openssl_1.4.3                 purrr_0.3.4
[73] assertthat_0.2.1              cachem_1.0.3
[75] ggplot2_3.3.3                 mime_0.9
[77] xtable_1.8-4                  later_1.1.0.1
[79] survival_3.2-7                tibble_3.0.6
[81] GenomicAlignments_1.26.0      memoise_2.0.0
[83] tximport_1.18.0               ellipsis_0.3.1
[85] interactiveDisplayBase_1.28.0
DESeq2 • 880 views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 14 hours ago
United States

Re: Q1, do these have all zero counts?

If you run vst(dds, blind=FALSE) on a dds in which DESeq() has already been run, it will be the same transformation regardless of if there are more or less rows of data present.

ADD COMMENT
0
Entering edit mode

Thank you for your reply!

So, I have 5060 genes with NA's in mcols(object)$betaConv. Yes, most of them (5035/5060) have all zero counts. The remaining genes have rowSums of the counts <=1. I see, so NA's in mcols(object)$betaConv means that either the counts are zero in all samples for that gene, or the counts are very very low, correct?

se = tximeta(coldata=wcm_metadata, type = "salmon")
gse = summarizeToGene(se)
counts=assays(gse)$counts

all(rownames(counts)==rownames(mcols(dds))) #TRUE

tt=as.vector(rowSums(counts[which(is.na(mcols(dds)$betaConv)),]))
length(tt) # 5060
length(which(tt==0)) # 5035
max(tt) # 1
ADD REPLY
1
Entering edit mode

They are actually rounded to zero in the dds which requires integer values.

ADD REPLY

Login before adding your answer.

Traffic: 760 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6