Question

DESeq2: NA's in mcols(object)$betaConv

0

Entering edit mode

Nikolay Ivanov • 0

@nikolay-ivanov-23079

Last seen 3.5 years ago

USA/New York City/Weill Cornell Medicine

Hi!

I understand that when rows do not converge in beta while running DESeq2, they are labeled in mcols(object)$betaConv as "FALSE". However, I've noticed that some elements of mcols(object)$betaConv are <NA>. What does that mean exactly? Is it ok to drop these genes from the results step of DESeq2 (as recommended in this post - DESeq2 Error: rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest ).

Also, I'm performing exploratory data analyses on VST transformed count matrices by running PCA and making heatmaps (see code below). Does it matter if I use the full VST matrix for this purpose, or the trimmed-down VST matrix (one in which the genes which did not converge in beta or have NA's in mcols(object)$betaConv are dropped)? Will I get different results in these two scenarios (see Scenario 1 and Scenario 2 in the code below)?

Thank you!

library(DESeq2)

# run DE analysis
dds=DESeq(dds)

ddsClean <- dds[which(mcols(dds)$betaConv),]

#############################
## SCENARIO 1 (data exploration using the full vst count matrix, extracted from the dds object)
#############################

# make transformed count data, using variance stabilizing transformation (VST) 
vsd = vst(dds, blind=FALSE)

#run PCA 
pca = prcomp(t(assay(vsd)))

# make heatmap of VST-transformed count matrix
pheatmap(assay(vsd), cluster_rows=TRUE, cluster_cols=TRUE, annotation_col=annotation_col, scale='row')

#############################
## SCENARIO 2 (data exploration using the trimmed-down vst count matrix, extracted from the ddsClean object)
#############################

# make transformed count data, using variance stabilizing transformation (VST) 
vsd = vst(ddsClean, blind=FALSE)

#run PCA 
pca = prcomp(t(assay(vsd)))

# make heatmap of VST-transformed count matrix
pheatmap(assay(vsd), cluster_rows=TRUE, cluster_cols=TRUE, annotation_col=annotation_col, scale='row')

sessionInfo( )

> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /pbtech_mounts/homes027/nai2008/anaconda3/envs/R_4.0/lib/libopenblasp-r0.3.10.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] sva_3.38.0                  BiocParallel_1.24.0
 [3] genefilter_1.72.0           mgcv_1.8-33
 [5] nlme_3.1-151                tximeta_1.8.4
 [7] ensembldb_2.14.0            AnnotationFilter_1.14.0
 [9] GenomicFeatures_1.42.1      AnnotationDbi_1.52.0
[11] AnnotationHub_2.22.0        BiocFileCache_1.14.0
[13] dbplyr_2.0.0                RColorBrewer_1.1-2
[15] pheatmap_1.0.12             DESeq2_1.30.0
[17] SummarizedExperiment_1.20.0 Biobase_2.50.0
[19] MatrixGenerics_1.2.0        matrixStats_0.58.0
[21] GenomicRanges_1.42.0        GenomeInfoDb_1.26.0
[23] IRanges_2.24.0              S4Vectors_0.28.0
[25] BiocGenerics_0.36.0

loaded via a namespace (and not attached):
 [1] ProtGenerics_1.22.0           bitops_1.0-6
 [3] bit64_4.0.5                   progress_1.2.2
 [5] httr_1.4.2                    tools_4.0.3
 [7] R6_2.5.0                      DBI_1.1.1
 [9] lazyeval_0.2.2                colorspace_2.0-0
[11] withr_2.4.1                   tidyselect_1.1.0
[13] prettyunits_1.1.1             bit_4.0.4
[15] curl_4.3                      compiler_4.0.3
[17] xml2_1.3.2                    DelayedArray_0.16.0
[19] rtracklayer_1.50.0            scales_1.1.1
[21] readr_1.4.0                   askpass_1.1
[23] rappdirs_0.3.1                stringr_1.4.0
[25] digest_0.6.27                 Rsamtools_2.6.0
[27] XVector_0.30.0                pkgconfig_2.0.3
[29] htmltools_0.5.1.1             limma_3.46.0
[31] fastmap_1.1.0                 rlang_0.4.10
[33] rstudioapi_0.13               RSQLite_2.2.3
[35] shiny_1.6.0                   farver_2.0.3
[37] generics_0.1.0                jsonlite_1.7.2
[39] dplyr_1.0.4                   RCurl_1.98-1.2
[41] magrittr_2.0.1                GenomeInfoDbData_1.2.4
[43] Matrix_1.3-2                  Rcpp_1.0.6
[45] munsell_0.5.0                 lifecycle_0.2.0
[47] edgeR_3.32.0                  stringi_1.5.3
[49] yaml_2.2.1                    zlibbioc_1.36.0
[51] grid_4.0.3                    blob_1.2.1
[53] promises_1.1.1                crayon_1.4.1
[55] lattice_0.20-41               Biostrings_2.58.0
[57] splines_4.0.3                 annotate_1.68.0
[59] hms_1.0.0                     locfit_1.5-9.4
[61] pillar_1.4.7                  geneplotter_1.68.0
[63] biomaRt_2.46.1                XML_3.99-0.5
[65] glue_1.4.2                    BiocVersion_3.12.0
[67] BiocManager_1.30.10           vctrs_0.3.6
[69] httpuv_1.5.5                  gtable_0.3.0
[71] openssl_1.4.3                 purrr_0.3.4
[73] assertthat_0.2.1              cachem_1.0.3
[75] ggplot2_3.3.3                 mime_0.9
[77] xtable_1.8-4                  later_1.1.0.1
[79] survival_3.2-7                tibble_3.0.6
[81] GenomicAlignments_1.26.0      memoise_2.0.0
[83] tximport_1.18.0               ellipsis_0.3.1
[85] interactiveDisplayBase_1.28.0

DESeq2 • 1.1k views

ADD COMMENT • link updated 3.5 years ago by Michael Love 43k • written 3.5 years ago by Nikolay Ivanov • 0

score 0 · Answer 1 · 2021-06-16

0

Entering edit mode

Michael Love 43k

@mikelove

Last seen 1 day ago

United States

Re: Q1, do these have all zero counts?

If you run vst(dds, blind=FALSE) on a dds in which DESeq() has already been run, it will be the same transformation regardless of if there are more or less rows of data present.

ADD COMMENT • link 3.5 years ago Michael Love 43k

0

Entering edit mode

Thank you for your reply!

So, I have 5060 genes with NA's in mcols(object)$betaConv. Yes, most of them (5035/5060) have all zero counts. The remaining genes have rowSums of the counts <=1. I see, so NA's in mcols(object)$betaConv means that either the counts are zero in all samples for that gene, or the counts are very very low, correct?

se = tximeta(coldata=wcm_metadata, type = "salmon")
gse = summarizeToGene(se)
counts=assays(gse)$counts

all(rownames(counts)==rownames(mcols(dds))) #TRUE

tt=as.vector(rowSums(counts[which(is.na(mcols(dds)$betaConv)),]))
length(tt) # 5060
length(which(tt==0)) # 5035
max(tt) # 1

ADD REPLY • link 3.5 years ago Nikolay Ivanov • 0

1

Entering edit mode

They are actually rounded to zero in the dds which requires integer values.

ADD REPLY • link 3.5 years ago Michael Love 43k