Handling NAs in DESeq2

Hi everyone

First of all, thank you for making RNA-seq data much more accessible to an average clinical doctor through the DESeq2 package and vignettes. I am running into some trouble, though: I have a dataset of NanoString mRNA data from a clinical study that was later followed up, so I have a tremendous amount of metadata from both the primary trial and the follow-up. The problem is that the datasets contain a rather large number of NAs, and I keep getting an error message because of them when I try to relate my data to metadata variables. My question is therefore: how do I subset vsd so that samples with NAs in the specific metadata variable of interest for the research question are excluded?

I have already analysed the impact of perioperative medication on changes in gene expression (because these datasets are complete). However, I am now at the rather frustrating part where the data have to be related to clinical impact rather than just a description of physiology.

Thank you for your answer in advance!

 vsd@colData@listData$acplacPOD1 <- relevel(vsd@colData@listData$acplacPOD1, ref="Placebo")
 vsd@colData@listData$time <- factor(vsd@colData@listData$time, levels=c(0,1))
 vsd@colData@listData$chronic_pain_intensity_ACTIVITY_FU <- relevel(vsd@colData@listData$chronic_pain_intensity_ACTIVITY_FU, ref="non/slight")

 dds_2f_int <- DESeqDataSetFromMatrix(countData = counts(set[1:579,]), #keep only endogenous genes
                               colData = colData(vsd), #select metadata file
                               design = ~ W_1 + W_2 + W_3 + W_4 + W_5 + time + chronic_pain_intensity_ACTIVITY_FU)

Error message:

 converting counts to integer mode
 Error in DESeqDataSet(se, design = design, ignoreRank) :
   variables in design formula cannot contain NA: chronic_pain_intensity_ACTIVITY_FU


R version 4.3.0 (2023-04-21 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=Danish_Denmark.utf8 LC_CTYPE=Danish_Denmark.utf8 LC_MONETARY=Danish_Denmark.utf8
[4] LC_NUMERIC=C LC_TIME=Danish_Denmark.utf8

time zone: Europe/Copenhagen
tzcode source: internal

attached base packages:
[1] grid stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] reshape2_1.4.4 lme4_1.1-33 DOSE_3.26.1 ReactomePA_1.44.0
[5] clusterProfiler_4.8.1 EnhancedVolcano_1.18.0 rstatix_0.7.2 PoiClaClu_1.0.2.1
[9] hexbin_1.28.3 vsn_3.68.0 magrittr_2.0.3 org.Hs.eg.db_3.17.0
[13] AnnotationDbi_1.62.1 ComplexHeatmap_2.16.0 DelayedMatrixStats_1.22.0 ggridges_0.5.4
[17] ggnewscale_0.4.8 gridExtra_2.3 ggalt_0.4.0 RColorBrewer_1.1-3
[21] ggvenn_0.1.10 ggpubr_0.6.0 ggrastr_1.0.1 pheatmap_1.0.12
[25] viridis_0.6.3 viridisLite_0.4.2 DelayedArray_0.26.3 S4Arrays_1.0.4
[29] Matrix_1.5-4.1 WriteXLS_6.4.0 PCAtools_2.12.0 ggrepel_0.9.3
[33] ggfortify_0.4.16 MASS_7.3-60 DESeq2_1.40.1 RUVSeq_1.34.0
[37] edgeR_3.42.2 limma_3.56.1 EDASeq_2.34.0 ShortRead_1.58.0
[41] GenomicAlignments_1.36.0 SummarizedExperiment_1.30.1 MatrixGenerics_1.12.0 matrixStats_0.63.0
[45] Rsamtools_2.16.0 GenomicRanges_1.52.0 Biostrings_2.68.1 GenomeInfoDb_1.36.0
[49] XVector_0.40.0 BiocParallel_1.34.2 NanoStringQCPro_1.32.0 Biobase_2.60.0
[53] NanoNormIter_0.1.0 EnvStats_2.7.0 devtools_2.4.5 usethis_2.1.6
[57] xlsx_0.6.5 readxl_1.4.2 lubridate_1.9.2 forcats_1.0.0
[61] stringr_1.5.0 dplyr_1.1.2 purrr_1.0.1 readr_2.1.4
[65] tidyr_1.3.0 tibble_3.2.1 ggplot2_3.4.2 tidyverse_2.0.0
[69] IRanges_2.34.0 S4Vectors_0.38.1 BiocGenerics_0.46.0 BiocManager_1.30.20

loaded via a namespace (and not attached):
[1] R.methodsS3_1.8.2 progress_1.2.2 urlchecker_1.0.1 vctrs_0.6.2
[5] digest_0.6.31 png_0.1-8 shape_1.4.6 registry_0.5-1
[9] deldir_1.0-9 httpuv_1.6.11 foreach_1.5.2 qvalue_2.32.0
[13] withr_2.5.0 xfun_0.39 ggfun_0.0.9 ellipsis_0.3.2
[17] memoise_2.0.1 ggbeeswarm_0.7.2 gson_0.1.0 profvis_0.3.8
[21] tidytree_0.4.2 GlobalOptions_0.1.2 R.oo_1.25.0 prettyunits_1.1.1
[25] KEGGREST_1.40.0 promises_1.2.0.1 httr_1.4.6 downloader_0.4
[29] restfulr_0.0.15 ps_1.7.5 rstudioapi_0.14 miniUI_0.1.1.1
[33] generics_0.1.3 reactome.db_1.84.0 processx_3.8.1 curl_5.0.0
[37] zlibbioc_1.46.0 ScaledMatrix_1.8.1 ggraph_2.1.0 polyclip_1.10-4
[41] GenomeInfoDbData_1.2.10 xtable_1.8-4 doParallel_1.0.17 evaluate_0.21
[45] BiocFileCache_2.8.0 preprocessCore_1.62.1 hms_1.1.3 irlba_2.3.5.1
[49] colorspace_2.1-0 filelock_1.0.2 later_1.3.1 ggtree_3.8.0
[53] lattice_0.21-8 NMF_0.26 shadowtext_0.1.2 XML_3.99-0.14
[57] cowplot_1.1.1 pillar_1.9.0 nlme_3.1-162 iterators_1.0.14
[61] gridBase_0.4-7 compiler_4.3.0 beachmat_2.16.0 stringi_1.7.12
[65] minqa_1.2.5 plyr_1.8.8 crayon_1.5.2 abind_1.4-5
[69] BiocIO_1.10.0 gridGraphics_0.5-1 locfit_1.5-9.7 graphlayouts_1.0.0
[73] bit_4.0.5 fastmatch_1.1-3 codetools_0.2-19 BiocSingular_1.16.0
[77] GetoptLong_1.0.5 mime_0.12 splines_4.3.0 circlize_0.4.15
[81] Rcpp_1.0.10 dbplyr_2.3.2 sparseMatrixStats_1.12.0 HDO.db_0.99.1
[85] cellranger_1.1.0 Rttf2pt1_1.3.12 interp_1.1-4 knitr_1.43
[89] blob_1.2.4 utf8_1.2.3 clue_0.3-64 fs_1.6.2
[93] pkgbuild_1.4.0 ggsignif_0.6.4 ggplotify_0.1.0 callr_3.7.3
[97] tzdb_0.4.0 tweenr_2.0.2 pkgconfig_2.0.3 tools_4.3.0
[101] cachem_1.0.8 RSQLite_2.3.1 DBI_1.1.3 graphite_1.46.0
[105] fastmap_1.1.1 rmarkdown_2.21 scales_1.2.1 broom_1.0.4
[109] patchwork_1.1.2 graph_1.78.0 carData_3.0-5 farver_2.1.1
[113] scatterpie_0.1.9 tidygraph_1.2.3 yaml_2.3.7 latticeExtra_0.6-30
[117] rtracklayer_1.60.0 cli_3.6.1 lifecycle_1.0.3 sessioninfo_1.2.2
[121] backports_1.4.1 timechange_0.2.0 gtable_0.3.3 rjson_0.2.21
[125] parallel_4.3.0 ape_5.7-1 jsonlite_1.8.4 bitops_1.0-7
[129] bit64_4.0.5 yulab.utils_0.0.6 GOSemSim_2.26.0 dqrng_0.3.0
[133] R.utils_2.12.2 lazyeval_0.2.2 shiny_1.7.4 htmltools_0.5.5
[137] affy_1.78.0 proj4_1.0-12 rJava_1.0-6 enrichplot_1.20.0
[141] GO.db_3.17.0 rappdirs_0.3.3 glue_1.6.2 RCurl_1.98-1.12
[145] treeio_1.24.0 jpeg_0.1-10 boot_1.3-28.1 igraph_1.4.3
[149] extrafontdb_1.0 R6_2.5.1 labeling_0.4.2 xlsxjars_0.6.1
[153] GenomicFeatures_1.52.0 cluster_2.1.4 rngtools_1.5.2 pkgload_1.3.2
[157] aplot_0.1.10 nloptr_2.0.3 tidyselect_1.2.0 vipor_0.4.5
[161] maps_3.4.1 ggforce_0.4.1 xml2_1.3.4 ash_1.0-15
[165] car_3.1-2 rsvd_1.0.5 munsell_0.5.0 KernSmooth_2.23-21
[169] affyio_1.70.0 data.table_1.14.8 htmlwidgets_1.6.2 aroma.light_3.30.0
[173] fgsea_1.26.0 hwriter_1.3.2.1 biomaRt_2.56.0 rlang_1.1.0
[177] extrafont_0.19 remotes_2.4.2 fansi_1.0.4 beeswarm_0.4.0

ATpoint ★ 3.4k

DESeq2 objects are SummarizedExperiments, and these follow standard R subsetting rules: rows are genes and columns are samples.

If you had an object dds with a column group in its colData, then you would do dds[,!is.na(dds$group)] to keep only the samples where group is not NA.
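A minimal sketch of that subsetting, in case it helps. It uses the toy data set from makeExampleDESeqDataSet() rather than the real NanoString object, and the column name group and its levels are made up for illustration:

```r
library(DESeq2)

# Toy data standing in for the real object (assumption: 100 genes, 12 samples)
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 100, m = 12)

# Hypothetical follow-up variable with two missing values
dds$group <- factor(rep(c("none_slight", "moderate_severe"), each = 6),
                    levels = c("none_slight", "moderate_severe"))
dds$group[c(2, 9)] <- NA

# Keep only the samples (columns) where the variable of interest is observed
keep <- !is.na(dds$group)
dds_sub <- dds[, keep]

# Drop factor levels that may have become empty after subsetting
dds_sub$group <- droplevels(dds_sub$group)

# The variable can now go into the design without the NA error
design(dds_sub) <- ~ group

ncol(dds)      # 12 samples before
ncol(dds_sub)  # 10 samples after
```

When the object is built with DESeqDataSetFromMatrix() as in the question, the same logical vector would presumably have to be applied to both inputs, for example the columns of the count matrix and the rows of colData(vsd), so that the two still line up. I have not tested that against the objects in the question.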

By the way, things like vsd@colData@listData$time are unnecessary; vsd$time does the same thing. In general, use getter and setter functions rather than accessing slots directly.
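As a rough illustration of the accessor point, again on toy data from makeExampleDESeqDataSet() (the time column is added here only for the example, and "B" is one of the two condition levels the toy data set ships with):

```r
library(DESeq2)

# Toy data standing in for the real object
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 100, m = 6)

# Hypothetical numeric time column, added for illustration only
dds$time <- rep(c(0, 1), 3)

# Getters: both return the same vector as dds@colData@listData$time
dds$time
colData(dds)$time

# Setters: convert to a factor and pick a reference level via `$<-`
dds$time <- factor(dds$time, levels = c(0, 1))
dds$condition <- relevel(dds$condition, ref = "B")

levels(dds$condition)
```

The same `$` shortcut should work on the vsd object, since the output of vst() is also a SummarizedExperiment-type object.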


Dear ATpoint, thank you for your response. It works like a charm! You just saved a PhD study! Thank you :)

