DESEQ2 output gives multiple groups when only 2 groups are present
Entering edit mode
Last seen 27 days ago


I compared raw count RNAseq data from 13 controls and 15 endometriosis lesions. The samples were not grouped by disease state as can be seen from the colData file:


c1 control e1 endometriosis e2 endometriosis c2 control c3 control e3 endometriosis e4 endometriosis c4 control e5 endometriosis c5 control e6 endometriosis e7 endometriosis c6 control c7 control e8 endometriosis e9 endometriosis e10 endometriosis e11 endometriosis c8 control c9 control e12 endometriosis e13 endometriosis c10 control e14 endometriosis c11 control c12 control c13 control e15 endometriosis

When I visualized the data with plotPCA from ggplot2 I noticed that the controls had been separated into 2 groups:


When I rearranged the order of the samples in the counts_data and colData files so that all of the controls were listed first and followed by the endometriosis samples, plotPCA gave a single control group and a endometriosis group as expected:


The DESeq2 output file was also different in these 2 analyses, even though the count data should be the same. Does anyone know what is going on here? Do I always need to keep the groups together in the input file?

Thanks for your help.



counts_data <- read.csv("C:\\Users\\Quanah\\ownCLoud\\PCOS_sub+visc_sequencing_data\\raw_counts_visc_endo_select.csv", header = TRUE, check.names = FALSE, row.names = 1)
colData <- read.csv("C:\\Users\\Quanah\\ownCLoud\\PCOS_sub+visc_sequencing_data\\sample_info_visc_endo_select.csv", header = TRUE, row.names = 1)

dds <- DESeqDataSetFromMatrix(countData = counts_data, colData = colData, design = ~ disease_state)

keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep,]

dds$ disease_state <- relevel(dds$ disease_state, ref = "control")

dds <- DESeq(dds)

res <- results(dds)

res0.05 <- results(dds, alpha = 0.05)


vsdata <-vst(dds, blind=FALSE)

plotPCA(vsdata, intgroup = "disease_state", ntop=500)

z <-plotPCA(vsdata, intgroup = "disease_state", ntop=500)

z + geom_label(aes(label = name),size = 3, hjust = 0.15, position = nudge)

sessionInfo( )

R version 4.3.3 (2024-02-29 ucrt) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 17763)

Matrix products: default

locale: 4 LC_COLLATE=English_Austria.1252 LC_CTYPE=English_Austria.1252
3 LC_MONETARY=English_Austria.1252 LC_NUMERIC=C
[5] LC_TIME=English_Austria.1252

time zone: Europe/Berlin tzcode source: internal

attached base packages: 4 stats4 stats graphics grDevices utils datasets methods base

other attached packages: 4 BiocManager_1.30.22 lubridate_1.9.3 forcats_1.0.0
4 stringr_1.5.1 dplyr_1.1.4 purrr_1.0.2
[7] readr_2.1.5 tidyr_1.3.1 tibble_3.2.1
[10] ggplot2_3.5.0 tidyverse_2.0.0 DESeq2_1.42.1
[13] SummarizedExperiment_1.32.0 Biobase_2.62.0 MatrixGenerics_1.14.0
[16] matrixStats_1.2.0 GenomicRanges_1.54.1 GenomeInfoDb_1.38.8
[19] IRanges_2.36.0 S4Vectors_0.40.2 BiocGenerics_0.48.1

loaded via a namespace (and not attached): 4 gtable_0.3.4 xfun_0.43 lattice_0.22-5
4 tzdb_0.4.0 vctrs_0.6.5 tools_4.3.3
[7] bitops_1.0-7 generics_0.1.3 parallel_4.3.3
[10] fansi_1.0.6 pkgconfig_2.0.3 Matrix_1.6-5
[13] lifecycle_1.0.4 GenomeInfoDbData_1.2.11 farver_2.1.1
[16] compiler_4.3.3 munsell_0.5.1 codetools_0.2-19
[19] htmltools_0.5.8.1 RCurl_1.98-1.14 pillar_1.9.0
[22] crayon_1.5.2 BiocParallel_1.36.0 DelayedArray_0.28.0
[25] abind_1.4-5 tidyselect_1.2.1 locfit_1.5-9.9
[28] digest_0.6.35 stringi_1.8.3 labeling_0.4.3
[31] fastmap_1.1.1 grid_4.3.3 colorspace_2.1-0
[34] cli_3.6.2 SparseArray_1.2.4 magrittr_2.0.3
[37] S4Arrays_1.2.1 utf8_1.2.4 withr_3.0.0
[40] scales_1.3.0 timechange_0.3.0 rmarkdown_2.26
[43] XVector_0.42.0 hms_1.1.3 evaluate_0.23
[46] knitr_1.46 rlang_1.1.3 Rcpp_1.0.12
[49] glue_1.7.0 rstudioapi_0.16.0 R6_2.5.1
[52] zlibbioc_1.48.2

DESeq2 • 177 views
Entering edit mode
ATpoint ★ 4.2k
Last seen 8 hours ago

In the first PCA plot there is almost certainly some hidden character like whitespaces in the column from colData that encodes this group. Like some have "control" and others have "control ". Use make.names for example or manually inspect it, or something like gsub(" ", "", group).


Login before adding your answer.

Traffic: 476 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6