Hi, I have a DESeq2 dataset in which we want to compare patients with and without distant metastasis. As we have IHC-based tumor-infiltrating lymphocyte (TIL) scores, we will also include these in the design formula. Each of the two factors has two levels (No/Yes and hot/cold). Since TIL scores are available, we would also like to assess whether there is any difference between distant metastasis Yes and No within each TIL score group separately. To perform the analysis, I have made a new variable combining the two factors into four groups, following the guidelines in the DESeq2 vignette (not using interactions).
dds$group <- factor(paste0(dds$dist_met, dds$TIL_score))
design(dds) <- ~ 0 + W_1 + group
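As a quick sanity check on the coefficient names this design produces, here is a minimal base R sketch. The `toy` data frame and its values are made up for illustration and stand in for the real `colData(dds)`; only the column names `dist_met`, `TIL_score` and `W_1` are taken from the post.

```r
# Toy stand-in for colData(dds); the values are invented for illustration only.
set.seed(1)
toy <- data.frame(
  dist_met  = rep(c("No", "Yes"), each = 4),
  TIL_score = rep(c("hot", "cold"), times = 4),
  W_1       = rnorm(8)  # numeric factor of unwanted variation (hypothetical values)
)
toy$group <- factor(paste0(toy$dist_met, toy$TIL_score))

# With no intercept, the first factor in the formula gets one column per level,
# so every one of the four groups should have its own coefficient.
mm <- model.matrix(~ 0 + W_1 + group, data = toy)
print(colnames(mm))
```

On the real object, `resultsNames(dds)` after running `DESeq()` should list the corresponding coefficient names to use in `results()`.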
W_1 is a factor of unwanted variation estimated with RUVSeq, as our data are from NanoString. It was generated following the approach of Bhattacharya et al.
I am doing the following to compare the average of the two No distant metastasis groups with the average of the two Yes distant metastasis groups (averaging over hot and cold within each metastasis status).
res_met <- results(dds,
                   contrast = list(c("groupNohot", "groupNocold"),
                                   c("groupYescold", "groupYeshot")),
                   listValues = c(1/2, -1/2))
This brings me to my question: as the groups are not of equal size, is it necessary to use different weights for the different groups? The sample distribution is as follows:
  Nohot  Nocold Yescold  Yeshot
     19      45      46      16
In both the No and Yes distant metastasis groups there is a much higher fraction of cold samples, but with this setup hot and cold are given equal weight in the average for each group. Is this still the correct approach? Or would it be better to run DESeq2 with distant metastasis and TIL score as two distinct factors in the design (design=~0 + dist_met + TIL_score), assess distant metastasis in the overall sample set, and then proceed to analyze the subgroups using the groups generated above (by rerunning dds with a new design=~0 + W_1 + group)?
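To make the weighting question concrete, here is a small base R sketch of what the two options would look like as numeric contrast vectors. The coefficient order in `coefs` is an assumption and would need to match `resultsNames(dds)` on the real object. The size-proportional weights are only one possible alternative, shown for illustration and not as a recommendation.

```r
# Group sizes as reported above.
n <- c(Nohot = 19, Nocold = 45, Yescold = 46, Yeshot = 16)

# Assumed coefficient order; check against resultsNames(dds) before use.
coefs <- c("W_1", "groupNocold", "groupNohot", "groupYescold", "groupYeshot")

# Equal weighting: same contrast as the list/listValues call (No minus Yes).
equal_w <- setNames(numeric(length(coefs)), coefs)
equal_w[c("groupNohot", "groupNocold")]   <-  1/2
equal_w[c("groupYescold", "groupYeshot")] <- -1/2

# Size-proportional weighting within each metastasis status (illustrative only).
n_no  <- n[["Nohot"]]  + n[["Nocold"]]
n_yes <- n[["Yeshot"]] + n[["Yescold"]]
size_w <- setNames(numeric(length(coefs)), coefs)
size_w["groupNohot"]   <-  n[["Nohot"]]   / n_no
size_w["groupNocold"]  <-  n[["Nocold"]]  / n_no
size_w["groupYeshot"]  <- -n[["Yeshot"]]  / n_yes
size_w["groupYescold"] <- -n[["Yescold"]] / n_yes

print(rbind(equal_w, size_w))

# A numeric contrast could then be passed to results(), e.g.:
# res_met_w <- results(dds, contrast = size_w)
```

Both vectors sum to zero, so each is a valid No vs Yes comparison; they differ only in how much the hot and cold groups contribute to each average.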
sessionInfo()
R version 4.2.1 (2022-06-23 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22000)

Matrix products: default

locale:
LC_COLLATE=English_United States.utf8 LC_CTYPE=English_United States.utf8 LC_MONETARY=English_United States.utf8
LC_NUMERIC=C LC_TIME=English_United States.utf8

attached base packages:
stats4 stats graphics grDevices utils datasets methods base

other attached packages:
RUVSeq_1.30.0 edgeR_3.38.1 limma_3.52.2 EDASeq_2.30.0
ShortRead_1.54.0 GenomicAlignments_1.32.0 Rsamtools_2.12.0 Biostrings_2.64.0
XVector_0.36.0 BiocParallel_1.30.3 DESeq2_1.36.0 SummarizedExperiment_1.26.1
Biobase_2.56.0 MatrixGenerics_1.8.1 matrixStats_0.62.0 GenomicRanges_1.48.0
GenomeInfoDb_1.32.2 IRanges_2.30.0 S4Vectors_0.34.0 BiocGenerics_0.42.0
ctrlGene_1.0.1 rstatix_0.7.0 GSVA_1.44.2 msigdbr_7.5.1
data.table_1.14.2 ggrepel_0.9.1 forcats_0.5.1 stringr_1.4.0
dplyr_1.0.9 purrr_0.3.4 readr_2.1.2 tidyr_1.2.0
tibble_3.1.7 ggplot2_3.3.6 tidyverse_1.3.1 pheatmap_1.0.12
RColorBrewer_1.1-3