Hi Everyone,
We have a rather complicated clinical trial design from which we're attempting to isolate a treatment effect (allergen challenge - AC) from a control (diluent - DIL) in comparison to a baseline visit (BL). One complicating factor is that the BL visits occur on a different visit, but all of these three treatments happen to each of the subjects (so paired design). These subjects vary in 3 diagnosis groups (Diagnosis) - allergic asthmatic (AA), allergic non-asthmatic (ANA) and non-allergic, non-asthmatic (NANA). Additionally, we have differing time points (24 hr vs 7 day) and allergens (Cat and dust mite).
Clearly each subject is nested within disease groups, but also in their time point (24 Hr or 7D), so I've set up dummy variables to repeat a categorical variable over a group variable (Interval_Diagnosis).
I've pasted a sample of the table below:
SampleID Diagnosis Treatment Subject Interval Allergen Diagnosis_Interval_Subject Subject.nested
AA-002-2BL AA-002-2BL AA 2BL ACE-002 24H Cat AA_24H_ACE-002 A
AA-002-3AC AA-002-3AC AA 3AC ACE-002 24H Cat AA_24H_ACE-002 A
AA-002-3DIL AA-002-3DIL AA 3DIL ACE-002 24H Cat AA_24H_ACE-002 A
AA-004-2BL AA-004-2BL AA 2BL ACE-004 7D Cat AA_7D_ACE-004 A
AA-004-3DIL AA-004-3DIL AA 3DIL ACE-004 7D Cat AA_7D_ACE-004 A
AA-013-2BL AA-013-2BL AA 2BL ACE-013 24H HDMdp AA_24H_ACE-013 D
AA-013-3AC AA-013-3AC AA 3AC ACE-013 24H HDMdp AA_24H_ACE-013 D
AA-013-3DIL AA-013-3DIL AA 3DIL ACE-013 24H HDMdp AA_24H_ACE-013 D
AA-014-2BL AA-014-2BL AA 2BL ACE-014 7D HDMdp AA_7D_ACE-014 C
AA-014-3AC AA-014-3AC AA 3AC ACE-014 7D HDMdp AA_7D_ACE-014 C
AA-014-3DIL AA-014-3DIL AA 3DIL ACE-014 7D HDMdp AA_7D_ACE-014 C
AA-019-2BL AA-019-2BL AA 2BL ACE-019 7D Cat AA_7D_ACE-019 B
AA-019-3AC AA-019-3AC AA 3AC ACE-019 7D Cat AA_7D_ACE-019 B
AA-019-3DIL AA-019-3DIL AA 3DIL ACE-019 7D Cat AA_7D_ACE-019 B
Then I am setting up the model as follows, ignoring Interval and Allergen variables. I also have to remove some zeros from the model matrix, because the groups have uneven subject representation.
m = model.matrix(~ Diagnosis*Treatment + Interval*Subject.nested + Diagnosis*Subject.nested , conds)
colSums(m)
(Intercept) DiagnosisANA DiagnosisNANA Treatment3AC Treatment3DIL Interval7D Subject.nestedB
65 6 9 21 22 17 9
Subject.nestedC Subject.nestedD Subject.nestedE Subject.nestedF Subject.nestedG Subject.nestedH Subject.nestedI
9 6 6 3 3 3 3
Subject.nestedJ Subject.nestedK Subject.nestedL DiagnosisANA:Treatment3AC DiagnosisNANA:Treatment3AC DiagnosisANA:Treatment3DIL DiagnosisNANA:Treatment3DIL
3 3 3 2 3 2 3
Interval7D:Subject.nestedB Interval7D:Subject.nestedC Interval7D:Subject.nestedD Interval7D:Subject.nestedE Interval7D:Subject.nestedF Interval7D:Subject.nestedG Interval7D:Subject.nestedH
3 3 3 3 0 0 0
Interval7D:Subject.nestedI Interval7D:Subject.nestedJ Interval7D:Subject.nestedK Interval7D:Subject.nestedL DiagnosisANA:Subject.nestedB DiagnosisNANA:Subject.nestedB DiagnosisANA:Subject.nestedC
0 0 0 0 0 3 0
DiagnosisNANA:Subject.nestedC DiagnosisANA:Subject.nestedD DiagnosisNANA:Subject.nestedD DiagnosisANA:Subject.nestedE DiagnosisNANA:Subject.nestedE DiagnosisANA:Subject.nestedF DiagnosisNANA:Subject.nestedF
3 0 0 0 0 0 0
DiagnosisANA:Subject.nestedG DiagnosisNANA:Subject.nestedG DiagnosisANA:Subject.nestedH DiagnosisNANA:Subject.nestedH DiagnosisANA:Subject.nestedI DiagnosisNANA:Subject.nestedI DiagnosisANA:Subject.nestedJ
0 0 0 0 0 0 0
DiagnosisNANA:Subject.nestedJ DiagnosisANA:Subject.nestedK DiagnosisNANA:Subject.nestedK DiagnosisANA:Subject.nestedL DiagnosisNANA:Subject.nestedL
0 0 0 0 0
m = m[,which(colSums(m)!=0)]
ddsModel <- DESeqDataSetFromMatrix(countData = data,
colData = conds,
design = ~1)
ddsModel <- DESeq(ddsModel, minReplicatesForReplace = Inf, parallel = T,full = m)
Primarily what we are interested in could be expressed in plain math terms as (3DIL - 2BL) / (3DIL - 2BL) in the allergic asthmatic group. However, I'm unsure how to properly capture this.
If I look at resultsNames:
resultsNames(ddsModel)
[1] "Intercept" "DiagnosisANA" "DiagnosisNANA" "Treatment3AC" "Treatment3DIL" "Interval7D"
[7] "Subject.nestedB" "Subject.nestedC" "Subject.nestedD" "Subject.nestedE" "Subject.nestedF" "Subject.nestedG"
[13] "Subject.nestedH" "Subject.nestedI" "Subject.nestedJ" "Subject.nestedK" "Subject.nestedL" "DiagnosisANA.Treatment3AC"
[19] "DiagnosisNANA.Treatment3AC" "DiagnosisANA.Treatment3DIL" "DiagnosisNANA.Treatment3DIL" "Interval7D.Subject.nestedB" "Interval7D.Subject.nestedC" "Interval7D.Subject.nestedD"
[25] "Interval7D.Subject.nestedE" "DiagnosisNANA.Subject.nestedB" "DiagnosisNANA.Subject.nestedC"
So, my first hunch is simply carry out
res = results(ddsModel,cdiontrast=list("Treatment3AC"))
But that ignores Treatment3DIL. And if I do a specific contrast on those treatments, I get an error because I used a user-supplied model matrix:
results(ddsModel, contrast=c("Treatment","3AC","3DIL"))
Error in results(ddsModel, contrast = c("Treatment", "3AC", "3DIL")) :
only list- and numeric-type contrasts are supported for user-supplied model matrices
So, I'd appreciate any ideas on how to properly create a model and/or contrasts given this situation we are in. Thanks for the help.
> sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.6 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8
[8] LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] pathview_1.32.0 rJava_1.0-5 calibrate_1.7.7 MASS_7.3-54 gage_2.42.0 Rtsne_0.15 BiocParallel_1.26.2
[8] devtools_2.4.2 usethis_2.0.1 genefilter_1.74.0 dplyr_1.0.7 stringr_1.4.0 viridis_0.6.1 viridisLite_0.4.0
[15] reshape2_1.4.4 cowplot_1.1.1 gtools_3.9.2 cluster_2.1.2 gplots_3.1.1 limma_3.48.3 R.utils_2.11.0
[22] R.oo_1.24.0 R.methodsS3_1.8.1 biomaRt_2.48.3 ggrepel_0.9.1 RColorBrewer_1.1-2 ggplot2_3.3.5 DESeq2_1.32.0
[29] SummarizedExperiment_1.22.0 Biobase_2.52.0 MatrixGenerics_1.4.3 matrixStats_0.61.0 GenomicRanges_1.44.0 GenomeInfoDb_1.28.4 IRanges_2.26.0
[36] S4Vectors_0.30.2 BiocGenerics_0.38.0 gageData_2.30.0
loaded via a namespace (and not attached):
[1] circlize_0.4.13 BiocFileCache_2.0.0 plyr_1.8.6 splines_4.1.1 digest_0.6.28 foreach_1.5.1 htmltools_0.5.2 GO.db_3.13.0 fansi_0.5.0
[10] magrittr_2.0.1 memoise_2.0.0 doParallel_1.0.16 remotes_2.4.1 ComplexHeatmap_2.8.0 Biostrings_2.60.2 annotate_1.70.0 prettyunits_1.1.1 colorspace_2.0-2
[19] blob_1.2.2 rappdirs_0.3.3 xfun_0.26 callr_3.7.0 crayon_1.4.1 RCurl_1.98-1.5 graph_1.70.0 survival_3.2-13 iterators_1.0.13
[28] glue_1.4.2 gtable_0.3.0 zlibbioc_1.38.0 XVector_0.32.0 GetoptLong_1.0.5 DelayedArray_0.18.0 pkgbuild_1.2.0 Rgraphviz_2.36.0 shape_1.4.6
[37] scales_1.1.1 DBI_1.1.1 Rcpp_1.0.7 xtable_1.8-4 progress_1.2.2 clue_0.3-60 bit_4.0.4 httr_1.4.2 ellipsis_0.3.2
[46] pkgconfig_2.0.3 XML_3.99-0.8 dbplyr_2.1.1 locfit_1.5-9.4 utf8_1.2.2 tidyselect_1.1.1 rlang_0.4.11 AnnotationDbi_1.54.1 munsell_0.5.0
[55] tools_4.1.1 cachem_1.0.6 cli_3.0.1 generics_0.1.0 RSQLite_2.2.8 evaluate_0.14 fastmap_1.1.0 yaml_2.2.1 processx_3.5.2
[64] org.Hs.eg.db_3.13.0 knitr_1.36 bit64_4.0.5 fs_1.5.0 caTools_1.18.2 purrr_0.3.4 KEGGREST_1.32.0 KEGGgraph_1.52.0 xml2_1.3.2
[73] compiler_4.1.1 rstudioapi_0.13 filelock_1.0.2 curl_4.3.2 png_0.1-7 testthat_3.1.0 tibble_3.1.5 geneplotter_1.70.0 stringi_1.7.5
[82] ps_1.6.0 desc_1.4.0 lattice_0.20-45 Matrix_1.3-4 vctrs_0.3.8 pillar_1.6.3 lifecycle_1.0.1 GlobalOptions_0.1.2 bitops_1.0-7
[91] R6_2.5.1 KernSmooth_2.23-20 gridExtra_2.3 sessioninfo_1.1.1 codetools_0.2-18 assertthat_0.2.1 pkgload_1.2.2 rprojroot_2.0.2 rjson_0.2.20
[100] withr_2.4.2 GenomeInfoDbData_1.2.6 hms_1.1.1 grid_4.1.1 rmarkdown_2.11 Cairo_1.5-12.2
Hi Michael,
Thanks for looking at this. I miss stated that ratio. Rather it should be (3AC - 2BL) / (3DIL - 2BL).
I'll attempt the contrast option in a moment, but I can't seem to get the syntax right on the subtraction method you show. I've tried:
The second list element is in the denominator (e.g. when we write B - A, that's the log scale term for a fold change of B / A and can be extracted from DESeq2 with
list("B","A")
. If you uselist(c("B","A"))
this is B + A, which is not what you want.From docs:
Great. Thank you.