I have 90 samples in total for my dataset with two genotypes, A (wild-type) and B (single gene mutant). Each genotype is broken down into 5 treatments groups, which represent the time since the treatment was applied. Each treatment has 3 biological replicates and this experiment setup was repeated 3 times. So there are 3 biological replicates from each experiment (E1,E2,E3), giving a total of 9 biological replicates per treatment. So for the genotype A, the breakdown is:
|0hr/Control||9 samples (3 from E1, 3 from E2, 3 from E3)|
|12hrs||9 samples (3 from E1, 3 from E2, 3 from E3)|
|24hrs||9 samples (3 from E1, 3 from E2, 3 from E3)|
|36hrs||9 samples (3 from E1, 3 from E2, 3 from E3)|
|48hrs||9 samples (3 from E1, 3 from E2, 3 from E3)|
(The same setup exists for genotype B)
My initial thought was to run the analyses separately for each genotype, but I saw in other posts that it would probably be better to combine the genotypes into the same dataset. I did so using the following code:
The design is meant to test for the effect of treatment time (12hrs, 24hrs, 36hrs, 48hrs) while controlling for the effect of the different experiments (E1, E2, E3) and genotype.
ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable, directory = directory, design =~ experiment + genotype + treatment) colData(ddsHTSeq)$treatment <- factor(colData(ddsHTSeq)$treatment, levels = c("Control", "12hrs", "24hrs", "36hrs", "48hrs")) colData(ddsHTSeq)$experiment <- factor(colData(ddsHTSeq)$experiment, levels = c("E1", "E2", "E3")) colData(ddsHTSeq)$genotype <- factor(colData(ddsHTSeq)$genotype, levels = c("A", "B)) dds <- DESeq(ddsHTSeq)
I am looking to find the genes that change across the treatment relative to the 0h/Control. I want to find genes that have a large (+/-) log fold-change at either 12hrs or 24hrs and then come back to 0 in 36hrs and 48hrs. My thought was to perform a pair-wise comparison between the treatments and the Control. I would then take the significant genes for 12hrs and 24hrs and plot the log fold-changes for the comparisons. I also used the
lfcShrink function with the apeglm setting. (See attached plot) To find the genes of interest, I would then perform some clustering.
I think my approach is appropriate, but I wanted some confirmation of this.
Also, because I am including genotype in the design formula, the log fold-changes for the genes in the plot are the average between the two genotypes or only the log fold-changes for the wild-type genotype?
Thanks in advance!
> sessionInfo() R version 3.6.1 (2019-07-05) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 18.04.3 LTS Matrix products: default BLAS: /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3 LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so locale:  LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8  LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C  LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages:  tcltk parallel stats4 stats graphics grDevices utils datasets methods base other attached packages:  Mfuzz_2.44.0 DynDoc_1.62.0 widgetTools_1.62.0 e1071_1.7-2 pheatmap_1.0.12  scales_1.0.0 tidyr_0.8.3 RColorBrewer_1.1-2 gplots_22.214.171.124 ggplot2_3.2.1  dplyr_0.8.3 tibble_2.1.3 readr_1.3.1 DESeq2_1.24.0 SummarizedExperiment_1.14.1  DelayedArray_0.10.0 BiocParallel_1.18.1 matrixStats_0.55.0 Biobase_2.44.0 GenomicRanges_1.36.1  GenomeInfoDb_1.20.0 IRanges_2.18.2 S4Vectors_0.22.0 BiocGenerics_0.30.0 loaded via a namespace (and not attached):  bitops_1.0-6 bit64_0.9-7 tools_3.6.1 backports_1.1.4 R6_2.4.0 rpart_4.1-15  KernSmooth_2.23-15 Hmisc_4.2-0 DBI_1.0.0 lazyeval_0.2.2 colorspace_1.4-1 nnet_7.3-12  withr_2.1.2 tidyselect_0.2.5 gridExtra_2.3 bit_1.1-14 compiler_3.6.1 htmlTable_1.13.1  labeling_0.3 caTools_126.96.36.199 checkmate_1.9.4 genefilter_1.66.0 stringr_1.4.0 digest_0.6.20  foreign_0.8-72 XVector_0.24.0 base64enc_0.1-3 pkgconfig_2.0.2 htmltools_0.3.6 htmlwidgets_1.3  rlang_0.4.0 rstudioapi_0.10 RSQLite_2.1.2 gtools_3.8.1 acepack_1.4.1 RCurl_1.95-4.12  magrittr_1.5 GenomeInfoDbData_1.2.1 Formula_1.2-3 Matrix_1.2-17 Rcpp_1.0.2 munsell_0.5.0  stringi_1.4.3 zlibbioc_1.30.0 grid_3.6.1 blob_1.2.0 gdata_2.18.0 crayon_1.3.4  lattice_0.20-38 splines_3.6.1 annotate_1.62.0 hms_0.5.1 locfit_1.5-9.1 zeallot_0.1.0  knitr_1.24 pillar_1.4.2 tkWidgets_1.62.0 geneplotter_1.62.0 XML_3.98-1.20 glue_1.3.1  latticeExtra_0.6-28 data.table_1.12.2 vctrs_0.2.0 gtable_0.3.0 purrr_0.3.2 assertthat_0.2.1  xfun_0.9 xtable_1.8-4 class_7.3-15 survival_2.44-1.1 AnnotationDbi_1.46.1 memoise_1.1.0  cluster_2.1.0