I have 18 samples of RNAseq read counts (46,127 genes) from a time series experiment. My factors/variables are
species, with 2 levels: C.rat & C.lin stage, with 3 levels: 1, 2, & 3
So I have 3 replicates of each combination of the above factors. In making my "ddsFullCountTable", I used these as colData:
species stage C. linearis Stage 1 plant 1 C.linearis 1 C. linearis Stage 1 plant 2 C.linearis 1 C. linearis Stage 1 plant 3 C.linearis 1 C. linearis Stage 2 plant 1 C.linearis 2 C. linearis Stage 2 plant 2 C.linearis 2 C. linearis Stage 2 plant 3 C.linearis 2 C. linearis Stage 3 plant 1 C.linearis 3 C. linearis Stage 3 plant 2 C.linearis 3 C. linearis Stage 3 plant 3 C.linearis 3 C. rattanii Stage 1 plant 1 C.rattanii 1 C. rattanii Stage 1 plant 2 C.rattanii 1 C. rattanii Stage 1 plant 3 C.rattanii 1 C. rattanii Stage 2 plant 1 C.rattanii 2 C. rattanii Stage 2 plant 2 C.rattanii 2 C. rattanii Stage 2 plant 3 C.rattanii 2 C. rattanii Stage 3 plant 1 C.rattanii 3 C. rattanii Stage 3 plant 2 C.rattanii 3 C. rattanii Stage 3 plant 3 C.rattanii 3
design = ~ species + stage + species:stage
My goal is to find the genes with significant species by stage (interaction) effect, which I've done, but then to also use the LRT to find the genes with a stage main effect. Ultimately I'll need to then find the subset of genes that do have the stage main effect but do not have the interaction effect. I've run the following code:
dds <- DESeq(ddsFullCountTable) ddsTimeSeries <- DESeq(dds, test="LRT", reduced = ~ species + stage) resultsTimeSeries <- results(ddsTimeSeries, alpha = 0.01)
and this has worked fine to give me ~855 genes with an overall (i.e., ANOVA-style "omnibus") interaction effect.
[I do have one question about this, though: the vignette says: "In this case, using the likelihood ratio test with a reduced model which does not contain the interaction term will test whether the condition induces a change in gene expression at any time point after the base-level time point." (Q1) Does this mean that this test will pull out genes with interactions between time points 1-2, 2-3, and 1-3? In other words, does it treat the time factor as ordinal? or categorical?]
then I do the following to try and repeat the same for isolating genes with an overall (i.e., ANOVA-style "omnibus") stage main effect:
ddsTimeSeries <- DESeq(dds, test="LRT", reduced = ~ species + species:stage)
but I get this error
using pre-existing size factors estimating dispersions found already estimated dispersions, replacing these gene-wise dispersion estimates mean-dispersion relationship final dispersion estimates fitting model and testing Error in nbinomLRT(object, full = full, reduced = reduced, quiet = quiet, : less than one degree of freedom, perhaps full and reduced models are not in the correct order
I've then tried rearranging the terms of the full model before dropping stage, but the same error appears.
So my second question is (Q2) Why is dropping stage from the LRT not a possibility, or is it?
To get around it, I've run the same ddsFullCountTable data with design= ~stage and reduced = ~1, but it seems strange and incorrect to use 2 different full models to get to this answer about the same data, so... (Q3) is this work-around incorrect or wrong from a modeling perspective and/or in DESeq2?
To get around this issue another way, I've considered running the same ddsFullCountTable data with the original full model (design= ~species + stage + species:stage), and then doing both
ddsTimeSeries <- DESeq(dds, test="LRT", reduced = ~ species + stage)
ddsTimeSeries <- DESeq(dds, test="LRT", reduced = ~ species) #which does work!
and then using the intersection of those 2 resulting sets of genes to parse out the genes with only a stage main effect, so... (Q4) again, is this work-around incorrect or wrong from a modeling perspective or in DESeq2?
Lastly, I have considered using the LRT results for the interaction effect (~855 genes), then using the union of gene sets resulting from the following independent contrasts to find all the genes with a main effect for stage:
design(dds) <- ~ species + stage + species:stage dds <- DESeq(dds, test = "Wald") results(dds, contrast=c("stage","2","1")) results(dds, contrast=c("stage","3","2")) results(dds, contrast=c("stage","3","1"))
Then I would be removing any of the 855 genes that have an interaction effect from this list, in order to achieve my goal (i.e., to find the subset of genes that do have the stage main effect but do not have the interaction effect), which again, seems incorrect to be doing across tests. so, (Q5) is this work-around incorrect or wrong from a modeling perspective or in DESeq2?
Finally, (Q6) is there a better option i have not considered? if not, which of the above methods are the most statistically sound?
Thank you so so so much to whomever can provide insight!
> sessionInfo() R version 3.6.2 (2019-12-12) Platform: x86_64-apple-darwin15.6.0 (64-bit) Running under: macOS Mojave 10.14.6 Matrix products: default BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib locale:  en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages:  grid parallel stats4 stats graphics grDevices utils datasets methods base other attached packages:  apeglm_1.8.0 Rgraphviz_2.30.0 topGO_2.38.1  SparseM_1.78 GO.db_3.10.0 AnnotationDbi_1.48.0  graph_1.64.0 RColorBrewer_1.1-2 pheatmap_1.0.12  VennDiagram_1.6.20 futile.logger_1.4.3 factoextra_18.104.22.1689  ggplot2_3.2.1 devtools_2.3.0 usethis_1.6.1  FactoMineR_2.3 DESeq2_1.26.0 SummarizedExperiment_1.16.1  DelayedArray_0.12.2 BiocParallel_1.20.1 matrixStats_0.55.0  Biobase_2.46.0 GenomicRanges_1.38.0 GenomeInfoDb_1.22.0  IRanges_2.20.2 S4Vectors_0.24.3 BiocGenerics_0.32.0 loaded via a namespace (and not attached):  colorspace_1.4-1 ggsignif_0.6.0 ellipsis_0.3.0 rprojroot_1.3-2  htmlTable_1.13.3 XVector_0.26.0 base64enc_0.1-3 fs_1.3.1  rstudioapi_0.11 ggpubr_0.2.5 farver_2.0.3 remotes_2.1.1  ggrepel_0.8.1 bit64_0.9-7 mvtnorm_1.1-1 fansi_0.4.1  splines_3.6.2 leaps_3.1 geneplotter_1.64.0 knitr_1.28  pkgload_1.0.2 Formula_1.2-3 annotate_1.64.0 cluster_2.1.0  png_0.1-7 BiocManager_1.30.10 compiler_3.6.2 backports_1.1.5  assertthat_0.2.1 Matrix_1.2-18 lazyeval_0.2.2 cli_2.0.2  formatR_1.7 acepack_1.4.1 htmltools_0.4.0 prettyunits_1.1.1  tools_3.6.2 coda_0.19-3 gtable_0.3.0 glue_1.4.1  GenomeInfoDbData_1.2.2 dplyr_0.8.4 Rcpp_1.0.3 bbmle_22.214.171.124  vctrs_0.3.0 xfun_0.12 stringr_1.4.0 ps_1.3.2  testthat_2.3.2 lifecycle_0.2.0 XML_3.99-0.3 zlibbioc_1.32.0  MASS_7.3-51.5 scales_1.1.0 lambda.r_1.2.4 curl_4.3  memoise_1.1.0 gridExtra_2.3 emdbook_1.3.12 bdsmatrix_1.3-4  rpart_4.1-15 latticeExtra_0.6-29 stringi_1.4.6 RSQLite_2.2.0  genefilter_1.68.0 desc_1.2.0 checkmate_2.0.0 pkgbuild_1.0.6  rlang_0.4.6 pkgconfig_2.0.3 bitops_1.0-6 lattice_0.20-38  purrr_0.3.3 htmlwidgets_1.5.1 labeling_0.3 bit_1.1-15.2  tidyselect_1.0.0 processx_3.4.2 plyr_1.8.5 magrittr_1.5  R6_2.4.1 Hmisc_4.3-1 DBI_1.1.0 pillar_1.4.3  foreign_0.8-75 withr_2.1.2 survival_3.1-8 scatterplot3d_0.3-41  RCurl_1.98-1.1 nnet_7.3-12 tibble_3.0.1 crayon_1.3.4  futile.options_1.0.1 jpeg_0.1-8.1 locfit_1.5-9.1 data.table_1.12.8  blob_1.2.1 callr_3.4.3 digest_0.6.25 flashClust_1.01-2  xtable_1.8-4 numDeriv_2016.8-1.1 munsell_0.5.0 sessioninfo_1.1.1 >