Hi,
I am running DESeq2 on a data set with multiple factors and getting this warning:
the design formula contains a numeric variable with integer values, specifying a model with increasing fold change for higher values. did you mean for this to be a factor? if so, first convert this variable to a factor using the factor() function
What difference does it make for DESeq2 if I use a quantitative variable (numeric values) instead of a qualitative variable (a factor)?
Will there be a difference in the final results? (I guess so, otherwise there would be no reason for the warning, but I can't understand what the difference is.)
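To see the difference concretely, here is a minimal base-R sketch (not DESeq2 itself) comparing the design matrix that `model.matrix()` builds from the same made-up time column, once coded as numeric and once as a factor. DESeq2 fits its model from a design formula, so roughly the same distinction carries over to the coefficients it estimates; the time values and variable names below are hypothetical.

```r
# Hypothetical time values: 3 time points, 2 samples each
time_num <- c(0, 0, 2, 2, 4, 4)

# Numeric coding: a single slope coefficient, i.e. the log fold change is
# assumed to change by the same amount for every unit increase in time
mm_num <- model.matrix(~ time_num)
print(mm_num)

# Factor coding: one coefficient per non-reference time point, each
# comparing that time point with the reference level (time 0)
time_fac <- factor(time_num)
mm_fac <- model.matrix(~ time_fac)
print(mm_fac)

colnames(mm_num)  # "(Intercept)" "time_num"
colnames(mm_fac)  # "(Intercept)" "time_fac2" "time_fac4"
```

With the numeric coding, the model forces the log fold change between time 4 and time 0 to be exactly twice the one between time 2 and time 0; with the factor coding, both are estimated independently. That linear trend is the "increasing fold change for higher values" the message refers to.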
thanks
Assa
> sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.2 (El Capitan)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] readr_0.2.2 WriteXLS_4.0.0 BiocParallel_1.4.3 data.table_1.9.6
[5] hwriter_1.3.2 GOstats_2.36.0 graph_1.48.0 Category_2.36.0
[9] GO.db_3.2.2 AnnotationDbi_1.32.3 Matrix_1.2-3 ggplot2_2.0.0
[13] gplots_2.17.0 biomaRt_2.26.1 ReportingTools_2.10.0 RSQLite_1.0.0
[17] DBI_0.3.1 knitr_1.12.3 RColorBrewer_1.1-2 genefilter_1.52.0
[21] DESeq2_1.10.1 RcppArmadillo_0.6.400.2.2 Rcpp_0.12.3 SummarizedExperiment_1.0.2
[25] Biobase_2.30.0 GenomicRanges_1.22.3 GenomeInfoDb_1.6.3 IRanges_2.4.6
[29] S4Vectors_0.8.7 BiocGenerics_0.16.1 stringr_1.0.0

loaded via a namespace (and not attached):
[1] edgeR_3.12.0 splines_3.2.3 R.utils_2.2.0 gtools_3.5.0 Formula_1.2-1
[6] highr_0.5.1 latticeExtra_0.6-26 RBGL_1.46.0 BSgenome_1.38.0 Rsamtools_1.22.0
[11] lattice_0.20-33 biovizBase_1.18.0 limma_3.26.6 chron_2.3-47 XVector_0.10.0
[16] colorspace_1.2-6 ggbio_1.18.3 R.oo_1.19.0 plyr_1.8.3 OrganismDbi_1.12.1
[21] GSEABase_1.32.0 XML_3.98-1.3 zlibbioc_1.16.0 xtable_1.8-0 scales_0.3.0
[26] gdata_2.17.0 annotate_1.48.0 PFAM.db_3.2.2 GenomicFeatures_1.22.11 nnet_7.3-11
[31] survival_2.38-3 magrittr_1.5 evaluate_0.8 R.methodsS3_1.7.0 GGally_1.0.1
[36] foreign_0.8-66 BiocInstaller_1.20.1 tools_3.2.3 formatR_1.2.1 munsell_0.4.2
[41] locfit_1.5-9.1 cluster_2.0.3 lambda.r_1.1.7 Biostrings_2.38.3 caTools_1.17.1
[46] futile.logger_1.4.1 grid_3.2.3 RCurl_1.95-4.7 dichromat_2.0-0 VariantAnnotation_1.16.4
[51] AnnotationForge_1.12.2 bitops_1.0-6 gtable_0.1.2 reshape_0.8.5 reshape2_1.4.1
[56] GenomicAlignments_1.6.3 gridExtra_2.0.0 rtracklayer_1.30.1 Hmisc_3.17-1 futile.options_1.0.0
[61] KernSmooth_2.23-15 stringi_1.0-1 geneplotter_1.48.0 rpart_4.1-10 acepack_1.3-3.3
Do I understand correctly that, in my case, since my numeric columns are not meant to model an increasing fold change but are just time points and replicate numbers, I need to convert these columns to factors?
Wouldn't it be better to make this a warning rather than just a message?
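Converting is a one-line change per column, done before building the DESeq2 object. A minimal sketch, assuming a sample table called `coldata` with hypothetical integer columns `time` and `replicate` (substitute your own names):

```r
# Hypothetical sample table with integer-coded columns
coldata <- data.frame(
  time      = c(0L, 0L, 2L, 2L, 4L, 4L),
  replicate = c(1L, 2L, 1L, 2L, 1L, 2L)
)
str(coldata)  # both columns are "int"

# Convert to factors; levels are sorted, so time 0 becomes the reference level
coldata$time      <- factor(coldata$time)
coldata$replicate <- factor(coldata$replicate)

str(coldata)          # both columns are now "Factor"
levels(coldata$time)  # "0" "2" "4"
```

The converted table can then be passed as `colData` to a constructor such as `DESeqDataSetFromMatrix()`, and the message should no longer appear for these columns.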
thanks
Yes. In general, I recommend that users code time points as a factor, as this is the most flexible, general-purpose model and doesn't require statistical expertise.
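With time coded as a factor, each time point gets its own coefficient relative to a reference level, and base R's `relevel()` changes which level is the reference. A small sketch with made-up time points (`timepoint` and `timepoint_rel` are hypothetical names):

```r
timepoint <- factor(c(0, 0, 2, 2, 4, 4))

# Default: the first level ("0") is the reference,
# so the coefficients compare 2 vs 0 and 4 vs 0
colnames(model.matrix(~ timepoint))      # "(Intercept)" "timepoint2" "timepoint4"

# Make time 2 the reference instead: coefficients now compare 0 vs 2 and 4 vs 2
timepoint_rel <- relevel(timepoint, ref = "2")
colnames(model.matrix(~ timepoint_rel))  # "(Intercept)" "timepoint_rel0" "timepoint_rel4"
```

In DESeq2, a comparison between any two levels can also be requested after fitting without releveling, using the `contrast` argument of `results()`, e.g. `results(dds, contrast = c("time", "4", "2"))` for a fitted object `dds` with a factor column `time`.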
The exception is if you are performing your own modeling of expression over time by choosing a space of smooth functions. If you want to do this kind of modeling but are not sure how, or exactly what it means, you will need to partner with someone with expertise in this area: there are many choices to make, and these choices are important and will influence the results.
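As an illustration of what "choosing a space of smooth functions" can look like, one common option is a natural cubic spline basis from the `splines` package that ships with R. The sketch below only builds the design matrix; the sampling times are hypothetical, and the degrees of freedom (`df = 2`) are an arbitrary illustrative value, which is exactly the kind of choice that needs expertise.

```r
library(splines)

# Hypothetical numeric sampling times (hours), two samples per time point
hours <- c(0, 0, 6, 6, 12, 12, 24, 24, 48, 48)

# Natural cubic spline basis with 2 degrees of freedom: expression is modeled
# as a smooth curve over time instead of one free coefficient per time point
mm_spline <- model.matrix(~ ns(hours, df = 2))
dim(mm_spline)  # 10 rows, 3 columns (intercept + 2 spline terms)
```

Changing `df` (or the knot placement) changes which curves the model can represent, and therefore which genes will look like they change over time.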
No, I think a message is appropriate here, because this is standard R variable coding. A message should be sufficient for users who did not mean to encode a variable as numeric.