DESeq2 design error message with ordered factor but not numeric
1
0
Entering edit mode
chris • 0
@chris-15915
Last seen 5.9 years ago

Hello,

Ran into this with setting up a design and thought it was interesting (and I can't figure out why). We have an RNA-Seq setup that we'd like to do some basic dose-response analysis (i.e., does expression change with increasing dosage of a drug?). We have a numeric column, Dose, that we use for this now. The thought was to change that to an ordered factor which better represents the data (or not). I created a new column as an ordered factor, DoseFactor, for this purpose, which contains the same information, just stored as a factor instead of a numeric.

Dose DoseFactor
   <dbl> <ord>     
 1  0.1  0.1       
 2  0    0         
 3  0.03 0.03      
 4  0.3  0.3       
 5  0    0         
 6  0.1  0.1       
 7  0.1  0.1       
 8  0.1  0.1       
 9  0.03 0.03      
10  0.3  0.3    

If I create the dataset using the numeric column, all is fine. However, with the factor column, I get this message:

DESeqDataSetFromMatrix(countData = countMatrix.1061,
                            colData = samplesubset.1061,
                            design = ~ DoseFactor)

Error in DESeqDataSet(se, design = design, ignoreRank) : 
  design contains one or more variables with all samples having the same value,
  remove these variables from the design

I would really appreciate anyone being able to shed light on this, or point me to the error in my thinking about this variable as a factor.

Thanks!

---

> sessionInfo()
R version 3.4.4 (2018-03-15)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.4

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] bindrcpp_0.2.2             reshape_0.8.7              geneplotter_1.56.0         lattice_0.20-35           
 [5] gridExtra_2.3              ggthemes_3.5.0             forcats_0.3.0              stringr_1.3.1             
 [9] purrr_0.2.4                readr_1.1.1                tidyr_0.8.1                tibble_1.4.2              
[13] tidyverse_1.2.1            RColorBrewer_1.1-2         DT_0.4                     plotly_4.7.1              
[17] ggplot2_2.2.1.9000         Glimma_1.6.0               readxl_1.1.0               gplots_3.0.1              
[21] fgsea_1.4.1                Rcpp_0.12.17               GSEABase_1.40.1            graph_1.56.0              
[25] annotate_1.56.2            XML_3.98-1.11              AnnotationDbi_1.40.0       DESeq2_1.18.1             
[29] SummarizedExperiment_1.8.1 DelayedArray_0.4.1         matrixStats_0.53.1         Biobase_2.38.0            
[33] GenomicRanges_1.30.3       GenomeInfoDb_1.14.0        IRanges_2.12.0             S4Vectors_0.16.0          
[37] BiocGenerics_0.24.0        pheatmap_1.0.10            magrittr_1.5               dplyr_0.7.5               
[41] plyr_1.8.4                 biomaRt_2.34.2            

loaded via a namespace (and not attached):
 [1] colorspace_1.3-2       rsconnect_0.8.8        rprojroot_1.3-2        htmlTable_1.11.2       XVector_0.18.0        
 [6] base64enc_0.1-3        rstudioapi_0.7         bit64_0.9-7            lubridate_1.7.4        xml2_1.2.0            
[11] splines_3.4.4          mnormt_1.5-5           knitr_1.20             Formula_1.2-3          jsonlite_1.5          
[16] broom_0.4.4            cluster_2.0.7-1        shiny_1.1.0            compiler_3.4.4         httr_1.3.1            
[21] backports_1.1.2        assertthat_0.2.0       Matrix_1.2-14          lazyeval_0.2.1         cli_1.0.0             
[26] limma_3.34.9           later_0.7.2            acepack_1.4.1          htmltools_0.3.6        prettyunits_1.0.2     
[31] tools_3.4.4            gtable_0.2.0           glue_1.2.0             GenomeInfoDbData_1.0.0 reshape2_1.4.3        
[36] fastmatch_1.1-0        cellranger_1.1.0       gdata_2.18.0           nlme_3.1-137           crosstalk_1.0.0       
[41] psych_1.8.4            rvest_0.3.2            mime_0.5               devtools_1.13.5        gtools_3.5.0          
[46] edgeR_3.20.9           zlibbioc_1.24.0        scales_0.5.0           promises_1.0.1         hms_0.4.2             
[51] yaml_2.1.19            memoise_1.1.0          rpart_4.1-13           latticeExtra_0.6-28    stringi_1.2.2         
[56] RSQLite_2.1.1          genefilter_1.60.0      checkmate_1.8.5        caTools_1.17.1         BiocParallel_1.12.0   
[61] rlang_0.2.0            pkgconfig_2.0.1        bitops_1.0-6           evaluate_0.10.1        bindr_0.1.1           
[66] labeling_0.3           htmlwidgets_1.2        bit_1.1-13             tidyselect_0.2.4       R6_2.2.2              
[71] Hmisc_4.1-1            DBI_1.0.0              pillar_1.2.2           haven_1.1.1            foreign_0.8-70        
[76] withr_2.1.2            survival_2.42-3        RCurl_1.95-4.10        nnet_7.3-12            crayon_1.3.4          
[81] modelr_0.1.2           utf8_1.1.3             KernSmooth_2.23-15     rmarkdown_1.9          progress_1.1.2        
[86] locfit_1.5-9.1         grid_3.4.4             data.table_1.10.4-3    blob_1.1.1             digest_0.6.15         
[91] xtable_1.8-2           httpuv_1.4.3           munsell_0.4.3          viridisLite_0.3.0 
deseq2 • 1.7k views
ADD COMMENT
0
Entering edit mode

Thinking out loud, I'm wondering if it might be simpler to just use a combined factor representing multiple conditions and then do contrasts with known information (group D is higher dosage than group A, for example).

ADD REPLY
0
Entering edit mode
@mikelove
Last seen 18 minutes ago
United States

The test you are seeing gives TRUE if all the values are equal to the first value in the factor. Something like all(DoseFactor == DoseFactor[1]). Can you think of a reason why this would return TRUE? Is that table you are showing the same as the one you use to create the dds?

ADD COMMENT
0
Entering edit mode

Thanks, Michael. Yup, that's a subset of the data in the table above. Here are the actual items in DoseFactor used to create the dds. Definitely confusing.

> all(samplesubset.1061$DoseFactor==samplesubset.1061$DoseFactor[1])
[1] FALSE
> unique(samplesubset.1061$DoseFactor)
[1] 0.1  0    0.03 0.3 
Levels: 0 < 0.03 < 0.1 < 0.3

Still thinking about how to do dose-response...

ADD REPLY
0
Entering edit mode

Also interesting is that it only fails with ordered factor. Using a column of unordered factors of Dose is fine. Not really what I want, but it doesn't error out.

ADD REPLY
0
Entering edit mode

I remember now, we don't have formula support for ordered factors, but you can just supply a matrix to the design argument instead of a formula. You would build the matrix using model.matrix().

I need to figure out why the correct error message isn't being triggered (there is one, for ordered) and fix that in devel.

ADD REPLY
0
Entering edit mode

Figured out what was wrong with my code, and fixed this in devel branch so a more useful error message will be printed.

My code was assuming that I would get back a single character string from calling class() on a design variable, but this isn't what happens:

> class(ordered(1:3))
[1] "ordered" "factor"
ADD REPLY
0
Entering edit mode

Excellent, thanks for taking a look! Though, upon more thought, I think ordered factor doesn't really give us the dose-response information we would need above what an unordered factor would. Probably best to stick with the continuous variable. 

ADD REPLY

Login before adding your answer.

Traffic: 725 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6