Question: DESeq2 in phyloseq
gravatar for sarahmacdonald86
2.7 years ago by
sarahmacdonald860 wrote:


I am using DESeq2 in phyloseq to analyse my data, following the phyloseq tutorial.

Previously I used the following code and this seemed to work nicely, however when I am trying to repeat this I now get the following message and although the listed bacterial species are the same the log2fold changes are quite different. The difference is at the point when I try to convert the phyloseq format to DESeq2 I get the error message in bold below (previously I was not seeing this error message). I have also loaded the sessionInfo in case this helps shed any light on the problem.




qiimedata = import_qiime(otufile, mapfile, trefile)

#remove my unwanted groups

qiimedata = subset_samples(qiimedata, LesionScore != "6")

qiimedata = subset_samples(qiimedata, LesionScore != "7")

qiimedata = subset_samples(qiimedata, LesionScore != "0")

qiimedata = subset_samples(qiimedata, LesionScore != "1")

qiimedata = subset_samples(qiimedata, LesionScore != "3")

qiimedata = subset_samples(qiimedata, LesionScore != "2")

#left with the two groups I want to compare

head(sample_data(qiimedata)$LesionScore, 25)
 [1] -5 -5  4 -5 -5 -5  4 -5  4  4  4  4 -5 -5  4 -5 -5  4

diagdds = phyloseq_to_deseq2(qiimedata, ~ LesionScore)

converting counts to integer mode
the design formula contains a numeric variable with integer values,
  specifying a model with increasing fold change for higher values.
  did you mean for this to be a factor? if so, first convert
  this variable to a factor using the factor() function

> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12

[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] BiocInstaller_1.24.0       DESeq2_1.14.1             
 [3] SummarizedExperiment_1.4.0 Biobase_2.34.0            
 [5] GenomicRanges_1.26.3       GenomeInfoDb_1.10.3       
 [7] IRanges_2.8.1              S4Vectors_0.12.1          
 [9] BiocGenerics_0.20.0        phyloseq_1.19.1           

loaded via a namespace (and not attached):
 [1] locfit_1.5-9.1       Rcpp_0.12.9          ape_4.1             
 [4] lattice_0.20-34      Biostrings_2.42.1    assertthat_0.1      
 [7] digest_0.6.12        foreach_1.4.3        plyr_1.8.4          
[10] backports_1.0.5      acepack_1.4.1        RSQLite_1.1-2       
[13] ggplot2_2.2.1        zlibbioc_1.20.0      lazyeval_0.2.0      
[16] data.table_1.10.4    annotate_1.52.1      vegan_2.4-2         
[19] rpart_4.1-10         Matrix_1.2-8         checkmate_1.8.2     
[22] splines_3.3.2        BiocParallel_1.8.1   geneplotter_1.52.0  
[25] stringr_1.2.0        foreign_0.8-67       htmlwidgets_0.8     
[28] igraph_1.0.1         RCurl_1.95-4.8       munsell_0.4.3       
[31] base64enc_0.1-3      multtest_2.30.0      mgcv_1.8-17         
[34] htmltools_0.3.5      nnet_7.3-12          biomformat_1.2.0    
[37] tibble_1.2           gridExtra_2.2.1      htmlTable_1.9       
[40] Hmisc_4.0-2          codetools_0.2-15     XML_3.98-1.5        
[43] permute_0.9-4        MASS_7.3-45          bitops_1.0-6        
[46] grid_3.3.2           DBI_0.5-1            nlme_3.1-131        
[49] jsonlite_1.3         xtable_1.8-2         gtable_0.2.0        
[52] magrittr_1.5         scales_0.4.1         stringi_1.1.2       
[55] XVector_0.14.0       reshape2_1.4.2       genefilter_1.56.0   
[58] latticeExtra_0.6-28  Formula_1.2-1        RColorBrewer_1.1-2  
[61] iterators_1.0.8      tools_3.3.2          ade4_1.7-5          
[64] survival_2.40-1      AnnotationDbi_1.36.2 colorspace_1.3-2    
[67] rhdf5_2.18.0         cluster_2.0.5        memoise_1.0.0       
[70] knitr_1.15.1  



phyloseq deseq2 • 701 views
ADD COMMENTlink modified 2.7 years ago • written 2.7 years ago by sarahmacdonald860

Hi Michael,

Thanks for the clarification.

Changing from using numerical values for LesionScore in my mapping file fixed the problem (eg from 4 to four etc), meaning I guess this is now coded as a factor.



ADD REPLYlink written 2.7 years ago by sarahmacdonald860
Answer: DeSeq2 in phyloseq
gravatar for Michael Love
2.7 years ago by
Michael Love26k
United States
Michael Love26k wrote:

hi Sarah,

It's actually not an error message, just a message.

The point of the message is: you are currently modeling counts on lesion score as a numeric variable, such that a unit increase in lesion score corresponds to a fixed fold change in counts. This is a somewhat strict modeling choice, and so we want the user to know they have specified this, in case it is not the intended model.

For various possible reasons, in previous versions of your code the lesion score could have been coded a factor, but now it is a numeric. Changing between factor and numeric is a very different model and should be expected to result in different LFCs. For example, when the variable is a factor, you can generate a LFC comparing every level to the reference level, whereas with a numeric you will have a single coefficient giving the fixed LFC for a unit increase in lesion score.

It's your choice as the analyst how to model the counts on the variables.

ADD COMMENTlink modified 2.7 years ago • written 2.7 years ago by Michael Love26k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 207 users visited in the last hour