DESeq2 in phyloseq
1
0
Entering edit mode
@sarahmacdonald86-12162
Last seen 4.9 years ago

Hello,

I am using DESeq2 in phyloseq to analyse my data, following the phyloseq tutorial.

Previously I used the following code and this seemed to work nicely, however when I am trying to repeat this I now get the following message and although the listed bacterial species are the same the log2fold changes are quite different. The difference is at the point when I try to convert the phyloseq format to DESeq2 I get the error message in bold below (previously I was not seeing this error message). I have also loaded the sessionInfo in case this helps shed any light on the problem.

otufile=("otu_table_m1000_minus_cyano_chloro_hdf5_consensuslineage.txt")

mapfile=("Mapping_file_LS.txt")

trefile=("rep_set_tree.tre")

qiimedata = import_qiime(otufile, mapfile, trefile)

#remove my unwanted groups

qiimedata = subset_samples(qiimedata, LesionScore != "6")

qiimedata = subset_samples(qiimedata, LesionScore != "7")

qiimedata = subset_samples(qiimedata, LesionScore != "0")

qiimedata = subset_samples(qiimedata, LesionScore != "1")

qiimedata = subset_samples(qiimedata, LesionScore != "3")

qiimedata = subset_samples(qiimedata, LesionScore != "2")

#left with the two groups I want to compare

[1] -5 -5  4 -5 -5 -5  4 -5  4  4  4  4 -5 -5  4 -5 -5  4

diagdds = phyloseq_to_deseq2(qiimedata, ~ LesionScore)

converting counts to integer mode
the design formula contains a numeric variable with integer values,
specifying a model with increasing fold change for higher values.
did you mean for this to be a factor? if so, first convert
this variable to a factor using the factor() function

> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
[1] BiocInstaller_1.24.0       DESeq2_1.14.1
[3] SummarizedExperiment_1.4.0 Biobase_2.34.0
[5] GenomicRanges_1.26.3       GenomeInfoDb_1.10.3
[7] IRanges_2.8.1              S4Vectors_0.12.1
[9] BiocGenerics_0.20.0        phyloseq_1.19.1

loaded via a namespace (and not attached):
[1] locfit_1.5-9.1       Rcpp_0.12.9          ape_4.1
[4] lattice_0.20-34      Biostrings_2.42.1    assertthat_0.1
[7] digest_0.6.12        foreach_1.4.3        plyr_1.8.4
[10] backports_1.0.5      acepack_1.4.1        RSQLite_1.1-2
[13] ggplot2_2.2.1        zlibbioc_1.20.0      lazyeval_0.2.0
[16] data.table_1.10.4    annotate_1.52.1      vegan_2.4-2
[19] rpart_4.1-10         Matrix_1.2-8         checkmate_1.8.2
[22] splines_3.3.2        BiocParallel_1.8.1   geneplotter_1.52.0
[25] stringr_1.2.0        foreign_0.8-67       htmlwidgets_0.8
[28] igraph_1.0.1         RCurl_1.95-4.8       munsell_0.4.3
[31] base64enc_0.1-3      multtest_2.30.0      mgcv_1.8-17
[34] htmltools_0.3.5      nnet_7.3-12          biomformat_1.2.0
[37] tibble_1.2           gridExtra_2.2.1      htmlTable_1.9
[40] Hmisc_4.0-2          codetools_0.2-15     XML_3.98-1.5
[43] permute_0.9-4        MASS_7.3-45          bitops_1.0-6
[46] grid_3.3.2           DBI_0.5-1            nlme_3.1-131
[49] jsonlite_1.3         xtable_1.8-2         gtable_0.2.0
[52] magrittr_1.5         scales_0.4.1         stringi_1.1.2
[55] XVector_0.14.0       reshape2_1.4.2       genefilter_1.56.0
[58] latticeExtra_0.6-28  Formula_1.2-1        RColorBrewer_1.1-2
[64] survival_2.40-1      AnnotationDbi_1.36.2 colorspace_1.3-2
[67] rhdf5_2.18.0         cluster_2.0.5        memoise_1.0.0
[70] knitr_1.15.1  

Cheers,

Sarah

deseq2 phyloseq • 957 views
0
Entering edit mode

Hi Michael,

Thanks for the clarification.

Changing from using numerical values for LesionScore in my mapping file fixed the problem (eg from 4 to four etc), meaning I guess this is now coded as a factor.

Cheers,

Sarah

2
Entering edit mode
@mikelove
Last seen 1 hour ago
United States

hi Sarah,

It's actually not an error message, just a message.

The point of the message is: you are currently modeling counts on lesion score as a numeric variable, such that a unit increase in lesion score corresponds to a fixed fold change in counts. This is a somewhat strict modeling choice, and so we want the user to know they have specified this, in case it is not the intended model.

For various possible reasons, in previous versions of your code the lesion score could have been coded a factor, but now it is a numeric. Changing between factor and numeric is a very different model and should be expected to result in different LFCs. For example, when the variable is a factor, you can generate a LFC comparing every level to the reference level, whereas with a numeric you will have a single coefficient giving the fixed LFC for a unit increase in lesion score.

It's your choice as the analyst how to model the counts on the variables.