Question

DESEQ2 / Phyloseq contrasts no longer work as before following update

0

Entering edit mode

eoin • 0

@eoin-17422

Last seen 6.7 years ago

Hello,

I have run into an issue with differential abundance analysis in DESeq2 using a Phyloseq object. I am re-running analysis using code that worked perfectly previously. A similar question has been asked but did not address my issue specifically DESeq2 contrast with list no longer works: all elements of the contrast as a list of length 2 should be elements of 'resultsNames(object)'

> packageVersion("DESeq2")
[1] ‘1.21.16’

Previous code:

desfile = phyloseq_to_deseq2(ps.arc.solid, ~Sex + Location + Group) ## "Group" is a 5-level factor of greatest interest biologically.

dds.ps = DESeq(desfile, fitType = "parametric", test = "Wald")

res = resultsdds.ps, cooksCutoff = FALSE)

## Contrast 1: D7 vs D14

res <- resultsdds.ps, contrast = c("Group", "D07", "D14") )

Error in cleanContrast(object, contrast, expanded = isExpanded, listValues = listValues, :
as D07 is the reference level, was expecting GroupD14 to be present in 'resultsNames(object)'

The above syntax worked fine in my previous version of DESeq2.

Even if I try it as a list:

> res <- resultsdds.ps, contrast = list(c("Group", "D14", "D21") ))
Error in checkContrast(contrast, resNames) :
all elements of the contrast as a list of length 2 should be elements of 'resultsNames(object)'

I have tried a couple of things suggested elsewhere: setting betaPrior = TRUE resulted in the following:

estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
Error in averagePriorsOverLevels(objectNZ, betaPriorVar) :
beta prior for Sex2,Location2,Group5 is not greater than 0

Finally I did manage to get my contrast of interest using the following, as suggested by Michael Love in the post linked above:

desfile = phyloseq_to_deseq2(ps.arc.solid, ~0 + Group + Location + Sex)
deseq2 = DESeq(desfile, fitType = "parametric", test = "Wald")

## Contrast 1: D7 V D14

res <- results(deseq2, contrast = c("Group", "D07", "D14") )
res

My question: is setting up the model in this way an admissible method of testing for "Group" while controlling for variance in the other, less interesting factors ("Location", "Sex")??

Thanks a lot!

R version 3.5.1 (2018-07-02)

Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_Canada.1252 LC_CTYPE=English_Canada.1252 LC_MONETARY=English_Canada.1252
[4] LC_NUMERIC=C LC_TIME=English_Canada.1252

attached base packages:
[1] grid parallel stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] bindrcpp_0.2.2 ape_5.1 dunn.test_1.3.5 microbiomeSeq_0.1
[5] scales_1.0.0 vegan_2.5-2 lattice_0.20-35 permute_0.9-4
[9] phyloseq_1.25.2 DESeq2_1.21.16 SummarizedExperiment_1.11.6 DelayedArray_0.7.30
[13] BiocParallel_1.15.8 matrixStats_0.54.0 Biobase_2.41.2 GenomicRanges_1.33.13
[17] GenomeInfoDb_1.17.1 IRanges_2.15.16 S4Vectors_0.19.19 BiocGenerics_0.27.1
[21] RColorBrewer_1.1-2 fgsea_1.6.0 Rcpp_0.12.18 forcats_0.3.0
[25] stringr_1.3.1 dplyr_0.7.6 purrr_0.2.5 readr_1.1.1
[29] tidyr_0.8.1 tibble_1.4.2 ggplot2_3.0.0 tidyverse_1.2.1
[33] biomaRt_2.36.1 edgeR_3.23.3 limma_3.37.3

loaded via a namespace (and not attached):
[1] colorspace_1.3-2 htmlTable_1.12 XVector_0.21.3 base64enc_0.1-3 rstudioapi_0.7
[6] bit64_0.9-7 AnnotationDbi_1.43.1 lubridate_1.7.4 xml2_1.2.0 codetools_0.2-15
[11] splines_3.5.1 geneplotter_1.59.0 knitr_1.20 ade4_1.7-11 Formula_1.2-3
[16] jsonlite_1.5 broom_0.5.0 annotate_1.59.1 cluster_2.0.7-1 compiler_3.5.1
[21] httr_1.3.1 backports_1.1.2 assertthat_0.2.0 Matrix_1.2-14 lazyeval_0.2.1
[26] cli_1.0.0 acepack_1.4.1 htmltools_0.3.6 prettyunits_1.0.2 tools_3.5.1
[31] igraph_1.2.2 gtable_0.2.0 glue_1.3.0 GenomeInfoDbData_1.1.0 reshape2_1.4.3
[36] fastmatch_1.1-0 cellranger_1.1.0 Biostrings_2.49.1 multtest_2.37.0 nlme_3.1-137
[41] iterators_1.0.10 rvest_0.3.2 XML_3.98-1.16 MASS_7.3-50 zlibbioc_1.27.0
[46] hms_0.4.2 biomformat_1.9.0 rhdf5_2.25.4 yaml_2.2.0 memoise_1.1.0
[51] gridExtra_2.3 rpart_4.1-13 latticeExtra_0.6-28 stringi_1.1.7 RSQLite_2.1.1
[56] genefilter_1.63.0 foreach_1.4.4 checkmate_1.8.5 rlang_0.2.2 pkgconfig_2.0.2
[61] bitops_1.0-6 Rhdf5lib_1.3.1 bindr_0.1.1 htmlwidgets_1.2 labeling_0.3
[66] bit_1.1-14 tidyselect_0.2.4 plyr_1.8.4 magrittr_1.5 R6_2.2.2
[71] snow_0.4-2 Hmisc_4.1-1 DBI_1.0.0 mgcv_1.8-24 pillar_1.3.0
[76] haven_1.1.2 foreign_0.8-71 withr_2.1.2 survival_2.42-6 RCurl_1.95-4.11
[81] nnet_7.3-12 modelr_0.1.2 crayon_1.3.4 progress_1.2.0 locfit_1.5-9.1
[86] readxl_1.1.0 data.table_1.11.4 blob_1.1.1 digest_0.6.16 xtable_1.8-2
[91] munsell_0.5.0

deseq2 phyloseq • 3.3k views

ADD COMMENT • link 7.4 years ago eoin • 0

0

Entering edit mode

Sorry the forum won't let me post this as a reply to your previous answer.

Yes there seems to be some issue at this point. Maybe there is an issue with the phyloseq interface? Previously resultsNames(dds) would give me an output equal to your dummy data.

But if I build the model like this:

desfile = phyloseq_to_deseq2(ps.arc.solid, ~0 + Group + Location + Sex)
dds= DESeq(desfile, fitType = "parametric", test = "Wald")

resultsNames(dds)

[1] "GroupD07" "GroupD14" "GroupD21" "GroupD28" "GroupD96" "Location1" "Sex1"

Then I can extract:

> res <- results(dds, contrast = c("Group", "D14", "D07") )
> res
log2 fold change (MLE): Group D14 vs D07
Wald test p-value: Group D14 vs D07
DataFrame with 49 rows and 6 columns
baseMean log2FoldChange lfcSE stat pvalue padj
<numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
OTU1 0 NA NA NA NA NA
OTU2 0.899557612849987 0 5.36571059331752 0 1 1
OTU3 0.635054940735835 0 5.35548755640969 0 1 1
OTU4 0.246926462768473 0.340940960161698 5.35649040124304 0.0636500646174179 0.949248866730002 1
OTU5 0.789282734898351 0 5.36537112477623 0 1 1
.

Is this correct? In previous versions, and I thought also in this version, you should put your factor of greatest interest last? Then if I wanted to check for the effect of other factors I could just build a new model with them in the desired order? is this still the case?

Essentially I want to first check for the Age (Group) effect sequentially, while controlling for Location and Sex. Then I want to check for the effect of Location within each age (I made a new DESeq object containing samples from each age group only for this). I think maybe there has been some small change in the syntax of how to construct the model that I have missed? thanks for your help!

ADD REPLY • link 7.4 years ago eoin • 0

0

Entering edit mode

Order of variables in the design is not important when you call results() with an argument.

I can’t reproduce what you’re seeing with a minimal example so can’t help much, unless you can provide an example using something like the code I sent.

ADD REPLY • link 7.4 years ago Michael Love 43k

0

Entering edit mode

It's difficult to do in that fashion in this case as the data and variables are locked within the phyloseq object.

So if I can results with the argument, and get the desired comparison at the top of the output file

e.g. log2 fold change (MLE): Group D14 vs D07
Wald test p-value: Group D14 vs D07

then this should have considered the other variables apart from Group that were incorporated in the design?

ADD REPLY • link 7.4 years ago eoin • 0

0

Entering edit mode

Yes. The other coefficients (as seen in resultsNames) are also fit.

ADD REPLY • link 7.4 years ago Michael Love 43k

0

Entering edit mode

Ok thanks. I think this is likely a disconnect between Phyloseq and DESeq2 resulting from package updates. I'll go ahead with this method. thanks a lot for your help!

ADD REPLY • link 7.4 years ago eoin • 0

score 0 · Answer 1 · 2018-09-19

0

Entering edit mode

Michael Love 43k

@mikelove

Last seen 2 days ago

United States

I think this issue is unrelated to the other thread.

The error mentions resultsNames(dds). What are resultsNames(dds)?

Also, are you loading an old version of the 'dds', or generating it in the same session as calling results()?

ADD COMMENT • link 7.4 years ago Michael Love 43k

0

Entering edit mode

Hi Michael, thanks for taking a look.

I am generating it in the same session - i think it's confusing due to my code: the previous syntax stored the output of DESeq() in "dds" while the newer one used "deseq".

So to clarify: Original code -

> desfile = phyloseq_to_deseq2(ps.arc.solid, ~Sex + Location + Group)
> dds = DESeq(desfile, fitType = "parametric", test = "Wald")
> resultsNames(dds)

[1] "Intercept" "Sex1" "Location1" "Group1" "Group2" "Group3" "Group4"

> res <- results(dds, contrast = c("Group", "D14", "D07") )
Error in cleanContrast(object, contrast, expanded = isExpanded, listValues = listValues, :
as D07 is the reference level, was expecting Group_D14_vs_D07 to be present in 'resultsNames(object)'
> Sex: Factor with 2 levels

Location: Factor with 2 levels

Group: Factor with 5 levels (this is a time-series experiment. each group represents a timepoint, I want to extract the temporally adjacent contrasts from the object).

Our secondary objective is to assess the effect of "Location" within "Group".

The above code worked ok before, but I assume that I need to change my syntax in the model set-up due to the updated package

thanks a lot

I hope this clarifies my issue.

ADD REPLY • link 7.4 years ago eoin • 0

0

Entering edit mode

Why does it say Group1 etc above? I thought the levels were D07? Is this a typo?

ADD REPLY • link 7.4 years ago Michael Love 43k

0

Entering edit mode

This is an example of what the resultsNames(dds) looks like for me when I build a toy dataset like yours, also using the devel branch (v1.21):

> dds <- makeExampleDESeqDataSet(m=20)
> dds$time <- factor(paste0("D",rep(0:4,each=4)))
> dds$sex <- factor(rep(c("F","F","M","M"),5))
> design(dds) <- ~sex + time
> dds <- DESeq(dds)
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
> resultsNames(dds)
[1] "Intercept"     "sex_M_vs_F"    "time_D1_vs_D0" "time_D2_vs_D0"
[5] "time_D3_vs_D0" "time_D4_vs_D0"

> sessionInfo()
R Under development (unstable) (2018-05-02 r74682)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.1 LTS

Matrix products: default
BLAS: /home/love/bin/R/lib/libRblas.so
LAPACK: /home/love/bin/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices datasets  utils    
[8] methods   base     

other attached packages:
 [1] DESeq2_1.21.18              SummarizedExperiment_1.11.6
 [3] DelayedArray_0.7.39         BiocParallel_1.15.11       
 [5] matrixStats_0.54.0          Biobase_2.41.2             
 [7] GenomicRanges_1.33.13       GenomeInfoDb_1.17.1        
 [9] IRanges_2.15.17             S4Vectors_0.19.19          
[11] BiocGenerics_0.27.1         rmarkdown_1.10             
[13] testthat_2.0.0              devtools_1.13.6

ADD REPLY • link 7.4 years ago Michael Love 43k