Search
Question: DESEQ2 / Phyloseq contrasts no longer work as before following update
0
gravatar for eoin
28 days ago by
eoin0
eoin0 wrote:

Hello,

I have run into an issue with differential abundance analysis in DESeq2 using a Phyloseq object. I am re-running analysis using code that worked perfectly previously. A similar question has been asked but did not address my issue specifically DESeq2 contrast with list no longer works: all elements of the contrast as a list of length 2 should be elements of 'resultsNames(object)'

> packageVersion("DESeq2")
[1] ‘1.21.16’

Previous code: 

desfile = phyloseq_to_deseq2(ps.arc.solid, ~Sex + Location + Group) ## "Group" is a 5-level factor of greatest interest biologically. 

dds.ps = DESeq(desfile, fitType = "parametric", test = "Wald")

res = resultsdds.ps, cooksCutoff = FALSE)

## Contrast 1: D7 vs D14

 res <- resultsdds.ps, contrast = c("Group", "D07", "D14") )

Error in cleanContrast(object, contrast, expanded = isExpanded, listValues = listValues,  : 
  as D07 is the reference level, was expecting GroupD14 to be present in 'resultsNames(object)'

The above syntax worked fine in my previous version of DESeq2. 

Even if I try it as a list:

> res <- resultsdds.ps, contrast = list(c("Group", "D14", "D21") ))
Error in checkContrast(contrast, resNames) : 
  all elements of the contrast as a list of length 2 should be elements of 'resultsNames(object)'

 

I have tried a couple of things suggested elsewhere: setting betaPrior = TRUE resulted in the following:

estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
Error in averagePriorsOverLevels(objectNZ, betaPriorVar) : 
  beta prior for Sex2,Location2,Group5 is not greater than 0

Finally I did manage to get my contrast of interest using the following, as suggested by Michael Love in the post linked above:

desfile = phyloseq_to_deseq2(ps.arc.solid, ~0 + Group + Location + Sex)
deseq2 = DESeq(desfile, fitType = "parametric", test = "Wald")

## Contrast 1: D7 V D14 

res <- results(deseq2, contrast = c("Group", "D07", "D14") )
res

My question: is setting up the model in this way an admissible method of testing for "Group" while controlling for variance in the other, less interesting factors ("Location", "Sex")??

Thanks a lot!

R version 3.5.1 (2018-07-02)

Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_Canada.1252  LC_CTYPE=English_Canada.1252    LC_MONETARY=English_Canada.1252
[4] LC_NUMERIC=C                    LC_TIME=English_Canada.1252    

attached base packages:
 [1] grid      parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] bindrcpp_0.2.2              ape_5.1                     dunn.test_1.3.5             microbiomeSeq_0.1          
 [5] scales_1.0.0                vegan_2.5-2                 lattice_0.20-35             permute_0.9-4              
 [9] phyloseq_1.25.2             DESeq2_1.21.16              SummarizedExperiment_1.11.6 DelayedArray_0.7.30        
[13] BiocParallel_1.15.8         matrixStats_0.54.0          Biobase_2.41.2              GenomicRanges_1.33.13      
[17] GenomeInfoDb_1.17.1         IRanges_2.15.16             S4Vectors_0.19.19           BiocGenerics_0.27.1        
[21] RColorBrewer_1.1-2          fgsea_1.6.0                 Rcpp_0.12.18                forcats_0.3.0              
[25] stringr_1.3.1               dplyr_0.7.6                 purrr_0.2.5                 readr_1.1.1                
[29] tidyr_0.8.1                 tibble_1.4.2                ggplot2_3.0.0               tidyverse_1.2.1            
[33] biomaRt_2.36.1              edgeR_3.23.3                limma_3.37.3               

loaded via a namespace (and not attached):
 [1] colorspace_1.3-2       htmlTable_1.12         XVector_0.21.3         base64enc_0.1-3        rstudioapi_0.7        
 [6] bit64_0.9-7            AnnotationDbi_1.43.1   lubridate_1.7.4        xml2_1.2.0             codetools_0.2-15      
[11] splines_3.5.1          geneplotter_1.59.0     knitr_1.20             ade4_1.7-11            Formula_1.2-3         
[16] jsonlite_1.5           broom_0.5.0            annotate_1.59.1        cluster_2.0.7-1        compiler_3.5.1        
[21] httr_1.3.1             backports_1.1.2        assertthat_0.2.0       Matrix_1.2-14          lazyeval_0.2.1        
[26] cli_1.0.0              acepack_1.4.1          htmltools_0.3.6        prettyunits_1.0.2      tools_3.5.1           
[31] igraph_1.2.2           gtable_0.2.0           glue_1.3.0             GenomeInfoDbData_1.1.0 reshape2_1.4.3        
[36] fastmatch_1.1-0        cellranger_1.1.0       Biostrings_2.49.1      multtest_2.37.0        nlme_3.1-137          
[41] iterators_1.0.10       rvest_0.3.2            XML_3.98-1.16          MASS_7.3-50            zlibbioc_1.27.0       
[46] hms_0.4.2              biomformat_1.9.0       rhdf5_2.25.4           yaml_2.2.0             memoise_1.1.0         
[51] gridExtra_2.3          rpart_4.1-13           latticeExtra_0.6-28    stringi_1.1.7          RSQLite_2.1.1         
[56] genefilter_1.63.0      foreach_1.4.4          checkmate_1.8.5        rlang_0.2.2            pkgconfig_2.0.2       
[61] bitops_1.0-6           Rhdf5lib_1.3.1         bindr_0.1.1            htmlwidgets_1.2        labeling_0.3          
[66] bit_1.1-14             tidyselect_0.2.4       plyr_1.8.4             magrittr_1.5           R6_2.2.2              
[71] snow_0.4-2             Hmisc_4.1-1            DBI_1.0.0              mgcv_1.8-24            pillar_1.3.0          
[76] haven_1.1.2            foreign_0.8-71         withr_2.1.2            survival_2.42-6        RCurl_1.95-4.11       
[81] nnet_7.3-12            modelr_0.1.2           crayon_1.3.4           progress_1.2.0         locfit_1.5-9.1        
[86] readxl_1.1.0           data.table_1.11.4      blob_1.1.1             digest_0.6.16          xtable_1.8-2          
[91] munsell_0.5.0  

ADD COMMENTlink modified 27 days ago • written 28 days ago by eoin0

Sorry the forum won't let me post this as a reply to your previous answer.

Yes there seems to be some issue at this point.  Maybe there is an issue with the phyloseq interface? Previously resultsNames(dds) would give me an output equal to your dummy data. 


 
           


But if I build the model like this:

desfile = phyloseq_to_deseq2(ps.arc.solid, ~0 + Group + Location + Sex)
dds= DESeq(desfile, fitType = "parametric", test = "Wald")

resultsNames(dds)

[1] "GroupD07"  "GroupD14"  "GroupD21"  "GroupD28"  "GroupD96"  "Location1" "Sex1"   

Then I can extract: 

> res <- results(dds, contrast = c("Group", "D14", "D07") )
> res
log2 fold change (MLE): Group D14 vs D07 
Wald test p-value: Group D14 vs D07 
DataFrame with 49 rows and 6 columns
                baseMean     log2FoldChange            lfcSE               stat             pvalue              padj
               <numeric>          <numeric>        <numeric>          <numeric>          <numeric>         <numeric>
OTU1                   0                 NA               NA                 NA                 NA                NA
OTU2   0.899557612849987                  0 5.36571059331752                  0                  1                 1
OTU3   0.635054940735835                  0 5.35548755640969                  0                  1                 1
OTU4   0.246926462768473  0.340940960161698 5.35649040124304 0.0636500646174179  0.949248866730002                 1
OTU5   0.789282734898351                  0 5.36537112477623                  0                  1                 1

Is this correct? In previous versions, and I thought also in this version, you should put your factor of greatest interest last? Then if I wanted to check for the effect of other factors I could just build a new model with them in the desired order? is this still the case? 

Essentially I want to first check for the Age (Group) effect sequentially, while controlling for Location and Sex. Then I want to check for the effect of Location within each age (I made a new DESeq object containing samples from each age group only for this). I think maybe there has been some small change in the syntax of how to construct the model that I have missed? thanks for your help!

ADD REPLYlink written 27 days ago by eoin0

Order of variables in the design is not important when you call results() with an argument.

I can’t reproduce what you’re seeing with a minimal example so can’t help much, unless you can provide an example using something like the code I sent.

ADD REPLYlink written 27 days ago by Michael Love19k

It's difficult to do in that fashion in this case as the data and variables are locked within the phyloseq object. 

So if I can results with the argument, and get the desired comparison at the top of the output file

e.g. log2 fold change (MLE): Group D14 vs D07 
Wald test p-value: Group D14 vs D07 

then this should have considered the other variables apart from Group that were incorporated in the design?

ADD REPLYlink written 27 days ago by eoin0

Yes. The other coefficients (as seen in resultsNames) are also fit.

ADD REPLYlink written 27 days ago by Michael Love19k

Ok thanks. I think this is likely a disconnect between Phyloseq and DESeq2 resulting from package updates. I'll go ahead with this method. thanks a lot for your help!

ADD REPLYlink written 27 days ago by eoin0
0
gravatar for Michael Love
28 days ago by
Michael Love19k
United States
Michael Love19k wrote:

I think this issue is unrelated to the other thread.

The error mentions resultsNames(dds). What are resultsNames(dds)?

Also, are you loading an old version of the 'dds', or generating it in the same session as calling results()?

ADD COMMENTlink written 28 days ago by Michael Love19k

Hi Michael, thanks for taking a look.

I am generating it in the same session - i think it's confusing due to my code: the previous syntax stored the output of DESeq() in "dds" while the newer one used "deseq". 

So to clarify: Original code -

> desfile = phyloseq_to_deseq2(ps.arc.solid, ~Sex + Location + Group)
> dds = DESeq(desfile, fitType = "parametric", test = "Wald")
> resultsNames(dds) 

[1] "Intercept" "Sex1"      "Location1" "Group1"    "Group2"    "Group3"    "Group4"

> res <- results(dds, contrast = c("Group", "D14", "D07") )
Error in cleanContrast(object, contrast, expanded = isExpanded, listValues = listValues,  : 
  as D07 is the reference level, was expecting Group_D14_vs_D07 to be present in 'resultsNames(object)'
> Sex: Factor with 2 levels

Location: Factor with 2 levels

Group: Factor with 5 levels (this is a time-series experiment. each group represents a timepoint, I want to extract the temporally adjacent contrasts from the object).

Our secondary objective is to assess the effect of "Location" within "Group".

The above code worked ok before, but I assume that I need to change my syntax in the model set-up due to the updated package 

thanks a lot 

 

 

 

I hope this clarifies my issue. 

 

ADD REPLYlink written 28 days ago by eoin0

Why does it say Group1 etc above? I thought the levels were D07? Is this a typo?

ADD REPLYlink written 28 days ago by Michael Love19k

This is an example of what the resultsNames(dds) looks like for me when I build a toy dataset like yours, also using the devel branch (v1.21):

> dds <- makeExampleDESeqDataSet(m=20)
> dds$time <- factor(paste0("D",rep(0:4,each=4)))
> dds$sex <- factor(rep(c("F","F","M","M"),5))
> design(dds) <- ~sex + time
> dds <- DESeq(dds)
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
> resultsNames(dds)
[1] "Intercept"     "sex_M_vs_F"    "time_D1_vs_D0" "time_D2_vs_D0"
[5] "time_D3_vs_D0" "time_D4_vs_D0"

> sessionInfo()
R Under development (unstable) (2018-05-02 r74682)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.1 LTS

Matrix products: default
BLAS: /home/love/bin/R/lib/libRblas.so
LAPACK: /home/love/bin/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices datasets  utils    
[8] methods   base     

other attached packages:
 [1] DESeq2_1.21.18              SummarizedExperiment_1.11.6
 [3] DelayedArray_0.7.39         BiocParallel_1.15.11       
 [5] matrixStats_0.54.0          Biobase_2.41.2             
 [7] GenomicRanges_1.33.13       GenomeInfoDb_1.17.1        
 [9] IRanges_2.15.17             S4Vectors_0.19.19          
[11] BiocGenerics_0.27.1         rmarkdown_1.10             
[13] testthat_2.0.0              devtools_1.13.6   
ADD REPLYlink written 27 days ago by Michael Love19k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 195 users visited in the last hour