DESeq2 results(): 'name' argument and ?equivalent? numeric contrast produce different results
1
0
Entering edit mode
@ryanmcminds-8921
Last seen 8.4 years ago
United States

Hi -

I am using DESEq2 to analyze a dataset with two factors, each with two levels. I am therefore using a standard model matrix, with the following formula and output from resultsNames(des):

> des <- DESeq(des,formula=~nutrients*snail)
> resultsNames(des)
[1] "Intercept"             "nutrients_yes_vs_no"   "snail_yes_vs_no"       "nutrientsyes.snailyes"

I had initially extracted results as recommended in the vignette and ?results, using:

results(des, contrast=c('nutrients','yes','no'))

And the same for the 'snail' treatment, with the argument

name='nutrientsyes.snailyes'

for the interaction. This returned reasonable results. However, I was exploring the use of numeric contrasts (mostly just trying to become more familiar with the various options available), and became confused. I was under the impression that these three arguments should produce equivalent output when a standard model is used:

  1. contrast=c('nutrients','yes','no')
  2. contrast=c(0,1,0,0)
  3. name='nutrients_yes_vs_no'

However, while arguments (1) and (3) produce identical results, argument (2) does not. The numeric contrast vector version led to overall lower p-values and additional genes that were deemed 'significant'. Visual inspection verified that it was reasonable to accept that the additional genes were indeed expressed differentially.

If this is not the equivalent numeric contrast, what is?

Thank you in advance for your help. This is my first support forum post, so please excuse me for leaving anything out!

 

> sessionInfo()
R version 3.2.1 (2015-06-18)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.5 (Yosemite)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] beanplot_1.2              RColorBrewer_1.1-2        ggplot2_1.0.1             phyloseq_1.12.2           DESeq2_1.8.1              RcppArmadillo_0.5.600.2.0
 [7] Rcpp_0.12.1               GenomicRanges_1.20.8      GenomeInfoDb_1.4.3        IRanges_2.2.9             S4Vectors_0.6.6           BiocGenerics_0.14.0      

loaded via a namespace (and not attached):
 [1] locfit_1.5-9.1       ape_3.3              lattice_0.20-33      Biostrings_2.36.4    digest_0.6.8         foreach_1.4.2        biom_0.3.12         
 [8] plyr_1.8.3           chron_2.3-47         futile.options_1.0.0 acepack_1.3-3.3      RSQLite_1.0.0        zlibbioc_1.14.0      data.table_1.9.6    
[15] annotate_1.46.1      vegan_2.3-1          rpart_4.1-10         Matrix_1.2-2         proto_0.3-10         splines_3.2.1        BiocParallel_1.2.22 
[22] geneplotter_1.46.0   stringr_1.0.0        foreign_0.8-66       igraph_1.0.1         munsell_0.4.2        multtest_2.24.0      mgcv_1.8-7          
[29] nnet_7.3-11          gridExtra_2.0.0      Hmisc_3.17-0         codetools_0.2-14     XML_3.98-1.3         permute_0.8-4        MASS_7.3-44         
[36] grid_3.2.1           nlme_3.1-122         xtable_1.7-4         gtable_0.1.2         DBI_0.3.1            magrittr_1.5         scales_0.3.0        
[43] stringi_0.5-5        XVector_0.8.0        reshape2_1.4.1       genefilter_1.50.0    latticeExtra_0.6-26  futile.logger_1.4.1  Formula_1.2-1       
[50] lambda.r_1.1.7       iterators_1.0.7      tools_3.2.1          RJSONIO_1.3-0        ade4_1.7-2           Biobase_2.28.0       survival_2.38-3     
[57] AnnotationDbi_1.30.1 colorspace_1.2-6     cluster_2.0.3  
deseq2 • 2.2k views
ADD COMMENT
3
Entering edit mode
@mikelove
Last seen 4 hours ago
United States

hi Ryan,

Thanks for the report.

I can't reproduce this on my end (using the toy data constructor makeExampleDESeqDataSet).

Could you please email me a minimal example (data object + R script) which shows that the results are different with the numeric vector?

My email can be found via:

maintainer("DESeq2")

ADD COMMENT
0
Entering edit mode

hi Ryan,

Thanks for reporting this. I was able to reproduce the difference you see and was able to fix this in version 1.9.49 (which will be released in the next weeks with the new Bioconductor release). The fix is such that res1 now should give the same result as the res2 which you saw (the one with additional significant genes).

> res1 <- results(dds, contrast=c("nutrients","yes","no"))
> res2 <- results(dds, contrast=c(0,1,0,0))
> all.equal(res1$pvalue, res2$pvalue)
[1] TRUE

 

The issue was that a stabilization for standard errors which occurs when the log fold changes head to infinity was being skipped at a certain step, but for certain results tables and not others. This only affected the coefficients when the coefficient was heading to infinity, when there was no beta prior, and when the results were pulled by name or character vector rather than using the numeric contrast.

ADD REPLY
0
Entering edit mode

This makes perfect sense - the additional genes appear to be completely absent in one treatment and present in the other. Thank you so much for your quick solution! I'll switch over to the numeric contrasts until the next release. 

ADD REPLY

Login before adding your answer.

Traffic: 551 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6