DESeq2 used with zinbwave weights - Differences on contrasts calculations
2
0
Entering edit mode
Aurélie • 0
@13de9a50
Last seen 26 days ago
France

Hi,

I have been using DESeq2 together with weights obtained from the zinbwave package for shotgun data analysis (as proposed here in the case of single-cell applications: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-018-1406-4).

In DESeq2, we can specify contrasts among other things either by a character vector (e.g., contrast = c("condition","B","A")) or by a numeric vector with one element for each element in resultsNames(object). If my understanding is correct, in the first case, the contrast is computed using the already computed results, and in the second case, some elements are recomputed.

My issue is that I am getting different results whether I use the character vector or the numeric vector, even if specifying the exact same contrast. And I only have this issue when I use zinbwave weights. So, I suspect that weights are handled differently in the main DESeq2 analysis, and in the recomputation performed when we specify contrasts as a numeric vector. However, I do not know which one is the correct result.

Below is a very simple example using a phylum-level dataset from the curatedMetagenomicData R package (phyloseq object can be found here as I changed a few things in the metadata: Example data set).

I obtain a few major differences on lfcSE and thus on p-values. Did anyone encounter this issue or have an idea on which contrast version would provide the correct results? I am using R 4.4.0 and not the newest DESeq2 version (DESeq2_1.44.0).

# Import libraries
library(phyloseq)
library(dplyr)
library(DESeq2)
library(zinbwave)

# Keep only TF samples and exclude null taxa
data <- subset_samples(phseq_bengtsson_phy, time=="TF")
data <- subset_taxa(data, rowSums(otu_table(data))>0)

# DESeq2 - zinbwave run
## Create DESeq object with weights
dds <- phyloseq_to_deseq2(data, design = ~ 1)
zinb <- zinbwave(dds, K = 0, observationalWeights = TRUE,
                           BPPARAM = BiocParallel::SerialParam(), epsilon = 1e12)
dds_zinbwave <- DESeqDataSet(zinb, design = ~travel_destination)

## Perform DESeq2 analysis
mod <- DESeq(dds_zinbwave, test="Wald", sfType="poscounts")
resultsNames(mod)
# [1] "Intercept"        "travel_destination_Indian_Peninsula_vs_Central_Africa"

## Using the contrast option as a character vector
res_zinb_1 <- data.frame(results(mod, contrast=c("travel_destination","Indian_Peninsula","Central_Africa"), alpha=0.05))
## Using the contrast option as a numeric vector
contraster <- c(0,1)
names(contraster) <- resultsNames(mod)
contraster
# Intercept travel_destination_Indian_Peninsula_vs_Central_Africa 
#         0                                                     1 
res_zinb_2 <- data.frame(DESeq2::results(mod, contrast = contraster, alpha = 0.05))
## Comparison
res_zinb_1-res_zinb_2
#                            baseMean log2FoldChange         lfcSE          stat        pvalue          padj
# Firmicutes                        0   0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  4.828636e-02
# Bacteroidota                      0   0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00 -1.605324e-02
# Verrucomicrobia                   0   0.000000e+00  2.220446e-16 -4.440892e-16  2.081668e-17  6.938894e-17
# Lentisphaerae                     0  -4.440892e-16  0.000000e+00 -8.881784e-16 -1.084202e-19 -5.421011e-19
# Proteobacteria                    0   0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00
# Candidatus Melainabacteria        0   0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00            NA
# Actinobacteria                    0   0.000000e+00 -4.851477e+01 -3.347592e-01 -2.620144e-01 -1.803764e-01
# Euryarchaeota                     0   1.110223e-16  4.815127e-01 -7.162905e-01  2.601332e-01  3.117906e-01
# Synergistetes                     0   0.000000e+00 -1.235062e-01 -2.520104e-01 -5.553688e-02 -6.484741e-02
# Spirochaetes                      0   0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00            NA
# Ascomycota                        0   0.000000e+00  6.351663e-01  9.365578e+00  8.224797e-64  8.224797e-63
# Fusobacteria                      0   0.000000e+00 -1.061564e+00  1.622202e-01 -1.244158e-01 -1.803764e-01

sessionInfo( )
# R version 4.4.0 (2024-04-24 ucrt)
# Platform: x86_64-w64-mingw32/x64
# Running under: Windows 11 x64 (build 22631)
# 
# Matrix products: default
# 
# 
# locale:
# [1] LC_COLLATE=French_France.utf8  LC_CTYPE=French_France.utf8    LC_MONETARY=French_France.utf8 LC_NUMERIC=C                   LC_TIME=French_France.utf8    
# 
# time zone: Europe/Paris
# tzcode source: internal
# 
# attached base packages:
# [1] stats4    stats     graphics  grDevices utils     datasets  methods   base     
# 
# other attached packages:
#  [1] zinbwave_1.26.0             SingleCellExperiment_1.26.0 DESeq2_1.44.0               SummarizedExperiment_1.34.0 Biobase_2.64.0              MatrixGenerics_1.16.0      
#  [7] matrixStats_1.3.0           GenomicRanges_1.56.1        GenomeInfoDb_1.40.1         IRanges_2.38.1              S4Vectors_0.42.1            BiocGenerics_0.50.0        
# [13] dplyr_1.1.4                 phyloseq_1.48.0            
# 
# loaded via a namespace (and not attached):
#  [1] DBI_1.2.3               permute_0.9-7           rlang_1.1.4             magrittr_2.0.3          ade4_1.7-22             compiler_4.4.0          RSQLite_2.3.7          
#  [8] mgcv_1.9-1              png_0.1-8               vctrs_0.6.5             reshape2_1.4.4          stringr_1.5.1           pkgconfig_2.0.3         crayon_1.5.3           
# [15] fastmap_1.2.0           XVector_0.44.0          utf8_1.2.4              UCSC.utils_1.0.0        purrr_1.0.2             bit_4.0.5               zlibbioc_1.50.0        
# [22] cachem_1.1.0            jsonlite_1.8.8          biomformat_1.32.0       blob_1.2.4              rhdf5filters_1.16.0     DelayedArray_0.30.1     Rhdf5lib_1.26.0        
# [29] BiocParallel_1.38.0     parallel_4.4.0          cluster_2.1.6           R6_2.5.1                stringi_1.8.4           limma_3.60.4            genefilter_1.86.0      
# [36] Rcpp_1.0.13             iterators_1.0.14        Matrix_1.7-0            splines_4.4.0           igraph_2.0.3            tidyselect_1.2.1        rstudioapi_0.16.0      
# [43] abind_1.4-5             vegan_2.6-8             doParallel_1.0.17       codetools_0.2-20        lattice_0.22-6          tibble_3.2.1            plyr_1.8.9             
# [50] KEGGREST_1.44.1         survival_3.7-0          Biostrings_2.72.1       pillar_1.9.0            foreach_1.5.2           softImpute_1.4-1        generics_0.1.3         
# [57] ggplot2_3.5.1           munsell_0.5.1           scales_1.3.0            xtable_1.8-4            glue_1.7.0              tools_4.4.0             data.table_1.16.0      
# [64] annotate_1.82.0         locfit_1.5-9.10         XML_3.99-0.17           rhdf5_2.48.0            grid_4.4.0              tidyr_1.3.1             ape_5.8                
# [71] AnnotationDbi_1.66.0    edgeR_4.2.1             colorspace_2.1-1        nlme_3.1-166            GenomeInfoDbData_1.2.12 cli_3.6.3               fansi_1.0.6            
# [78] S4Arrays_1.4.1          gtable_0.3.5            digest_0.6.37           SparseArray_1.4.8       memoise_2.0.1           multtest_2.60.0         lifecycle_1.0.4        
# [85] httr_1.4.7              statmod_1.5.0           bit64_4.0.5             MASS_7.3-61
zinbwave DESeq2 • 748 views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 23 hours ago
United States

There is one case where it is different in the code (it's here in the forum but probably buried under thousands of posts). If you have a multi-group experiment, and there are all 0's in two groups being compared, the character-style contrast will zero out the LFC (set equal to zero), whereas it can be not identical to zero if you use numeric contrast (due to how the IRLS converges, and when groups have all 0 for all samples, and you compare those two group coefficients).

ADD COMMENT
0
Entering edit mode

Thank you Michael Love for your answer. I am not sure this is the cause for what I see here. In this example, I am only comparing 2 groups, and I have filtered out variables which were null for all samples. The biggest differences are observed on lfcSE rather than fold changes (e.g. Actinobacteria). This must be related somhow to the use of weights because I do not have these differences when using DESeq2 without weights.

ADD REPLY
0
Entering edit mode

I'm not sure, because I can see in results() that it is again using the same weights matrix when it re-computes a contrast.

Do you say that the lfcSE are mostly the same? How many are basically the same and how many are different?

ADD REPLY
0
Entering edit mode

Looking at the above, check if the discrepant cases in terms of lfcSE have LFC=0 when using the character contrast. This is expected behavior. And it can happen here I think if you have zeros for all the samples in a group excluding those samples with very low weight (essentially then you have all zeros for the observed data, positive for data that is downweighted to no effect on inference).

ADD REPLY
0
Entering edit mode

I have discrepancies of lfcse on 5 phyla out of 12. None of them had a logFC at 0. I tried to focus on 2 examples to better understand:

Fusobacteria

  1. 74% of zero values
  2. With DESeq2 alone (no weights provided), I get the same results with both character and numeric contrasts

Results:

> #              baseMean log2FoldChange    lfcSE      stat    pvalue      padj
> # Fusobacteria 674.3513      -4.134104 2.887016 -1.431964 0.1521541 0.334739
  1. When I add the zinbwave weights, I get different results due to the weights and I also have discrepencies between the 2 contrasts versions for lfcSE.

Character contrast:

> #              baseMean log2FoldChange    lfcSE      stat    pvalue      padj
> # Fusobacteria 1336.426      0.4611203 1.285612 0.3586776 0.7198363 0.8163801

Numeric contrast:

> #              baseMean log2FoldChange    lfcSE      stat    pvalue      padj
> # Fusobacteria 1336.426      0.4611203 2.347176 0.1964575 0.8442521 0.9967565

Actinobacteria

  1. All values are non-zero, and consistently all the weights defined by zinbwave are at 1
  2. With DESeq2 alone (no weights provided), I get the same results with both character and numeric contrasts

Results:

> #                baseMean log2FoldChange    lfcSE      stat    pvalue      padj
> # Actinobacteria  1652439     -0.1996161 0.584243 -0.3416662 0.7326021 0.8157417
  1. When I add the zinbwave weights (all at one for this specific phylum), I get the same log2FoldChange but highly different lfcSE values with the numeric contrast.

Character contrast:

> #                baseMean log2FoldChange    lfcSE      stat    pvalue      padj
> # Actinobacteria  1652439      -0.199616 0.5891432 -0.3388243 0.7347421 0.8163801

Numeric contrast:

> #                 baseMean log2FoldChange    lfcSE      stat    pvalue      padj
> # Actinobacteria   1652439      -0.199616 49.10391 -0.004065176 0.9967565 0.9967565

I have no idea why I got different results on this one.

ADD REPLY
0
Entering edit mode

I'll take a look.

ADD REPLY
1
Entering edit mode

Thanks for pointing this out! I found a bug that the wrong weights matrix was being used when recomputing the SE within results() for the numeric-style contrast. I'll fix this in devel / github and push a fix to the release branch as well.

ADD REPLY
1
Entering edit mode

I pushed 1.48.2 to release, or you can get the latest with remotes::install_github("thelovelab/DESeq2")

ADD REPLY
0
Entering edit mode

Thank you so much!

ADD REPLY
0
Entering edit mode

Is it the case that you are running DESeq2 across 12 total features? This is a larger issue: I wouldn't use an NB model on such counts. The size factor estimation is surely not appropriately dealing with the compositional nature of the data.

ADD REPLY
0
Entering edit mode

Yes, I know. This was just to get a simple example ;-).

ADD REPLY
1
Entering edit mode

Ok good! Thanks for the example and helping me hunt down the bug.

ADD REPLY

Login before adding your answer.

Traffic: 706 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6