Question

Diffbind: different results on the same sample depending on contrast parameters

0

Entering edit mode

Luca • 0

@79698011

Last seen 5 months ago

Canada

Hello,

I'm running a differential expression profile for a protein using diffbind. I'm curious about one thing, it's not really a problem per se, but the documentation is unclear about a specific part of the dba.contrast() function.

In particular, when I run a default contrast, simply the count-matrix dba object subsequently analyze it, I get one set of results, however when I build a custom contrast and the analyze it under the same parameters, I get a different set of results:

library(DiffBind)
library(tidyverse)
library(rtracklayer)
library(dplyr)
library(stringr)
library(BiocParallel)
library(GenomicAlignments)

#Default Contrast
Contrast_Test <- dba.contrast(Summits_500[[1]])

Analyze_Test <- dba.analyze(Contrast_Test, bGreylist = T, method = DBA_DESEQ2)

The result of this being

#Result
6 Samples, 10504 sites in matrix:
    ID      Tissue Factor Condition Replicate    Reads FRiP
1   71 Hippocampal Female        WT         1 38836922 0.10
2 8070 Hippocampal Female        WT         3 21416526 0.07
3 8069 Hippocampal Female        WT         2 10508228 0.03
4 9017 Hippocampal   Male        WT         1 34362930 0.04
5 9018 Hippocampal   Male        WT         2 17379465 0.05
6 9019 Hippocampal   Male        WT         3 23407206 0.07

Design: [~Factor] | 1 Contrast:
  Factor Group Samples Group2 Samples2 DB.DESeq2
1 Factor  Male       3 Female        3         1

Running the Custom contrast

#Custom Contrast
Contrast_Test <- dba.contrast(Summits_500[[1]], group1 = Summits_500[[1]]$masks$Male,
                              group2 = Summits_500[[1]]$masks$Female, name1 = "Male", name2="Female",
                              categories = c(DBA_FACTOR))

Analyze_Test <- dba.analyze(Contrast_Test, bGreylist = T, method = DBA_DESEQ2)

And the result of this being

#Result
6 Samples, 10504 sites in matrix:
    ID      Tissue Factor Condition Replicate    Reads FRiP
1   71 Hippocampal Female        WT         1 38836922 0.10
2 8070 Hippocampal Female        WT         3 21416526 0.07
3 8069 Hippocampal Female        WT         2 10508228 0.03
4 9017 Hippocampal   Male        WT         1 34362930 0.04
5 9018 Hippocampal   Male        WT         2 17379465 0.05
6 9019 Hippocampal   Male        WT         3 23407206 0.07

1 Contrast:
  Group Samples Group2 Samples2 DB.DESeq2
1  Male       3 Female        3         1

sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale:
[1] LC_COLLATE=English_Canada.1252  LC_CTYPE=English_Canada.1252    LC_MONETARY=English_Canada.1252 LC_NUMERIC=C                   
[5] LC_TIME=English_Canada.1252    

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] GenomicAlignments_1.30.0    Rsamtools_2.10.0            Biostrings_2.62.0           XVector_0.34.0              BiocParallel_1.28.3        
 [6] rtracklayer_1.54.0          forcats_0.5.1               stringr_1.4.0               dplyr_1.0.8                 purrr_0.3.4                
[11] readr_2.1.2                 tidyr_1.2.0                 tibble_3.1.6                ggplot2_3.3.6               tidyverse_1.3.1            
[16] DiffBind_3.4.11             SummarizedExperiment_1.24.0 Biobase_2.54.0              MatrixGenerics_1.6.0        matrixStats_0.62.0         
[21] GenomicRanges_1.46.1        GenomeInfoDb_1.30.1         IRanges_2.28.0              S4Vectors_0.32.3            BiocGenerics_0.40.0        

loaded via a namespace (and not attached):
  [1] readxl_1.4.0           backports_1.4.1        plyr_1.8.7             splines_4.1.2          amap_0.8-18            digest_0.6.29         
  [7] invgamma_1.1           htmltools_0.5.2        SQUAREM_2021.1         fansi_1.0.2            magrittr_2.0.2         memoise_2.0.1         
 [13] BSgenome_1.62.0        ROCR_1.0-11            tzdb_0.3.0             limma_3.50.3           annotate_1.72.0        modelr_0.1.8          
 [19] systemPipeR_2.0.8      bdsmatrix_1.3-4        jpeg_0.1-9             colorspace_2.0-3       rvest_1.0.2            blob_1.2.3            
 [25] apeglm_1.16.0          ggrepel_0.9.1          haven_2.5.0            xfun_0.30              crayon_1.5.1           RCurl_1.98-1.6        
 [31] jsonlite_1.8.0         genefilter_1.76.0      survival_3.2-11        glue_1.6.1             gtable_0.3.0           zlibbioc_1.40.0       
 [37] DelayedArray_0.20.0    scales_1.2.0           mvtnorm_1.1-3          DBI_1.1.2              Rcpp_1.0.8             xtable_1.8-4          
 [43] emdbook_1.3.12         bit_4.0.4              truncnorm_1.0-8        htmlwidgets_1.5.4      httr_1.4.2             gplots_3.1.3          
 [49] RColorBrewer_1.1-3     ellipsis_0.3.2         farver_2.1.0           pkgconfig_2.0.3        XML_3.99-0.9           dbplyr_2.1.1          
 [55] locfit_1.5-9.5         utf8_1.2.2             labeling_0.4.2         tidyselect_1.1.2       rlang_1.0.1            AnnotationDbi_1.56.2  
 [61] cellranger_1.1.0       munsell_0.5.0          tools_4.1.2            cachem_1.0.6           cli_3.1.1              generics_0.1.2        
 [67] RSQLite_2.2.13         broom_0.8.0            fastmap_1.1.0          yaml_2.3.5             fs_1.5.2               bit64_4.0.5           
 [73] caTools_1.18.2         RANN_2.6.1             KEGGREST_1.34.0        RcppRoll_0.3.0         xml2_1.3.3             compiler_4.1.2        
 [79] rstudioapi_0.13        png_0.1-7              reprex_2.0.1           geneplotter_1.72.0     stringi_1.7.6          lattice_0.20-41       
 [85] Matrix_1.3-2           vctrs_0.3.8            pillar_1.7.0           lifecycle_1.0.1        data.table_1.14.2      bitops_1.0-7          
 [91] irlba_2.3.5            patchwork_1.1.1        R6_2.5.1               BiocIO_1.4.0           latticeExtra_0.6-29    hwriter_1.3.2.1       
 [97] ShortRead_1.52.0       KernSmooth_2.23-20     MASS_7.3-54            gtools_3.9.2           assertthat_0.2.1       DESeq2_1.34.0         
[103] rjson_0.2.21           withr_2.5.0            GenomeInfoDbData_1.2.7 parallel_4.1.2         hms_1.1.1              grid_4.1.2            
[109] coda_0.19-4            GreyListChIP_1.26.0    ashr_2.2-54            mixsqp_0.3-43          bbmle_1.0.24           lubridate_1.8.0       
[115] numDeriv_2016.8-1.1    tinytex_0.38           restfulr_0.0.13

While resulting tables numerically appear the same, the expression plots are very different (in particular when calling the volcano or ma plots). I'm curious as to why that is. What exactly does the custom contrast do to my data. The reason I used a custom contrast with the grouping is that I wanted to exclude a sample from one of the comparisons, which would bring my replicates down to 2 in the female category.

DifferentialExpression DiffBind • 1.1k views

ADD COMMENT • link updated 3.6 years ago by Rory Stark ★ 5.2k • written 3.7 years ago by Luca • 0

score 0 · Answer 1 · 2022-06-08

DiffBind_3.0 introduced a different way of handling models and contrasts, enabling an explicit design formula and updating how the underlying analysis packages (DESseq2 and edgeR) are used. When the group1 and group2 parameters are specified, the package reverts to the previous methods. The emphasis for this is on backward compatibility, so that the results would be the same as if you used an older (pre-version 3) versions of DiffBind, so the results are not identical.

There are a number of differences between how the analysis are done in the two versions. Sections 10.3 and 11.1 of the DiffBind vignette have the details. The biggest difference that would impact MA and volcano plots is that the new version takes advantage of the underlying packages to compute fold changes, which are shrunk, while the older versions does a more simple fold change calculation.

If you want to exclude a sample from the analysis, and want to take advantage of the up-to-date methods, you can exclude the sample completely by removing it from the analysis completely. For example, if you wanted to exclude the first sample:

removed <- dba(Summits_500[[1]], mask=2:6)

Alternatively, you could keep the sample's reads in the model, but exclude it from the contrast, by adding manipulating the metadata factors.