Hello,
I'm running a differential expression profile for a protein using diffbind. I'm curious about one thing, it's not really a problem per se, but the documentation is unclear about a specific part of the dba.contrast() function.
In particular, when I run a default contrast, simply the count-matrix dba object subsequently analyze it, I get one set of results, however when I build a custom contrast and the analyze it under the same parameters, I get a different set of results:
library(DiffBind)
library(tidyverse)
library(rtracklayer)
library(dplyr)
library(stringr)
library(BiocParallel)
library(GenomicAlignments)
#Default Contrast
Contrast_Test <- dba.contrast(Summits_500[[1]])
Analyze_Test <- dba.analyze(Contrast_Test, bGreylist = T, method = DBA_DESEQ2)
The result of this being
#Result
6 Samples, 10504 sites in matrix:
ID Tissue Factor Condition Replicate Reads FRiP
1 71 Hippocampal Female WT 1 38836922 0.10
2 8070 Hippocampal Female WT 3 21416526 0.07
3 8069 Hippocampal Female WT 2 10508228 0.03
4 9017 Hippocampal Male WT 1 34362930 0.04
5 9018 Hippocampal Male WT 2 17379465 0.05
6 9019 Hippocampal Male WT 3 23407206 0.07
Design: [~Factor] | 1 Contrast:
Factor Group Samples Group2 Samples2 DB.DESeq2
1 Factor Male 3 Female 3 1
Running the Custom contrast
#Custom Contrast
Contrast_Test <- dba.contrast(Summits_500[[1]], group1 = Summits_500[[1]]$masks$Male,
group2 = Summits_500[[1]]$masks$Female, name1 = "Male", name2="Female",
categories = c(DBA_FACTOR))
Analyze_Test <- dba.analyze(Contrast_Test, bGreylist = T, method = DBA_DESEQ2)
And the result of this being
#Result
6 Samples, 10504 sites in matrix:
ID Tissue Factor Condition Replicate Reads FRiP
1 71 Hippocampal Female WT 1 38836922 0.10
2 8070 Hippocampal Female WT 3 21416526 0.07
3 8069 Hippocampal Female WT 2 10508228 0.03
4 9017 Hippocampal Male WT 1 34362930 0.04
5 9018 Hippocampal Male WT 2 17379465 0.05
6 9019 Hippocampal Male WT 3 23407206 0.07
1 Contrast:
Group Samples Group2 Samples2 DB.DESeq2
1 Male 3 Female 3 1
sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)
Matrix products: default
locale:
[1] LC_COLLATE=English_Canada.1252 LC_CTYPE=English_Canada.1252 LC_MONETARY=English_Canada.1252 LC_NUMERIC=C
[5] LC_TIME=English_Canada.1252
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] GenomicAlignments_1.30.0 Rsamtools_2.10.0 Biostrings_2.62.0 XVector_0.34.0 BiocParallel_1.28.3
[6] rtracklayer_1.54.0 forcats_0.5.1 stringr_1.4.0 dplyr_1.0.8 purrr_0.3.4
[11] readr_2.1.2 tidyr_1.2.0 tibble_3.1.6 ggplot2_3.3.6 tidyverse_1.3.1
[16] DiffBind_3.4.11 SummarizedExperiment_1.24.0 Biobase_2.54.0 MatrixGenerics_1.6.0 matrixStats_0.62.0
[21] GenomicRanges_1.46.1 GenomeInfoDb_1.30.1 IRanges_2.28.0 S4Vectors_0.32.3 BiocGenerics_0.40.0
loaded via a namespace (and not attached):
[1] readxl_1.4.0 backports_1.4.1 plyr_1.8.7 splines_4.1.2 amap_0.8-18 digest_0.6.29
[7] invgamma_1.1 htmltools_0.5.2 SQUAREM_2021.1 fansi_1.0.2 magrittr_2.0.2 memoise_2.0.1
[13] BSgenome_1.62.0 ROCR_1.0-11 tzdb_0.3.0 limma_3.50.3 annotate_1.72.0 modelr_0.1.8
[19] systemPipeR_2.0.8 bdsmatrix_1.3-4 jpeg_0.1-9 colorspace_2.0-3 rvest_1.0.2 blob_1.2.3
[25] apeglm_1.16.0 ggrepel_0.9.1 haven_2.5.0 xfun_0.30 crayon_1.5.1 RCurl_1.98-1.6
[31] jsonlite_1.8.0 genefilter_1.76.0 survival_3.2-11 glue_1.6.1 gtable_0.3.0 zlibbioc_1.40.0
[37] DelayedArray_0.20.0 scales_1.2.0 mvtnorm_1.1-3 DBI_1.1.2 Rcpp_1.0.8 xtable_1.8-4
[43] emdbook_1.3.12 bit_4.0.4 truncnorm_1.0-8 htmlwidgets_1.5.4 httr_1.4.2 gplots_3.1.3
[49] RColorBrewer_1.1-3 ellipsis_0.3.2 farver_2.1.0 pkgconfig_2.0.3 XML_3.99-0.9 dbplyr_2.1.1
[55] locfit_1.5-9.5 utf8_1.2.2 labeling_0.4.2 tidyselect_1.1.2 rlang_1.0.1 AnnotationDbi_1.56.2
[61] cellranger_1.1.0 munsell_0.5.0 tools_4.1.2 cachem_1.0.6 cli_3.1.1 generics_0.1.2
[67] RSQLite_2.2.13 broom_0.8.0 fastmap_1.1.0 yaml_2.3.5 fs_1.5.2 bit64_4.0.5
[73] caTools_1.18.2 RANN_2.6.1 KEGGREST_1.34.0 RcppRoll_0.3.0 xml2_1.3.3 compiler_4.1.2
[79] rstudioapi_0.13 png_0.1-7 reprex_2.0.1 geneplotter_1.72.0 stringi_1.7.6 lattice_0.20-41
[85] Matrix_1.3-2 vctrs_0.3.8 pillar_1.7.0 lifecycle_1.0.1 data.table_1.14.2 bitops_1.0-7
[91] irlba_2.3.5 patchwork_1.1.1 R6_2.5.1 BiocIO_1.4.0 latticeExtra_0.6-29 hwriter_1.3.2.1
[97] ShortRead_1.52.0 KernSmooth_2.23-20 MASS_7.3-54 gtools_3.9.2 assertthat_0.2.1 DESeq2_1.34.0
[103] rjson_0.2.21 withr_2.5.0 GenomeInfoDbData_1.2.7 parallel_4.1.2 hms_1.1.1 grid_4.1.2
[109] coda_0.19-4 GreyListChIP_1.26.0 ashr_2.2-54 mixsqp_0.3-43 bbmle_1.0.24 lubridate_1.8.0
[115] numDeriv_2016.8-1.1 tinytex_0.38 restfulr_0.0.13
While resulting tables numerically appear the same, the expression plots are very different (in particular when calling the volcano or ma plots). I'm curious as to why that is. What exactly does the custom contrast do to my data. The reason I used a custom contrast with the grouping is that I wanted to exclude a sample from one of the comparisons, which would bring my replicates down to 2 in the female category.