DESeq2: Difference in DEG numbers when switching reference condition (numerator, denominator) using contrast
1
0
Entering edit mode
Suvrat • 0
@d0b29f16
Last seen 3 months ago
Germany

Hello everyone,

I am comparing gene expression profiles between 8 different conditions (developmental stages in this case) with a design = ~ condition. I donot specifically have a "control" condition per se to set as a reference so I used the contrast= option when calling results(). I wanted to check whether switching the reference condition, i.e. switching the numerator and denominator terms in contrast=, gives similar results and used the code below to make the comparisons. Note: I did not do the pre-filtering step and only include the relevant lines of code for space reasons.

### I3 as denominator
ddsTxi.tr$condition <- relevel(ddsTxi.tr$condition, "I3")  
dds.tr <- DESeq(ddsTxi.tr)
res_contrast_I5_vs_I3 <- results(dds.tr, contrast = c("condition", "I5", "I3")  )
write.csv(as.data.frame(res_contrast_I5_vs_I3), file = "res_contrast_I5_vs_I3_padjOrdered_noleaf.csv")
res_contrast_I5_vs_I3 <- results(dds.tr, contrast = c("condition", "I5", "I3"), lfcThreshold = 1, alpha = 0.05)
summary(res_contrast_I5_vs_I3)
>out of 128220 with nonzero total read count
>adjusted p-value < 0.05
>LFC > 1.00 (up)    : 4285, 3.3%
>LFC < -1.00 (down) : 1118, 0.87%
>outliers [1]       : 793, 0.62%
>low counts [2]     : 24737, 19%
>(mean count < 4)
>[1] see 'cooksCutoff' argument of ?results
>[2] see 'independentFiltering' argument of ?results

### I5 as denominator
ddsTxi.tr$condition <- relevel(ddsTxi.tr$condition, "I5")
dds.tr <- DESeq(ddsTxi.tr)
res_contrast_I3_vs_I5 <- results(dds.tr, contrast = c("condition", "I3", "I5")  )
write.csv(as.data.frame(res_contrast_I3_vs_I5), file = "res_contrast_I3_vs_I5_padjOrdered_noleaf.csv")
res_contrast_I3_vs_I5 <- results(dds.tr, contrast = c("condition", "I3", "I5"), lfcThreshold = 1, alpha = 0.05)
summary(res_contrast_I3_vs_I5)
>out of 128220 with nonzero total read count
>adjusted p-value < 0.05
>LFC > 1.00 (up)    : 1169, 0.91%
>LFC < -1.00 (down) : 4229, 3.3%
>outliers [1]       : 793, 0.62%
>low counts [2]     : 24737, 19%
>(mean count < 4)
>[1] see 'cooksCutoff' argument of ?results
>[2] see 'independentFiltering' argument of ?results

As you can see the results are not exactly the same. I was wondering why this happens and searched for relevant posts in the discussion forums but did not find any posts which gave a definite answer. I extracted the gene Ids from both results to compare the overlap and there are DEGs exclusively being reported when one or the other condition is used as denominator (shown in venn diagram).
Overlap of DEG Ids from the results after removing outliers and low counts

I looked at the results tables for these 'exclusive genes' (in the .csv file written before filtering as they do not appear in the results after filtering) and found that their lfc and p-values also change (somtimes drastically) depending on the denominator being used. I expected that flipping reference levels should only change the sign of the lfc and not add/remove some genes entirely. Was my expectation wrong?

Why this difference in the DEGs when switching reference levels? Let me know if I have missed any relevant info that is needed Ps: I have the count data for these 'exclusive genes' and can upload it if necessary.

Best regards,

Suvrat


sessionInfo()
R version 4.3.2 (2023-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default


locale:
[1] LC_COLLATE=English_India.utf8  LC_CTYPE=English_India.utf8    LC_MONETARY=English_India.utf8 LC_NUMERIC=C                  
[5] LC_TIME=English_India.utf8    

time zone: Europe/Berlin
tzcode source: internal

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] RColorBrewer_1.1-3          pheatmap_1.0.12             ggplot2_3.4.4               dplyr_1.1.4                
 [5] jsonlite_1.8.7              tximport_1.30.0             readr_2.1.4                 DESeq2_1.42.0              
 [9] SummarizedExperiment_1.32.0 Biobase_2.62.0              MatrixGenerics_1.14.0       matrixStats_1.1.0          
[13] GenomicRanges_1.54.1        GenomeInfoDb_1.38.1         IRanges_2.36.0              S4Vectors_0.40.1           
[17] BiocGenerics_0.48.1        

loaded via a namespace (and not attached):
 [1] utf8_1.2.4              generics_0.1.3          SparseArray_1.2.2       bitops_1.0-7            lattice_0.22-5         
 [6] hms_1.1.3               magrittr_2.0.3          grid_4.3.2              Matrix_1.6-3            BiocManager_1.30.22    
[11] fansi_1.0.5             scales_1.3.0            codetools_0.2-19        abind_1.4-5             cli_3.6.1              
[16] rlang_1.1.2             crayon_1.5.2            XVector_0.42.0          munsell_0.5.0           withr_2.5.2            
[21] DelayedArray_0.28.0     S4Arrays_1.2.0          tools_4.3.2             parallel_4.3.2          tzdb_0.4.0             
[26] BiocParallel_1.36.0     colorspace_2.1-0        locfit_1.5-9.8          GenomeInfoDbData_1.2.11 vctrs_0.6.4            
[31] R6_2.5.1                lifecycle_1.0.4         zlibbioc_1.48.0         pkgconfig_2.0.3         pillar_1.9.0           
[36] gtable_0.3.4            glue_1.6.2              Rcpp_1.0.11             tibble_3.2.1            tidyselect_1.2.0       
[41] rstudioapi_0.15.0       compiler_4.3.2          RCurl_1.98-1.13
DESeq2 • 482 views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 11 hours ago
United States

The reference level can modify the LFC/p-value when estimates are on the boundary of the parameter space, e.g. if the reference level has no counts then the LFC is undefined and LFC and SE head to infinity.

ADD COMMENT
0
Entering edit mode

Ah, I see. Thank you for the quick reply! The reference counts for these 'exclusive genes' are in fact dominated by zeroes. How would you suggest I continue? I am assuming I should choose one reference group and stick with it for all my comparisons. Also should I filter out these exclusive genes from my final list of DEGs? I am still learning to do this sort of analysis and unsure about the best way to do it.

ADD REPLY
0
Entering edit mode

Yes I recommend to stick to one reference group for the analysis.

You do not need to filter them. Sometimes users filter genes where all conditions have mostly zeros. But this is optional.

ADD REPLY

Login before adding your answer.

Traffic: 289 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6