Question: DESeq2 and edgeR getting very different results
0
23 months ago by
audrey.o.renson0 wrote:

Hello all,

I am conducting differential abundance analysis on some metagenomic data. In comparing the results of edgeR and DESeq2, I find that there are drastically more differentially abundant tags found by DESeq2 compared to edgeR (n=59 vs. 17, both FDR < 0.05). Further, only 10 out of 17 of edgeR's significant tags overlap with those found by DESeq2. Does anyone know why such a discrepancy would occur?

(Update 12.14.17 -- I am using the same filter for both DESeq2 and edgeR, and now there are 0 overlaps).

Below is the code I am using (updated 12.14.17, should be reproducible now). Thank you in advance for any help!

-Audrey

suppressPackageStartupMessages({
library(curatedMetagenomicData)
library(edgeR)
library(DESeq2)
})

#data setup
loman.counts <- curatedMetagenomicData("LomanNJ_2013.metaphlan_bugs_list.stool",
counts = TRUE,
dryrun = FALSE)[[1]]
loman.counts_na.omit <- loman.counts[,!is.na(loman.counts$stool_texture)] loman.counts$stool_texture <- relevel(factor(loman.counts$stool_texture), "smooth") ###---DESeq2---### dds <- DESeqDataSetFromMatrix(countData = exprs(loman.counts_na.omit), colData = pData(loman.counts_na.omit), design= ~ stool_texture) #filter out features with less than one count dds <- dds[ rowSums(counts(dds)) > 1, ] #fit the model dds <- DESeq(dds, betaPrior = TRUE) #filter by FDR < 0.05 results_DESeq2 <- subset( results(dds, contrast = c("stool_texture", "watery","smooth"), alpha=0.05), padj < 0.05) ###---edgeR---### dge <- DGEList(counts=exprs(loman.counts_na.omit ), group = loman.counts_na.omit$stool_texture)
#filter out features with less than 1 count
dge <- dge[ rowSums(dge$counts) > 1 , ] #fit the model dsg.mtrx <- model.matrix(~stool_texture, data=loman.counts) fit <- glmLRT( glmFit( estimateDisp(dge, design = dsg.mtrx), dsg.mtrx), coef=3) #filter by FDR < 0.05 results_edgeR <- subset(topTags(fit, n=nrow(fit$table))$table, FDR < 0.05) #how many overlap? c(n_deseq2 = nrow(results_DESeq2), n_edgeR = nrow(results_edgeR), n_overlap = length(base::intersect(rownames(results_DESeq2), rownames(results_edgeR$table))))

Output of sessionInfo():

R version 3.4.1 (2017-06-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] DESeq2_1.18.1                SummarizedExperiment_1.8.0   DelayedArray_0.4.1
[4] matrixStats_0.52.2           GenomicRanges_1.30.0         GenomeInfoDb_1.14.0
[7] IRanges_2.12.0               S4Vectors_0.16.0             edgeR_3.20.1
[10] limma_3.34.3                 curatedMetagenomicData_1.8.0 bindrcpp_0.2
[13] ExperimentHub_1.4.0          AnnotationHub_2.10.1         Biobase_2.38.0
[16] BiocGenerics_0.24.0          dplyr_0.7.4

loaded via a namespace (and not attached):
[1] httr_1.3.1                    tidyr_0.7.2                   bit64_0.9-7
[4] splines_3.4.1                 Formula_1.2-2                 shiny_1.0.5
[7] assertthat_0.2.0              interactiveDisplayBase_1.16.0 latticeExtra_0.6-28
[10] blob_1.1.0                    GenomeInfoDbData_0.99.1       yaml_2.1.16
[13] backports_1.1.1               RSQLite_2.0                   lattice_0.20-35
[16] glue_1.2.0                    digest_0.6.12                 checkmate_1.8.5
[19] RColorBrewer_1.1-2            XVector_0.18.0                colorspace_1.3-2
[22] htmltools_0.3.6               httpuv_1.3.5                  Matrix_1.2-12
[25] plyr_1.8.4                    XML_3.98-1.9                  pkgconfig_2.0.1
[28] genefilter_1.60.0             zlibbioc_1.24.0               purrr_0.2.4
[31] xtable_1.8-2                  scales_0.5.0                  BiocParallel_1.12.0
[34] annotate_1.56.1               htmlTable_1.11.0              tibble_1.3.4
[37] ggplot2_2.2.1                 nnet_7.3-12                   lazyeval_0.2.1
[40] survival_2.41-3               magrittr_1.5                  mime_0.5
[43] memoise_1.1.0                 foreign_0.8-69                BiocInstaller_1.28.0
[46] data.table_1.10.4-3           tools_3.4.1                   stringr_1.2.0
[49] munsell_0.4.3                 locfit_1.5-9.1                cluster_2.0.6
[52] AnnotationDbi_1.40.0          compiler_3.4.1                rlang_0.1.4
[55] grid_3.4.1                    RCurl_1.95-4.8                rstudioapi_0.7
[58] htmlwidgets_0.9               bitops_1.0-6                  base64enc_0.1-3
[61] gtable_0.2.0                  curl_3.1                      DBI_0.7
[64] R6_2.2.2                      gridExtra_2.3                 knitr_1.17
[67] bit_1.1-12                    bindr_0.1                     Hmisc_4.0-3
[70] stringi_1.1.6                 Rcpp_0.12.14                  geneplotter_1.56.0
[73] rpart_4.1-11                  acepack_1.4.1                
modified 23 months ago by pablo_garcia0520 • written 23 months ago by audrey.o.renson0

Welcome to Bioconductor support, as I see this is your first post.

Please note that your code isn't reproducible. The code makes reference to quantities that haven't been defined, including 'counts' and 'lrt_watery', so attempts to run it will give errors.

Also, please give output from sessionInfo(), as asked for in the posting guide at http://www.bioconductor.org/help/support/posting-guide

You haven't made the filters the same with your update. You've actually made them more different than in your original post, because you're still running DESeq with independent filtering turned on, while now not doing something equivalent for edgeR. Anyway, you should run each package as recommended by its authors, not try to make them artificially the same in an ad hoc manner, and filtering isn't the main issue here.

Answer: DESeq2 and edgeR getting very different results
1
23 months ago by
pablo_garcia0520 wrote:

I recommend to you to take a view of that video (even the whole serie). Here you can see the difference between edgeR and DESeq2 in a simple summary form.

StatQuest: edgeR and DESeq2, part 2 - Independent Filtering

Answer: DESeq2 and edgeR getting very different results
0
23 months ago by
Michael Love26k
United States
Michael Love26k wrote:

The methods are different, and so it's not expected that they would give the same result on a given dataset. Actually, 59 vs 17 is not very drastic in my opinion. If you look at the ranks by p-value, the methods are probably in rough agreement on the top genes.

If you want to try to make the methods more similar, you could run them on the same subset of genes (you could use the cpm filter and apply this to DESeq2 instead of the counts > 1 filter). Also, you can use independentFiltering=FALSE with DESeq2, so then you are performing BH adjustment over the same vector of p-values. Automatic independent filtering is one of the differences between methods.

Answer: DESeq2 and edgeR getting very different results
0
23 months ago by
Gordon Smyth39k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth39k wrote:

Basically I agree with Michael here. Why would you expect different packages to give the same results? If they gave the same results, then there wouldn't be different packages in the first place.

There are some diffferences in your analyses. You scale normalized in DESeq2 but not in edgeR. The filtering is also different.

Even so, the difference in results doesn't seem that drastic from the limited information you give. The differences in the p-values could be small, but still lead to 59 vs 17 tags with FDR < 0.05.

Note that neither package was developed with metagenomic data in mind, so the data behaviour might be different to that of RNA-seq data. It's not at all apparent to me that either analysis you have done is appropriate for this data. The dispersion values for this data are huge and the groups separate poorly. Differences between edgeR and DESeq2 are (or should be) the least of your concerns.