Question: DESeq2 and edgeR getting very different results
0
gravatar for audrey.o.renson
23 months ago by
audrey.o.renson0 wrote:

Hello all,

I am conducting differential abundance analysis on some metagenomic data. In comparing the results of edgeR and DESeq2, I find that there are drastically more differentially abundant tags found by DESeq2 compared to edgeR (n=59 vs. 17, both FDR < 0.05). Further, only 10 out of 17 of edgeR's significant tags overlap with those found by DESeq2. Does anyone know why such a discrepancy would occur?

(Update 12.14.17 -- I am using the same filter for both DESeq2 and edgeR, and now there are 0 overlaps).

Below is the code I am using (updated 12.14.17, should be reproducible now). Thank you in advance for any help!

-Audrey

suppressPackageStartupMessages({
  library(curatedMetagenomicData)
  library(edgeR)
  library(DESeq2)  
})

#data setup
loman.counts <- curatedMetagenomicData("LomanNJ_2013.metaphlan_bugs_list.stool",
                                                               counts = TRUE,
                                                               dryrun = FALSE)[[1]]
loman.counts_na.omit <- loman.counts[,!is.na(loman.counts$stool_texture)]
loman.counts$stool_texture <- relevel(factor(loman.counts$stool_texture), "smooth")

###---DESeq2---###
dds <- DESeqDataSetFromMatrix(countData = exprs(loman.counts_na.omit),
                              colData   = pData(loman.counts_na.omit),
                              design= ~ stool_texture)
#filter out features with less than one count
dds <- dds[ rowSums(counts(dds)) > 1, ]
#fit the model
dds <- DESeq(dds, betaPrior = TRUE)
#filter by FDR < 0.05
results_DESeq2 <- subset( results(dds,  contrast = c("stool_texture", "watery","smooth"),
               alpha=0.05), padj < 0.05)

###---edgeR---###
dge <- DGEList(counts=exprs(loman.counts_na.omit ), group = loman.counts_na.omit$stool_texture)
#filter out features with less than 1 count
dge <- dge[ rowSums(dge$counts) > 1 , ]
#fit the model
dsg.mtrx <- model.matrix(~stool_texture, data=loman.counts)
fit <- glmLRT( glmFit( estimateDisp(dge, design = dsg.mtrx), dsg.mtrx), coef=3) 
#filter by FDR < 0.05
results_edgeR <- subset(topTags(fit, n=nrow(fit$table))$table, FDR < 0.05)

#how many overlap?
c(n_deseq2 = nrow(results_DESeq2), 
  n_edgeR  = nrow(results_edgeR),
  n_overlap = length(base::intersect(rownames(results_DESeq2), rownames(results_edgeR$table))))

 

Output of sessionInfo():

R version 3.4.1 (2017-06-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] DESeq2_1.18.1                SummarizedExperiment_1.8.0   DelayedArray_0.4.1          
 [4] matrixStats_0.52.2           GenomicRanges_1.30.0         GenomeInfoDb_1.14.0         
 [7] IRanges_2.12.0               S4Vectors_0.16.0             edgeR_3.20.1                
[10] limma_3.34.3                 curatedMetagenomicData_1.8.0 bindrcpp_0.2                
[13] ExperimentHub_1.4.0          AnnotationHub_2.10.1         Biobase_2.38.0              
[16] BiocGenerics_0.24.0          dplyr_0.7.4                 

loaded via a namespace (and not attached):
 [1] httr_1.3.1                    tidyr_0.7.2                   bit64_0.9-7                  
 [4] splines_3.4.1                 Formula_1.2-2                 shiny_1.0.5                  
 [7] assertthat_0.2.0              interactiveDisplayBase_1.16.0 latticeExtra_0.6-28          
[10] blob_1.1.0                    GenomeInfoDbData_0.99.1       yaml_2.1.16                  
[13] backports_1.1.1               RSQLite_2.0                   lattice_0.20-35              
[16] glue_1.2.0                    digest_0.6.12                 checkmate_1.8.5              
[19] RColorBrewer_1.1-2            XVector_0.18.0                colorspace_1.3-2             
[22] htmltools_0.3.6               httpuv_1.3.5                  Matrix_1.2-12                
[25] plyr_1.8.4                    XML_3.98-1.9                  pkgconfig_2.0.1              
[28] genefilter_1.60.0             zlibbioc_1.24.0               purrr_0.2.4                  
[31] xtable_1.8-2                  scales_0.5.0                  BiocParallel_1.12.0          
[34] annotate_1.56.1               htmlTable_1.11.0              tibble_1.3.4                 
[37] ggplot2_2.2.1                 nnet_7.3-12                   lazyeval_0.2.1               
[40] survival_2.41-3               magrittr_1.5                  mime_0.5                     
[43] memoise_1.1.0                 foreign_0.8-69                BiocInstaller_1.28.0         
[46] data.table_1.10.4-3           tools_3.4.1                   stringr_1.2.0                
[49] munsell_0.4.3                 locfit_1.5-9.1                cluster_2.0.6                
[52] AnnotationDbi_1.40.0          compiler_3.4.1                rlang_0.1.4                  
[55] grid_3.4.1                    RCurl_1.95-4.8                rstudioapi_0.7               
[58] htmlwidgets_0.9               bitops_1.0-6                  base64enc_0.1-3              
[61] gtable_0.2.0                  curl_3.1                      DBI_0.7                      
[64] R6_2.2.2                      gridExtra_2.3                 knitr_1.17                   
[67] bit_1.1-12                    bindr_0.1                     Hmisc_4.0-3                  
[70] stringi_1.1.6                 Rcpp_0.12.14                  geneplotter_1.56.0           
[73] rpart_4.1-11                  acepack_1.4.1                
ADD COMMENTlink modified 23 months ago by pablo_garcia0520 • written 23 months ago by audrey.o.renson0

Welcome to Bioconductor support, as I see this is your first post.

Please note that your code isn't reproducible. The code makes reference to quantities that haven't been defined, including 'counts' and 'lrt_watery', so attempts to run it will give errors.

Also, please give output from sessionInfo(), as asked for in the posting guide at http://www.bioconductor.org/help/support/posting-guide

ADD REPLYlink modified 23 months ago • written 23 months ago by Gordon Smyth39k

You haven't made the filters the same with your update. You've actually made them more different than in your original post, because you're still running DESeq with independent filtering turned on, while now not doing something equivalent for edgeR. Anyway, you should run each package as recommended by its authors, not try to make them artificially the same in an ad hoc manner, and filtering isn't the main issue here.

ADD REPLYlink modified 23 months ago • written 23 months ago by Gordon Smyth39k
Answer: DESeq2 and edgeR getting very different results
1
gravatar for pablo_garcia05
23 months ago by
pablo_garcia0520 wrote:

I recommend to you to take a view of that video (even the whole serie). Here you can see the difference between edgeR and DESeq2 in a simple summary form.

StatQuest: edgeR and DESeq2, part 2 - Independent Filtering

ADD COMMENTlink written 23 months ago by pablo_garcia0520
Answer: DESeq2 and edgeR getting very different results
0
gravatar for Michael Love
23 months ago by
Michael Love26k
United States
Michael Love26k wrote:

The methods are different, and so it's not expected that they would give the same result on a given dataset. Actually, 59 vs 17 is not very drastic in my opinion. If you look at the ranks by p-value, the methods are probably in rough agreement on the top genes.

If you want to try to make the methods more similar, you could run them on the same subset of genes (you could use the cpm filter and apply this to DESeq2 instead of the counts > 1 filter). Also, you can use independentFiltering=FALSE with DESeq2, so then you are performing BH adjustment over the same vector of p-values. Automatic independent filtering is one of the differences between methods.

ADD COMMENTlink written 23 months ago by Michael Love26k
Answer: DESeq2 and edgeR getting very different results
0
gravatar for Gordon Smyth
23 months ago by
Gordon Smyth39k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth39k wrote:

Basically I agree with Michael here. Why would you expect different packages to give the same results? If they gave the same results, then there wouldn't be different packages in the first place.

There are some diffferences in your analyses. You scale normalized in DESeq2 but not in edgeR. The filtering is also different.

Even so, the difference in results doesn't seem that drastic from the limited information you give. The differences in the p-values could be small, but still lead to 59 vs 17 tags with FDR < 0.05.

Note that neither package was developed with metagenomic data in mind, so the data behaviour might be different to that of RNA-seq data. It's not at all apparent to me that either analysis you have done is appropriate for this data. The dispersion values for this data are huge and the groups separate poorly. Differences between edgeR and DESeq2 are (or should be) the least of your concerns.

ADD COMMENTlink modified 23 months ago • written 23 months ago by Gordon Smyth39k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 401 users visited in the last hour