DESeq2 and edgeR getting very different results
3
0
Entering edit mode
@audreyorenson-14633
Last seen 6.4 years ago

Hello all,

I am conducting differential abundance analysis on some metagenomic data. In comparing the results of edgeR and DESeq2, I find that there are drastically more differentially abundant tags found by DESeq2 compared to edgeR (n=59 vs. 17, both FDR < 0.05). Further, only 10 out of 17 of edgeR's significant tags overlap with those found by DESeq2. Does anyone know why such a discrepancy would occur?

(Update 12.14.17 -- I am using the same filter for both DESeq2 and edgeR, and now there are 0 overlaps).

Below is the code I am using (updated 12.14.17, should be reproducible now). Thank you in advance for any help!

-Audrey

suppressPackageStartupMessages({
  library(curatedMetagenomicData)
  library(edgeR)
  library(DESeq2)  
})

#data setup
loman.counts <- curatedMetagenomicData("LomanNJ_2013.metaphlan_bugs_list.stool",
                                                               counts = TRUE,
                                                               dryrun = FALSE)[[1]]
loman.counts_na.omit <- loman.counts[,!is.na(loman.counts$stool_texture)]
loman.counts$stool_texture <- relevel(factor(loman.counts$stool_texture), "smooth")

###---DESeq2---###
dds <- DESeqDataSetFromMatrix(countData = exprs(loman.counts_na.omit),
                              colData   = pData(loman.counts_na.omit),
                              design= ~ stool_texture)
#filter out features with less than one count
dds <- dds[ rowSums(counts(dds)) > 1, ]
#fit the model
dds <- DESeq(dds, betaPrior = TRUE)
#filter by FDR < 0.05
results_DESeq2 <- subset( results(dds,  contrast = c("stool_texture", "watery","smooth"),
               alpha=0.05), padj < 0.05)

###---edgeR---###
dge <- DGEList(counts=exprs(loman.counts_na.omit ), group = loman.counts_na.omit$stool_texture)
#filter out features with less than 1 count
dge <- dge[ rowSums(dge$counts) > 1 , ]
#fit the model
dsg.mtrx <- model.matrix(~stool_texture, data=loman.counts)
fit <- glmLRT( glmFit( estimateDisp(dge, design = dsg.mtrx), dsg.mtrx), coef=3) 
#filter by FDR < 0.05
results_edgeR <- subset(topTags(fit, n=nrow(fit$table))$table, FDR < 0.05)

#how many overlap?
c(n_deseq2 = nrow(results_DESeq2), 
  n_edgeR  = nrow(results_edgeR),
  n_overlap = length(base::intersect(rownames(results_DESeq2), rownames(results_edgeR$table))))

 

Output of sessionInfo():

R version 3.4.1 (2017-06-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] DESeq2_1.18.1                SummarizedExperiment_1.8.0   DelayedArray_0.4.1          
 [4] matrixStats_0.52.2           GenomicRanges_1.30.0         GenomeInfoDb_1.14.0         
 [7] IRanges_2.12.0               S4Vectors_0.16.0             edgeR_3.20.1                
[10] limma_3.34.3                 curatedMetagenomicData_1.8.0 bindrcpp_0.2                
[13] ExperimentHub_1.4.0          AnnotationHub_2.10.1         Biobase_2.38.0              
[16] BiocGenerics_0.24.0          dplyr_0.7.4                 

loaded via a namespace (and not attached):
 [1] httr_1.3.1                    tidyr_0.7.2                   bit64_0.9-7                  
 [4] splines_3.4.1                 Formula_1.2-2                 shiny_1.0.5                  
 [7] assertthat_0.2.0              interactiveDisplayBase_1.16.0 latticeExtra_0.6-28          
[10] blob_1.1.0                    GenomeInfoDbData_0.99.1       yaml_2.1.16                  
[13] backports_1.1.1               RSQLite_2.0                   lattice_0.20-35              
[16] glue_1.2.0                    digest_0.6.12                 checkmate_1.8.5              
[19] RColorBrewer_1.1-2            XVector_0.18.0                colorspace_1.3-2             
[22] htmltools_0.3.6               httpuv_1.3.5                  Matrix_1.2-12                
[25] plyr_1.8.4                    XML_3.98-1.9                  pkgconfig_2.0.1              
[28] genefilter_1.60.0             zlibbioc_1.24.0               purrr_0.2.4                  
[31] xtable_1.8-2                  scales_0.5.0                  BiocParallel_1.12.0          
[34] annotate_1.56.1               htmlTable_1.11.0              tibble_1.3.4                 
[37] ggplot2_2.2.1                 nnet_7.3-12                   lazyeval_0.2.1               
[40] survival_2.41-3               magrittr_1.5                  mime_0.5                     
[43] memoise_1.1.0                 foreign_0.8-69                BiocInstaller_1.28.0         
[46] data.table_1.10.4-3           tools_3.4.1                   stringr_1.2.0                
[49] munsell_0.4.3                 locfit_1.5-9.1                cluster_2.0.6                
[52] AnnotationDbi_1.40.0          compiler_3.4.1                rlang_0.1.4                  
[55] grid_3.4.1                    RCurl_1.95-4.8                rstudioapi_0.7               
[58] htmlwidgets_0.9               bitops_1.0-6                  base64enc_0.1-3              
[61] gtable_0.2.0                  curl_3.1                      DBI_0.7                      
[64] R6_2.2.2                      gridExtra_2.3                 knitr_1.17                   
[67] bit_1.1-12                    bindr_0.1                     Hmisc_4.0-3                  
[70] stringi_1.1.6                 Rcpp_0.12.14                  geneplotter_1.56.0           
[73] rpart_4.1-11                  acepack_1.4.1                
edger deseq2 differential expression • 7.4k views
ADD COMMENT
0
Entering edit mode

Welcome to Bioconductor support, as I see this is your first post.

Please note that your code isn't reproducible. The code makes reference to quantities that haven't been defined, including 'counts' and 'lrt_watery', so attempts to run it will give errors.

Also, please give output from sessionInfo(), as asked for in the posting guide at http://www.bioconductor.org/help/support/posting-guide

ADD REPLY
0
Entering edit mode

You haven't made the filters the same with your update. You've actually made them more different than in your original post, because you're still running DESeq with independent filtering turned on, while now not doing something equivalent for edgeR. Anyway, you should run each package as recommended by its authors, not try to make them artificially the same in an ad hoc manner, and filtering isn't the main issue here.

ADD REPLY
1
Entering edit mode
@pablo_garcia05-12443
Last seen 5.8 years ago

I recommend to you to take a view of that video (even the whole serie). Here you can see the difference between edgeR and DESeq2 in a simple summary form.

StatQuest: edgeR and DESeq2, part 2 - Independent Filtering

ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 1 day ago
United States

The methods are different, and so it's not expected that they would give the same result on a given dataset. Actually, 59 vs 17 is not very drastic in my opinion. If you look at the ranks by p-value, the methods are probably in rough agreement on the top genes.

If you want to try to make the methods more similar, you could run them on the same subset of genes (you could use the cpm filter and apply this to DESeq2 instead of the counts > 1 filter). Also, you can use independentFiltering=FALSE with DESeq2, so then you are performing BH adjustment over the same vector of p-values. Automatic independent filtering is one of the differences between methods.

ADD COMMENT
0
Entering edit mode
@gordon-smyth
Last seen 14 minutes ago
WEHI, Melbourne, Australia

Basically I agree with Michael here. Why would you expect different packages to give the same results? If they gave the same results, then there wouldn't be different packages in the first place.

There are some diffferences in your analyses. You scale normalized in DESeq2 but not in edgeR. The filtering is also different.

Even so, the difference in results doesn't seem that drastic from the limited information you give. The differences in the p-values could be small, but still lead to 59 vs 17 tags with FDR < 0.05.

Note that neither package was developed with metagenomic data in mind, so the data behaviour might be different to that of RNA-seq data. It's not at all apparent to me that either analysis you have done is appropriate for this data. The dispersion values for this data are huge and the groups separate poorly. Differences between edgeR and DESeq2 are (or should be) the least of your concerns.

ADD COMMENT

Login before adding your answer.

Traffic: 1032 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6