Question: Does fgsea order the stats argument?
2.4 years ago, rubi wrote:

Does fgsea order the stats argument? What I'm currently doing is ordering the effects (i.e. log fold-changes of treatment vs. control) by their p-values and passing that to fgsea. But the fgsea code suggests it re-ranks stats, so the p-value ranking is lost. Is that the case? And if so, is there any way to disable it?


> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
 [1] grid      parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] setEnrichmentTests_0.0.0.9000            org.Sc.sgd.db_3.4.0               
 [7]                  ggdendro_0.1-20               dendextend_1.5.2             
[13] fastcluster_1.1.22            cluster_2.0.5                 tidyr_0.6.3                   GOSemSim_2.0.1                matrixStats_0.51.0            doParallel_1.0.10            
[19] iterators_1.0.8               foreach_1.4.3                 snpEnrichment_1.7.0           piano_1.14.0                  topGO_2.26.0                  SparseM_1.72                 
[25] GO.db_3.4.0                   AnnotationDbi_1.36.0          Biobase_2.34.0                fgsea_1.0.2                   Rcpp_0.12.11.1                graph_1.50.0                 
[31] gageData_2.12.0               gage_2.24.0                   pryr_0.1.2                    scales_0.4.1                  stringi_1.1.5                 zoo_1.7-13                   
[37] biomaRt_2.30.0                gplots_3.0.1                  reshape2_1.4.2                plotrix_3.6-3                 Hmisc_3.17-4                  Formula_1.2-1                
[43] survival_2.40-1               lattice_0.20-34               data.table_1.9.6              annotationData_0.1.0          dplyr_0.5.0                   plyr_1.8.4                   
[49] magrittr_1.5                  gtable_0.2.0                  gridExtra_2.2.1               plotly_4.7.0                  ggplot2_2.2.1.9000            kableExtra_0.2.1             
[55] knitr_1.16                    rtracklayer_1.34.1            GenomicRanges_1.26.2          GenomeInfoDb_1.10.0           IRanges_2.8.1                 S4Vectors_0.12.1             
[61] BiocGenerics_0.20.0           yaml_2.1.14                   doBy_4.5-15                  

loaded via a namespace (and not attached):
 [1] colorspace_1.3-2           class_7.3-14               modeltools_0.2-21          mclust_5.2                 rprojroot_1.2              XVector_0.14.0            
 [7] flexmix_2.3-13             mvtnorm_1.0-5              xml2_1.1.1                 codetools_0.2-15           splines_3.3.2              snpStats_1.24.0           
[13] robustbase_0.92-6          jsonlite_1.4               Rsamtools_1.26.1           kernlab_0.9-25             png_0.1-7                  httr_1.2.1                
[19] backports_1.0.5            assertthat_0.2.0           Matrix_1.2-7.1             lazyeval_0.2.0             limma_3.30.2               acepack_1.4.1             
[25] htmltools_0.3.6            tools_3.3.2                igraph_1.0.1               fastmatch_1.0-4            slam_0.1-40                trimcluster_0.1-2         
[31] Biostrings_2.42.1          gdata_2.17.0               fpc_2.1-10                 stringr_1.2.0              rvest_0.3.2                gtools_3.5.0              
[37] XML_3.98-1.4               DEoptimR_1.0-6             zlibbioc_1.20.0            MASS_7.3-45                relations_0.6-6            SummarizedExperiment_1.2.3
[43] RColorBrewer_1.1-2         sets_1.0-16                rpart_4.1-10               latticeExtra_0.6-28        RSQLite_1.0.0              caTools_1.17.1            
[49] BiocParallel_1.8.1         chron_2.3-47               rlang_0.1.1                prabclus_2.2-6             bitops_1.0-6               evaluate_0.10             
[55] purrr_0.2.2.2              GenomicAlignments_1.8.4    htmlwidgets_0.8            R6_2.2.0                   DBI_0.5-1                  whisker_0.3-2             
[61] foreign_0.8-67             KEGGREST_1.14.0            RCurl_1.95-4.8             nnet_7.3-12                tibble_1.3.3               KernSmooth_2.23-15        
[67] rmarkdown_1.6              viridis_0.4.0              marray_1.52.0              diptest_0.75-7             digest_0.6.12              munsell_0.4.3             
[73] viridisLite_0.2.0         
Answer: Does fgsea order the stats argument?
2.4 years ago, assaron wrote:

Yes, it does sort the stats argument. Sorting the stats values is inherent to pre-ranked GSEA. Is disabling that really what you want to do?
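To illustrate why the input order does not carry through, here is a minimal base R sketch. It does not call fgsea; it only assumes, as the answer above says, that the statistic is sorted (taken here to be in decreasing order) before enrichment is computed. The gene names and values are made up.

```r
# Hypothetical gene-level statistics; names and values are invented.
stats <- c(geneA = 2.1, geneB = -0.4, geneC = 1.3, geneD = -1.8)

# The same values supplied in a different order (here: reversed).
shuffled <- rev(stats)

# Assumed stand-in for the internal sorting step: decreasing statistic.
ranked_original <- sort(stats, decreasing = TRUE)
ranked_shuffled <- sort(shuffled, decreasing = TRUE)

# Both inputs end up as the same ranked list, so any pre-ordering
# of the input vector is discarded.
same_ranking <- identical(ranked_original, ranked_shuffled)
print(same_ranking)
```

Under that assumption, the only way to control the ranking is through the values of the statistic itself, not the order in which they are passed.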

Answer: Does fgsea order the stats argument?
2.4 years ago, rubi wrote:

Yes. At least I think it would be helpful to have an argument that allows specifying whether stats should be sorted or not.


Can you provide an example where it's useful?

Reply written 2.4 years ago by assaron

Hi @assaron,


I think it's a matter of preference what you want your enrichment analysis to pick up on. Sorting only by effect size ignores the error of the estimate (which can often be large in gene expression data), whereas sorting by p-value does not, so I'd prefer sorting first by p-value and then by effect size. Sounds to me like a small but useful addition to fgsea.

Reply written 2.3 years ago by rubi

Sorry, I still don't understand. You can only sort something by one value, so how can you sort first by p-value and then by effect size?

I usually rank (and sort) genes by the statistic from the DE test (DESeq2 or limma); I know other people sort by -log10(p-value) * sign(log2FC). Both variants work fine and account for significance. Aren't they working for you?
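As a rough sketch of the second variant, a signed significance score can be built in base R as below. The p-values and fold-changes are invented, and the negated log10 is an assumption made so that smaller p-values give larger magnitudes.

```r
# Hypothetical DE results; values are invented for illustration.
pval   <- c(geneA = 0.001, geneB = 0.20, geneC = 0.03, geneD = 0.0005)
log2fc <- c(geneA = 1.5,   geneB = -0.3, geneC = -2.0, geneD = -0.8)

# Significance on a -log10 scale, signed by the direction of change.
signed_stat <- -log10(pval) * sign(log2fc)

# Decreasing sort: strongly up-regulated genes first,
# strongly down-regulated genes last.
ranked <- sort(signed_stat, decreasing = TRUE)
print(names(ranked))
```

The named vector `signed_stat` is the kind of object that would be passed as the stats argument.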

Reply written 2.2 years ago by assaron

Sorry for the late response. I'm using a Bayesian differential expression tool (MMDIFF), which provides an estimate of the effect size (e.g. ln(fold-change) between treatment and control) and the posterior probability that the estimated effect size is different from 0. Unlike the frequentist approaches, there's no statistic here. The ranking I'm referring to is to rank first by posterior probability in descending order (think of this as 1-p-value for the sake of this discussion) and then by the absolute value of the effect size. The reason is that some strongly differentially expressed genes will all have a posterior of 1.
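For concreteness, the two-key ranking described here can be sketched with base R's `order()`, which accepts several keys and uses later ones to break ties in earlier ones. The posterior probabilities and effect sizes below are invented and are not MMDIFF output.

```r
# Hypothetical posterior probabilities and effect sizes.
post   <- c(geneA = 1,   geneB = 0.95, geneC = 1,    geneD = 0.60)
effect <- c(geneA = 0.7, geneB = 2.5,  geneC = -1.9, geneD = 0.1)

# First key: posterior, descending. Second key: |effect|, descending,
# which only matters when posteriors are tied (e.g. both equal to 1).
ord <- order(-post, -abs(effect))
two_key_ranking <- names(post)[ord]
print(two_key_ranking)
```

Here geneA and geneC share a posterior of 1, and the larger absolute effect of geneC places it first.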

I think allowing that in fgsea would make it usable more generally.

Reply written 20 months ago by rubi

I guess you can aggregate your values into a new "statistic" that will be ordered the way you want, e.g. stat = p + ifelse(p == 1, abs(effect) * 1e-3, 0).

Additionally, GSEA methods were designed for statistics that can take both positive and negative values, so I suggest multiplying the values by the sign of the effect size, if that's appropriate in your case.
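A minimal sketch of both suggestions together, using invented values: the composite statistic breaks ties among genes with a posterior of exactly 1 by a small multiple of the absolute effect, and the signed version then separates up- from down-regulated genes. The 1e-3 scale comes from the formula above and assumes abs(effect) * 1e-3 stays small; the exact comparison `post == 1` is also an assumption that may need a tolerance with real output.

```r
# Hypothetical posterior probabilities and effect sizes.
post   <- c(geneA = 1,   geneB = 0.95, geneC = 1,    geneD = 0.60)
effect <- c(geneA = 0.7, geneB = 2.5,  geneC = -1.9, geneD = 0.1)

# Composite statistic: posterior plus a small effect-based tie-breaker
# applied only where the posterior equals 1.
stat <- post + ifelse(post == 1, abs(effect) * 1e-3, 0)
unsigned_ranking <- names(sort(stat, decreasing = TRUE))

# Signed variant: positive for up-regulated, negative for down-regulated.
signed_stat <- stat * sign(effect)
signed_ranking <- names(sort(signed_stat, decreasing = TRUE))

print(unsigned_ranking)
print(signed_ranking)
```

With these values the unsigned ranking matches ordering first by posterior and then by absolute effect, while the signed ranking moves geneC, the strongly down-regulated gene, to the bottom of the list.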



Reply written 20 months ago by assaron

Yep, that's exactly what I'm doing.

Reply written 20 months ago by rubi