Question

Does fgsea order the stats argument?

rubi ▴ 110

@rubi-6462

Hi,

Does fgsea order the stats argument? What I'm currently doing is ordering the effects (i.e. log fold-changes of treatment vs. control) by their p-values and passing that to fgsea. But the fgsea code suggests it re-ranks stats, so the p-value ranking is lost. Is that the case? And if so, is there any way to disable it?

> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
 [1] grid      parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] setEnrichmentTests_0.0.0.9000 org.Ss.eg.db_3.4.0            org.Sc.sgd.db_3.4.0           org.Rn.eg.db_3.4.0            org.Pt.eg.db_3.4.0            org.Mmu.eg.db_3.4.0          
 [7] org.Mm.eg.db_3.4.0            org.Hs.eg.db_3.4.0            org.Cf.eg.db_3.4.0            org.Ce.eg.db_3.4.0            ggdendro_0.1-20               dendextend_1.5.2             
[13] fastcluster_1.1.22            cluster_2.0.5                 tidyr_0.6.3                   GOSemSim_2.0.1                matrixStats_0.51.0            doParallel_1.0.10            
[19] iterators_1.0.8               foreach_1.4.3                 snpEnrichment_1.7.0           piano_1.14.0                  topGO_2.26.0                  SparseM_1.72                 
[25] GO.db_3.4.0                   AnnotationDbi_1.36.0          Biobase_2.34.0                fgsea_1.0.2                   Rcpp_0.12.11.1                graph_1.50.0                 
[31] gageData_2.12.0               gage_2.24.0                   pryr_0.1.2                    scales_0.4.1                  stringi_1.1.5                 zoo_1.7-13                   
[37] biomaRt_2.30.0                gplots_3.0.1                  reshape2_1.4.2                plotrix_3.6-3                 Hmisc_3.17-4                  Formula_1.2-1                
[43] survival_2.40-1               lattice_0.20-34               data.table_1.9.6              annotationData_0.1.0          dplyr_0.5.0                   plyr_1.8.4                   
[49] magrittr_1.5                  gtable_0.2.0                  gridExtra_2.2.1               plotly_4.7.0                  ggplot2_2.2.1.9000            kableExtra_0.2.1             
[55] knitr_1.16                    rtracklayer_1.34.1            GenomicRanges_1.26.2          GenomeInfoDb_1.10.0           IRanges_2.8.1                 S4Vectors_0.12.1             
[61] BiocGenerics_0.20.0           yaml_2.1.14                   doBy_4.5-15                  

loaded via a namespace (and not attached):
 [1] colorspace_1.3-2           class_7.3-14               modeltools_0.2-21          mclust_5.2                 rprojroot_1.2              XVector_0.14.0            
 [7] flexmix_2.3-13             mvtnorm_1.0-5              xml2_1.1.1                 codetools_0.2-15           splines_3.3.2              snpStats_1.24.0           
[13] robustbase_0.92-6          jsonlite_1.4               Rsamtools_1.26.1           kernlab_0.9-25             png_0.1-7                  httr_1.2.1                
[19] backports_1.0.5            assertthat_0.2.0           Matrix_1.2-7.1             lazyeval_0.2.0             limma_3.30.2               acepack_1.4.1             
[25] htmltools_0.3.6            tools_3.3.2                igraph_1.0.1               fastmatch_1.0-4            slam_0.1-40                trimcluster_0.1-2         
[31] Biostrings_2.42.1          gdata_2.17.0               fpc_2.1-10                 stringr_1.2.0              rvest_0.3.2                gtools_3.5.0              
[37] XML_3.98-1.4               DEoptimR_1.0-6             zlibbioc_1.20.0            MASS_7.3-45                relations_0.6-6            SummarizedExperiment_1.2.3
[43] RColorBrewer_1.1-2         sets_1.0-16                rpart_4.1-10               latticeExtra_0.6-28        RSQLite_1.0.0              caTools_1.17.1            
[49] BiocParallel_1.8.1         chron_2.3-47               rlang_0.1.1                prabclus_2.2-6             bitops_1.0-6               evaluate_0.10             
[55] purrr_0.2.2.2              GenomicAlignments_1.8.4    htmlwidgets_0.8            R6_2.2.0                   DBI_0.5-1                  whisker_0.3-2             
[61] foreign_0.8-67             KEGGREST_1.14.0            RCurl_1.95-4.8             nnet_7.3-12                tibble_1.3.3               KernSmooth_2.23-15        
[67] rmarkdown_1.6              viridis_0.4.0              marray_1.52.0              diptest_0.75-7             digest_0.6.12              munsell_0.4.3             
[73] viridisLite_0.2.0

fgsea • 2.7k views

ADD COMMENT • link 6.9 years ago rubi ▴ 110

score 0 · Answer 1 · 2017-06-22

alserg ▴ 240

@assaron

St Louis, MO

Yes, it does sort the stats argument. Sorting the stats values is inherent to pre-ranked GSEA. Is that really what you want to do?
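To illustrate why sorting is inherent to the method, here is a minimal base R sketch of a pre-ranked enrichment score. It is not fgsea's actual code: `toy_es`, the gene names, and all values are hypothetical, and the weighting exponent `gsea_param = 1` is an assumption. Because the function sorts `stats` itself, the order in which the values are passed in has no effect on the result.

```r
# Toy pre-ranked enrichment score (illustrative only, not fgsea's implementation).
toy_es <- function(stats, gene_set, gsea_param = 1) {
  stats  <- sort(stats, decreasing = TRUE)        # ranking step: input order is discarded
  in_set <- names(stats) %in% gene_set
  hit_weight <- abs(stats)^gsea_param * in_set
  step_hit   <- hit_weight / sum(hit_weight)      # step up at gene-set members
  step_miss  <- (!in_set) / sum(!in_set)          # step down at all other genes
  running    <- cumsum(step_hit - step_miss)      # running sum along the ranked list
  unname(running[which.max(abs(running))])        # signed maximum deviation from zero
}

# Made-up statistics and gene set
stats    <- c(g1 = 2.5, g2 = -1.2, g3 = 0.4, g4 = 1.8, g5 = -2.0, g6 = 0.1)
gene_set <- c("g1", "g4")

es_original <- toy_es(stats, gene_set)
es_shuffled <- toy_es(stats[c(3, 6, 1, 5, 2, 4)], gene_set)  # same values, different input order
```

Both calls return the same score, so any ordering that should influence the result has to be encoded in the values of `stats` rather than in their positions.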

ADD COMMENT • link 6.9 years ago alserg ▴ 240

score 0 · Answer 2 · 2017-06-22

rubi ▴ 110

@rubi-6462

Yes. At least, I think it would be helpful to have an argument that allows specifying whether stats should be sorted or not.

ADD COMMENT • link 6.9 years ago rubi ▴ 110

Can you provide an example where it's useful?

ADD REPLY • link 6.8 years ago alserg ▴ 240

Hi @assaron,

I think it's a matter of preference what you want your enrichment analysis to pick up on. Sorting only by effect size ignores the error of the estimate (which can often be large in gene expression data), whereas sorting by p-value does not, so I'd prefer sorting first by p-value and then by effect size. It sounds to me like a small but useful addition to fgsea.

ADD REPLY • link 6.7 years ago rubi ▴ 110

Sorry, I still don't understand. You can only sort something by one value, so how can you sort first by p-value and then by effect size?

I usually rank (and sort) genes by the statistic from the DE test (DESeq2 or limma); I know other people sort by log(p-value) * sign(log2FC). Both variants work fine and account for significance. Aren't they working for you?
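As a rough sketch of the signed p-value ranking mentioned here, the snippet below builds a named statistic vector in base R. The data frame `de`, its column names, and its values are made up for illustration. The leading minus sign and the base-10 logarithm are assumptions (a common convention), chosen so that significant up-regulated genes get large positive values and significant down-regulated genes get large negative values.

```r
# Hypothetical DE results; gene names and numbers are invented.
de <- data.frame(
  gene   = c("g1", "g2", "g3", "g4"),
  log2FC = c(1.5, -2.0, 0.3, -0.1),
  pvalue = c(1e-6, 1e-4, 0.20, 0.90),
  stringsAsFactors = FALSE
)

# Magnitude from the p-value, direction from the fold change.
# The negation is an assumption so that up-regulated genes score positive.
rank_metric <- setNames(-log10(de$pvalue) * sign(de$log2FC), de$gene)

# Ranking this metric would induce: most significant up-regulated genes first,
# most significant down-regulated genes last.
ranked <- sort(rank_metric, decreasing = TRUE)
names(ranked)
```

A named numeric vector like `rank_metric` is the form the stats argument discussed in this thread takes, with gene identifiers as names.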

ADD REPLY • link 6.7 years ago alserg ▴ 240

Sorry for the late response. I'm using a Bayesian differential expression tool (MMDIFF), which provides an estimate of the effect size (e.g. ln(fold-change) between treatment and control) and the posterior probability that the estimated effect size is different from 0. Unlike the frequentist approaches, there's no statistic here. The ranking I'm referring to is to rank first by posterior probability in descending order (think of this as 1-p-value for the sake of this discussion) and then by the absolute value of the effect size. The reason is that some strongly differentially expressed genes will all have a posterior of 1.

I think allowing that in fgsea allows for more general usage.

ADD REPLY • link 6.1 years ago rubi ▴ 110

I guess you can aggregate your values to get a new "statistic" that will be ordered as you want, e.g. stat = p + ifelse(p == 1, abs(effect) * 1e-3, 0).

Additionally, the GSEA method was designed for statistics that can have both positive and negative values, so I suggest multiplying the values by the sign of the effect size, if that's appropriate in your case.
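The suggestion can be checked on toy data. The sketch below, in base R, uses a hypothetical data frame `res` with made-up posteriors and effect sizes; it is not MMDIFF output. It confirms that sorting by the single composite value reproduces the two-key ordering (posterior first, then absolute effect size), and then applies the sign of the effect.

```r
# Hypothetical Bayesian DE results; all values are invented.
res <- data.frame(
  gene      = c("g1", "g2", "g3", "g4", "g5"),
  posterior = c(1, 1, 0.95, 1, 0.60),
  effect    = c(0.8, -2.1, 1.0, 1.4, -0.3),
  stringsAsFactors = FALSE
)

# Composite value: posterior first, |effect| only breaks ties at posterior == 1.
magnitude <- res$posterior +
  ifelse(res$posterior == 1, abs(res$effect) * 1e-3, 0)

# Desired two-key ordering versus ordering by the single composite value
two_key <- res$gene[order(-res$posterior, -abs(res$effect))]
one_key <- res$gene[order(-magnitude)]

# Signed statistic, so both up- and down-regulated genes are represented
stat <- setNames(magnitude * sign(res$effect), res$gene)
ranked <- names(sort(stat, decreasing = TRUE))
```

Here `one_key` and `two_key` agree. Note that this tie-break only acts where the posterior is exactly 1; ties at other posterior values would need a different construction.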