Hello! I wanted to ask how the edgeR::cpm() function results in negative values from raw counts of 0.
I've looked at these posts from several years ago (Differences between limma voom E values and edgeR cpm values? and How does edgeR cpm function calculate log(CPM) values?). I also saw that the functions have been updated less than a year ago, so I wanted to make sure the information was up to date. I looked through the function calls, but the edgeR::cpm() function uses C code, so I couldn't exactly trace how a logcounts value of ~ -0.9 can be computed with a pseudo-bulk count of 0.
How exactly is the prior count scaled by library size?
Also tagging Leonardo Collado Torres
Thanks, Kinnary Shah
library(spatialLIBD)
spe <- spatialLIBD::fetch_data(type = "spe")
spe_pseudo <- registration_pseudobulk(spe, var_registration = "spatialLIBD", var_sample_id = "sample_id" )
# this function uses the below code to create the logcounts assay variable
# logcounts(sce_pseudo) <-
#            edgeR::cpm(edgeR::calcNormFactors(sce_pseudo),
#                log = TRUE,
#                prior.count = 1
#            )
edgeR::calcNormFactors(spe_pseudo)$samples["151671_WM",]
#              group lib.size norm.factors 
#  151671_WM     1   443047     1.013545
counts(spe_pseudo)["ENSG00000186827", "151671_WM"]
# 0
logcounts(spe_pseudo)["ENSG00000186827", "151671_WM"]
# -0.9157698
2^(-0.9157698) ## Undoing the log2()
# 0.530061
2^(-0.9157698) - 1 ## Undoing adding the +1 prior.count
# -0.469939 ## Different from the 0 pseudo-bulked count
genes_with_neg <- rownames(logcounts(spe_pseudo))[rowSums(logcounts(spe_pseudo) < 0) > 0]
head(genes_with_neg) # 7673 genes
sessionInfo()
R version 4.4.1 (2024-06-14)
Platform: aarch64-apple-darwin20
Running under: macOS Ventura 13.6.6
Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0
locale:
    [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: America/New_York
tzcode source: internal
attached base packages:
    [1] stats4    stats     graphics  grDevices utils     datasets  methods   base     
other attached packages:
    [1] spatialLIBD_1.19.6          SpatialExperiment_1.15.1    SingleCellExperiment_1.27.2
[4] SummarizedExperiment_1.35.3 Biobase_2.65.1              GenomicRanges_1.57.1       
[7] GenomeInfoDb_1.41.2         IRanges_2.39.2              S4Vectors_0.43.2           
[10] BiocGenerics_0.51.3         MatrixGenerics_1.17.0       matrixStats_1.4.1          
loaded via a namespace (and not attached):
    [1] RColorBrewer_1.1-3       rstudioapi_0.16.0        jsonlite_1.8.9           shape_1.4.6.1           
[5] magrittr_2.0.3           ggbeeswarm_0.7.2         magick_2.8.5             GlobalOptions_0.1.2     
[9] BiocIO_1.16.0            zlibbioc_1.51.1          vctrs_0.6.5              memoise_2.0.1           
[13] config_0.3.2             Rsamtools_2.22.0         paletteer_1.6.0          RCurl_1.98-1.16         
[17] benchmarkme_1.0.8        htmltools_0.5.8.1        S4Arrays_1.5.10          AnnotationHub_3.13.3    
[21] curl_5.2.3               BiocNeighbors_1.99.2     SparseArray_1.5.44       sass_0.4.9              
[25] bslib_0.8.0              htmlwidgets_1.6.4        plotly_4.10.4            cachem_1.1.0            
[29] GenomicAlignments_1.42.0 mime_0.12                lifecycle_1.0.4          iterators_1.0.14        
[33] pkgconfig_2.0.3          rsvd_1.0.5               Matrix_1.7-0             R6_2.5.1                
[37] fastmap_1.2.0            GenomeInfoDbData_1.2.13  shiny_1.9.1              clue_0.3-65             
[41] digest_0.6.37            colorspace_2.1-1         rematch2_2.1.2           AnnotationDbi_1.67.0    
[45] scater_1.34.0            irlba_2.3.5.1            ExperimentHub_2.13.1     RSQLite_2.3.7           
[49] beachmat_2.21.6          filelock_1.0.3           fansi_1.0.6              httr_1.4.7              
[53] abind_1.4-8              compiler_4.4.1           withr_3.0.1              bit64_4.5.2             
[57] doParallel_1.0.17        attempt_0.3.1            BiocParallel_1.39.0      viridis_0.6.5           
[61] DBI_1.2.3                sessioninfo_1.2.2        rappdirs_0.3.3           DelayedArray_0.31.14    
[65] rjson_0.2.23             tools_4.4.1              vipor_0.4.7              beeswarm_0.4.0          
[69] httpuv_1.6.15            glue_1.8.0               restfulr_0.0.15          promises_1.3.0          
[73] grid_4.4.1               cluster_2.1.6            generics_0.1.3           gtable_0.3.5            
[77] tidyr_1.3.1              data.table_1.16.0        ScaledMatrix_1.13.0      BiocSingular_1.21.4     
[81] utf8_1.2.4               XVector_0.45.0           stringr_1.5.1            ggrepel_0.9.6           
[85] BiocVersion_3.20.0       foreach_1.5.2            pillar_1.9.0             limma_3.61.12           
[89] later_1.3.2              circlize_0.4.16          benchmarkmeData_1.0.4    dplyr_1.1.4             
[93] BiocFileCache_2.13.0     lattice_0.22-6           rtracklayer_1.66.0       bit_4.5.0               
[97] tidyselect_1.2.1         ComplexHeatmap_2.21.1    locfit_1.5-9.10          Biostrings_2.73.2       
[101] scuttle_1.15.4           gridExtra_2.3            edgeR_4.3.16             statmod_1.5.0           
[105] DT_0.33                  stringi_1.8.4            UCSC.utils_1.1.0         lazyeval_0.2.2          
[109] yaml_2.3.10              shinyWidgets_0.8.7       codetools_0.2-20         tibble_3.2.1            
[113] BiocManager_1.30.25      cli_3.6.3                xtable_1.8-4             jquerylib_0.1.4         
[117] munsell_0.5.1            golem_0.5.1              Rcpp_1.0.13              dbplyr_2.5.0            
[121] png_0.1-8                XML_3.99-0.17            parallel_4.4.1           ggplot2_3.5.1           
[125] blob_1.2.4               bitops_1.0-9             viridisLite_0.4.2        scales_1.3.0            
[129] purrr_1.0.2              crayon_1.5.3             GetoptLong_1.0.5         rlang_1.1.4             
[133] cowplot_1.1.3            KEGGREST_1.45.1

Thanks for posting this Kinnary. See also https://github.com/LieberInstitute/spatialLIBD/issues/106 for more details on the background.