Number of non-empty drops returned by emptyDrops() depends on value of test.ambience
1
0
Entering edit mode
Peter Hickey ▴ 740
@petehaitch
Last seen 7 days ago
WEHI, Melbourne, Australia

I was playing around with the test.ambience argument of DropletUtils::emptyDrops() and was initially surprised to find that I was getting a different number of non-empty droplets returned by the function, as illustrated below.

suppressPackageStartupMessages(library(DropletUtils))
set.seed(0)
my.counts <- DropletUtils:::simCounts()

set.seed(666)
out_FALSE <- emptyDrops(my.counts)
length(which(out_FALSE$FDR <= 0.001))
#> [1] 942

set.seed(666)
out_TRUE <- emptyDrops(my.counts, test.ambient = TRUE)
length(which(out_TRUE$FDR <= 0.001))
#> [1] 842

After more carefully reading the documentation, I understand that's because a different number of non-NA P-values are being returned

sum(!is.na(out_FALSE$PValue))
#> [1] 1300
sum(!is.na(out_TRUE$PValue))
#> [1] 10727

and therefore the FDR values are different, so at a given FDR cutoff the results may be different.

But does this mean that the test.ambient = TRUE results should only be used for diagnostic plots of the P-value histogram and not be used for the actual analysis?

Thanks!

Aside

I think this line in the documentation is should be test.ambient=TRUE:

If test.ambient=FALSE, non-NA statistics will be reported for all barcodes.

It might also need clarification, as non-NA statistics will only be reported for all barcodes with non-zero counts:

out_FALSE[is.na(out_FALSE$FDR), ]
#> DataFrame with 9800 rows and 5 columns
#>          Total   LogProb    PValue   Limited       FDR
#>      <integer> <numeric> <numeric> <logical> <numeric>
#> 1            2        NA        NA        NA        NA
#> 2            9        NA        NA        NA        NA
#> 3           20        NA        NA        NA        NA
#> 4           20        NA        NA        NA        NA
#> 5            1        NA        NA        NA        NA
#> ...        ...       ...       ...       ...       ...
#> 9796        10        NA        NA        NA        NA
#> 9797         6        NA        NA        NA        NA
#> 9798        10        NA        NA        NA        NA
#> 9799        15        NA        NA        NA        NA
#> 9800         4        NA        NA        NA        NA
out_FALSE[is.na(out_TRUE$FDR), ]
#> DataFrame with 373 rows and 5 columns
#>         Total   LogProb    PValue   Limited       FDR
#>     <integer> <numeric> <numeric> <logical> <numeric>
#> 1           0        NA        NA        NA        NA
#> 2           0        NA        NA        NA        NA
#> 3           0        NA        NA        NA        NA
#> 4           0        NA        NA        NA        NA
#> 5           0        NA        NA        NA        NA
#> ...       ...       ...       ...       ...       ...
#> 369         0        NA        NA        NA        NA
#> 370         0        NA        NA        NA        NA
#> 371         0        NA        NA        NA        NA
#> 372         0        NA        NA        NA        NA
#> 373         0        NA        NA        NA        NA

Session info

devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 4.0.0 (2020-04-24)
#>  os       Ubuntu 18.04.5 LTS          
#>  system   x86_64, linux-gnu           
#>  ui       X11                         
#>  language en_AU:en                    
#>  collate  en_AU.UTF-8                 
#>  ctype    en_AU.UTF-8                 
#>  tz       Australia/Melbourne         
#>  date     2021-01-23                  
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package              * version  date       lib source        
#>  assertthat             0.2.1    2019-03-21 [1] CRAN (R 4.0.0)
#>  beachmat               2.6.4    2020-12-20 [1] Bioconductor  
#>  Biobase              * 2.50.0   2020-10-27 [1] Bioconductor  
#>  BiocGenerics         * 0.36.0   2020-10-27 [1] Bioconductor  
#>  BiocParallel           1.24.1   2020-11-06 [1] Bioconductor  
#>  bitops                 1.0-6    2013-08-17 [1] CRAN (R 4.0.0)
#>  callr                  3.5.1    2020-10-13 [1] CRAN (R 4.0.0)
#>  cli                    2.2.0    2020-11-20 [1] CRAN (R 4.0.0)
#>  crayon                 1.3.4    2017-09-16 [1] CRAN (R 4.0.0)
#>  DelayedArray           0.16.0   2020-10-27 [1] Bioconductor  
#>  DelayedMatrixStats     1.12.2   2021-01-12 [1] Bioconductor  
#>  desc                   1.2.0    2018-05-01 [1] CRAN (R 4.0.0)
#>  devtools               2.3.2    2020-09-18 [1] CRAN (R 4.0.0)
#>  digest                 0.6.27   2020-10-24 [1] CRAN (R 4.0.0)
#>  dqrng                  0.2.1    2019-05-17 [1] CRAN (R 4.0.0)
#>  DropletUtils         * 1.10.2   2020-12-20 [1] Bioconductor  
#>  edgeR                  3.32.1   2021-01-14 [1] Bioconductor  
#>  ellipsis               0.3.1    2020-05-15 [1] CRAN (R 4.0.0)
#>  evaluate               0.14     2019-05-28 [1] CRAN (R 4.0.0)
#>  fansi                  0.4.2    2021-01-15 [1] CRAN (R 4.0.0)
#>  fs                     1.5.0    2020-07-31 [1] CRAN (R 4.0.0)
#>  GenomeInfoDb         * 1.26.2   2020-12-08 [1] Bioconductor  
#>  GenomeInfoDbData       1.2.4    2020-10-28 [1] Bioconductor  
#>  GenomicRanges        * 1.42.0   2020-10-27 [1] Bioconductor  
#>  glue                   1.4.2    2020-08-27 [1] CRAN (R 4.0.0)
#>  HDF5Array              1.18.0   2020-10-27 [1] Bioconductor  
#>  highr                  0.8      2019-03-20 [1] CRAN (R 4.0.0)
#>  htmltools              0.5.1    2021-01-12 [1] CRAN (R 4.0.0)
#>  IRanges              * 2.24.1   2020-12-12 [1] Bioconductor  
#>  knitr                  1.30     2020-09-22 [1] CRAN (R 4.0.0)
#>  lattice                0.20-41  2020-04-02 [4] CRAN (R 4.0.0)
#>  lifecycle              0.2.0    2020-03-06 [1] CRAN (R 4.0.0)
#>  limma                  3.46.0   2020-10-27 [1] Bioconductor  
#>  locfit                 1.5-9.4  2020-03-25 [1] CRAN (R 4.0.0)
#>  magrittr               2.0.1    2020-11-17 [1] CRAN (R 4.0.0)
#>  Matrix                 1.3-2    2021-01-06 [4] CRAN (R 4.0.3)
#>  MatrixGenerics       * 1.2.0    2020-10-27 [1] Bioconductor  
#>  matrixStats          * 0.57.0   2020-09-25 [1] CRAN (R 4.0.0)
#>  memoise                1.1.0    2017-04-21 [1] CRAN (R 4.0.0)
#>  pkgbuild               1.2.0    2020-12-15 [1] CRAN (R 4.0.0)
#>  pkgload                1.1.0    2020-05-29 [1] CRAN (R 4.0.0)
#>  prettyunits            1.1.1    2020-01-24 [1] CRAN (R 4.0.0)
#>  processx               3.4.5    2020-11-30 [1] CRAN (R 4.0.0)
#>  ps                     1.5.0    2020-12-05 [1] CRAN (R 4.0.0)
#>  purrr                  0.3.4    2020-04-17 [1] CRAN (R 4.0.0)
#>  R.methodsS3            1.8.1    2020-08-26 [1] CRAN (R 4.0.0)
#>  R.oo                   1.24.0   2020-08-26 [1] CRAN (R 4.0.0)
#>  R.utils                2.10.1   2020-08-26 [1] CRAN (R 4.0.0)
#>  R6                     2.5.0    2020-10-28 [1] CRAN (R 4.0.0)
#>  Rcpp                   1.0.6    2021-01-15 [1] CRAN (R 4.0.0)
#>  RCurl                  1.98-1.2 2020-04-18 [1] CRAN (R 4.0.0)
#>  remotes                2.2.0    2020-07-21 [1] CRAN (R 4.0.0)
#>  rhdf5                  2.34.0   2020-10-27 [1] Bioconductor  
#>  rhdf5filters           1.2.0    2020-10-27 [1] Bioconductor  
#>  Rhdf5lib               1.12.0   2020-10-27 [1] Bioconductor  
#>  rlang                  0.4.10   2020-12-30 [1] CRAN (R 4.0.0)
#>  rmarkdown              2.6      2020-12-14 [1] CRAN (R 4.0.0)
#>  rprojroot              2.0.2    2020-11-15 [1] CRAN (R 4.0.0)
#>  S4Vectors            * 0.28.1   2020-12-09 [1] Bioconductor  
#>  scuttle                1.0.4    2020-12-17 [1] Bioconductor  
#>  sessioninfo            1.1.1    2018-11-05 [1] CRAN (R 4.0.0)
#>  SingleCellExperiment * 1.12.0   2020-10-27 [1] Bioconductor  
#>  sparseMatrixStats      1.2.0    2020-10-27 [1] Bioconductor  
#>  stringi                1.5.3    2020-09-09 [1] CRAN (R 4.0.0)
#>  stringr                1.4.0    2019-02-10 [1] CRAN (R 4.0.0)
#>  SummarizedExperiment * 1.20.0   2020-10-27 [1] Bioconductor  
#>  testthat               3.0.1    2020-12-17 [1] CRAN (R 4.0.0)
#>  usethis                2.0.0    2020-12-10 [1] CRAN (R 4.0.0)
#>  withr                  2.4.0    2021-01-16 [1] CRAN (R 4.0.0)
#>  xfun                   0.20     2021-01-06 [1] CRAN (R 4.0.0)
#>  XVector                0.30.0   2020-10-27 [1] Bioconductor  
#>  yaml                   2.2.1    2020-02-01 [1] CRAN (R 4.0.0)
#>  zlibbioc               1.36.0   2020-10-27 [1] Bioconductor  
#> 
#> [1] /home/peter/R/x86_64-pc-linux-gnu-library/4.0
#> [2] /usr/local/lib/R/site-library
#> [3] /usr/lib/R/site-library
#> [4] /usr/lib/R/library
DropletUtils emptyDrops 10x • 1.0k views
ADD COMMENT
1
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 15 hours ago
The city by the bay

But does this mean that the test.ambient = TRUE results should only be used for diagnostic plots of the P-value histogram and not be used for the actual analysis?

Yes. But after some reflection, this is a bit confusing, so I modified it in BioC-devel so that test.ambient=TRUE will report p-values but not change the FDRs compared to test.ambient=FALSE. If you want the old results where the low-count barcodes are used in the FDR, you can get them with test.ambient=NA, which provides some back compatibility.

I think this line in the documentation is should be test.ambient=TRUE:

Thanks.

It might also need clarification, as non-NA statistics will only be reported for all barcodes with non-zero counts:

That's correct, thanks.

ADD COMMENT

Login before adding your answer.

Traffic: 893 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6