Tutorial:BoF Bioc2017: Infrastructure for efficient storage and processing of large-scale single-cell genomics data
shepherl 3.6k

Notes from the BoF Bioc2017 discussions, courtesy of moderator Davide Risso

Task 1: provide a unified representation of single-cell data


Proposed solutions:

  • Create a class for developers to extend: SingleCellExperiment

Useful Bioconductor packages and other resources:

Task 2: scale-up of existing tools / implementation of tools to handle large-scale datasets


  • Existing tools scale to thousands of cells
  • 10x Genomics released a 1.3 million cell dataset
  • Main problem: the data do not fit in memory!

Proposed solutions:

  • HDF5 files + "chunk operations"
  • Simple algorithms + approximate, scalable methods
  • Provide API to perform common operations independent of data representation (in memory vs. on disk)
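
A rough sketch of how the first and third points might fit together, written in Python using only the standard library. Everything here is hypothetical and for illustration only: the class names (CountMatrix, InMemoryMatrix, OnDiskMatrix), the helper write_matrix, and the toy counts are all made up, and a flat binary file stands in for an HDF5 file. The idea is that an operation such as row sums is written once against a chunk iterator, so it does not care whether the data live in memory or on disk.

```python
import os
import struct
import tempfile
from abc import ABC, abstractmethod


class CountMatrix(ABC):
    """Hypothetical common interface: genes are rows, cells are columns."""

    @abstractmethod
    def iter_row_chunks(self, chunk_rows):
        """Yield lists of rows, at most chunk_rows rows at a time."""

    def row_sums(self, chunk_rows=2):
        # Written once against the interface, so it works for any backend.
        sums = []
        for chunk in self.iter_row_chunks(chunk_rows):
            sums.extend(sum(row) for row in chunk)
        return sums


class InMemoryMatrix(CountMatrix):
    """Backend holding the whole matrix as a list of rows."""

    def __init__(self, rows):
        self.rows = rows

    def iter_row_chunks(self, chunk_rows):
        for start in range(0, len(self.rows), chunk_rows):
            yield self.rows[start:start + chunk_rows]


class OnDiskMatrix(CountMatrix):
    """Row-major float64 values in a flat binary file (stand-in for HDF5)."""

    def __init__(self, path, n_cols):
        self.path = path
        self.n_cols = n_cols

    def iter_row_chunks(self, chunk_rows):
        row_bytes = 8 * self.n_cols  # 8 bytes per float64
        with open(self.path, "rb") as fh:
            while True:
                # Only one chunk of rows is held in memory at a time.
                buf = fh.read(row_bytes * chunk_rows)
                if not buf:
                    break
                n = len(buf) // row_bytes
                values = struct.unpack(f"<{n * self.n_cols}d", buf)
                yield [
                    list(values[i * self.n_cols:(i + 1) * self.n_cols])
                    for i in range(n)
                ]


def write_matrix(path, rows):
    """Write rows as little-endian float64 values, row by row."""
    with open(path, "wb") as fh:
        for row in rows:
            fh.write(struct.pack(f"<{len(row)}d", *row))


# Toy data: 3 genes x 3 cells.
counts = [[1.0, 0.0, 3.0], [0.0, 2.0, 2.0], [5.0, 0.0, 0.0]]
path = os.path.join(tempfile.mkdtemp(), "counts.bin")
write_matrix(path, counts)

in_memory = InMemoryMatrix(counts)
on_disk = OnDiskMatrix(path, n_cols=3)
print(in_memory.row_sums() == on_disk.row_sums())
```

With a real HDF5 file the on-disk backend would read chunks through an HDF5 library instead of struct, but the caller of row_sums would not need to change.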

Useful Bioconductor packages and other resources:

Interested in contributing? Join the slack channel: https://community-bioc.slack.com

Discussion points:

  1. Benchmark (canonical datasets)
  2. Splatter (simulations of scRNA-seq)
  3. What to do next?
  4. BigDataAlgorithms: define scope, what functionalities we want
     a. Prior art in astronomy, etc.?
  5. Visualization?
  6. Multi assay?
     a. People are running single-cell assays that generate multiple types of data (e.g., RNA expression and methylation) from each single cell.
     b. Can store each assay in a SingleCellExperiment and then put them inside a MultiAssayExperiment to link up the row and column metadata.
  7. Multiple samples--list of SingleCellExperiments vs giant joined SingleCellExperiment. Can we learn from flowSet?
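
To make point 6 a bit more concrete, here is a toy Python sketch of the underlying idea of linking assays through shared cell identifiers. It is not the MultiAssayExperiment API; the function link_assays, the cell IDs, and the values are all made up for illustration.

```python
# Hypothetical per-cell measurements, keyed by cell identifier.
rna = {"cell1": [5, 0, 2], "cell2": [1, 3, 0], "cell3": [0, 0, 7]}
methylation = {"cell2": [0.8, 0.1], "cell3": [0.4, 0.9], "cell4": [0.2, 0.2]}


def link_assays(assays):
    """Return the cell IDs present in every assay, plus each assay
    restricted to those cells (in the same sorted order)."""
    shared = set.intersection(*(set(assay) for assay in assays.values()))
    cells = sorted(shared)
    linked = {
        name: {cell: assay[cell] for cell in cells}
        for name, assay in assays.items()
    }
    return cells, linked


cells, linked = link_assays({"rna": rna, "methylation": methylation})
print(cells)  # ['cell2', 'cell3']
```

Only cells measured in both assays survive the intersection here; a real container would presumably also keep track of the cells that appear in just one assay.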