Tutorial: BoF Bioc2017: Infrastructure for efficient storage and processing of large-scale single-cell genomics data
shepherl wrote (11 weeks ago):

Notes from the Bioc2017 Birds of a Feather (BoF) discussions, courtesy of moderator Davide Risso

Task 1: provide a unified representation of single-cell data

Challenges:

Proposed solutions:

  • Create a class for developers to extend: SingleCellExperiment
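To illustrate what "a class for developers to extend" buys, here is a hypothetical Python analogue; it is not the Bioconductor API. The base class keeps assays, feature names, cell names and per-cell metadata together and checks at construction time that their dimensions agree. A subclass then adds a new slot while reusing those checks. (In R, SingleCellExperiment itself builds on SummarizedExperiment in a similar spirit and adds, among other things, slots for reduced-dimension results.)

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CellExperiment:
    """Hypothetical container: rows are features (genes), columns are cells.

    At construction time, every assay must have the same shape and every
    column-metadata field must have one entry per cell.
    """
    assays: Dict[str, List[List[float]]]
    row_names: List[str]
    col_names: List[str]
    col_data: Dict[str, List[object]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        n_rows, n_cols = len(self.row_names), len(self.col_names)
        for name, matrix in self.assays.items():
            if len(matrix) != n_rows or any(len(row) != n_cols for row in matrix):
                raise ValueError(f"assay {name!r} is not {n_rows} x {n_cols}")
        for key, values in self.col_data.items():
            if len(values) != n_cols:
                raise ValueError(f"col_data {key!r} needs one entry per cell")


@dataclass
class ReducedDimExperiment(CellExperiment):
    """Example extension: adds per-cell low-dimensional coordinates (e.g. PCA)."""
    reduced_dims: Dict[str, List[List[float]]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        super().__post_init__()  # reuse the parent's consistency checks
        for name, coords in self.reduced_dims.items():
            if len(coords) != len(self.col_names):
                raise ValueError(f"reduced dim {name!r} needs one row per cell")


sce = ReducedDimExperiment(
    assays={"counts": [[0, 3], [5, 1]]},
    row_names=["GeneA", "GeneB"],
    col_names=["cell1", "cell2"],
    col_data={"batch": ["b1", "b2"]},
    reduced_dims={"PCA": [[0.1, -0.2], [0.3, 0.4]]},
)
print(sce.col_data["batch"])  # ['b1', 'b2']
```

Code written against the base class keeps working on the extended one. That is the main argument for agreeing on one shared class rather than having each package define its own.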

Useful Bioconductor packages and other resources:

Task 2: scale up existing tools / implement new tools that can handle large-scale datasets

Challenges:

  • Current tools scale to thousands of cells
  • 10x Genomics has released a dataset of 1.3 million cells
  • Main problem: the data do not fit in memory!
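To put "does not fit in memory" into numbers, here is a back-of-the-envelope estimate. The gene count (about 28,000) and the 10% non-zero density are assumptions for illustration, not figures from the discussion. The sparse estimate assumes a compressed-sparse-column layout with 8-byte values, 4-byte row indices and 8-byte column pointers.

```python
# Back-of-the-envelope memory footprint for a genes x cells matrix.
# ASSUMPTIONS (illustrative only): ~28,000 genes and 10% non-zero entries.
N_CELLS = 1_300_000
N_GENES = 28_000


def dense_gb(n_genes: int, n_cells: int, value_bytes: int = 8) -> float:
    """Dense matrix of double-precision values, in gigabytes (1e9 bytes)."""
    return n_genes * n_cells * value_bytes / 1e9


def sparse_gb(n_genes: int, n_cells: int, density: float,
              value_bytes: int = 8, index_bytes: int = 4, pointer_bytes: int = 8) -> float:
    """Compressed sparse column layout: one value and one row index per
    non-zero entry, plus n_cells + 1 column pointers (cells are columns)."""
    nnz = n_genes * n_cells * density
    return (nnz * (value_bytes + index_bytes) + (n_cells + 1) * pointer_bytes) / 1e9


print(f"dense:  {dense_gb(N_GENES, N_CELLS):.0f} GB")        # dense:  291 GB
print(f"sparse: {sparse_gb(N_GENES, N_CELLS, 0.10):.0f} GB")  # sparse: 44 GB
```

Even under the sparse assumption, the matrix exceeds the RAM of many workstations, before any intermediate copies made during analysis.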

Proposed solutions:

  • HDF5 files + "chunked" operations
  • Simple algorithms + approximate, scalable methods
  • Provide an API that performs common operations independently of the data representation (in memory vs. on disk)
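The three proposals fit together: data live on disk in chunks, algorithms touch one chunk at a time, and a shared interface hides whether the matrix is in memory or on disk. Below is a minimal Python sketch under stated assumptions. The `column_chunks` method is a hypothetical shared interface, and a plain column-major binary file stands in for HDF5. A real implementation would use an HDF5 library such as h5py; in R, the DelayedArray and HDF5Array packages cover similar ground. The per-gene mean and variance use Welford's one-pass update, an example of a simple algorithm that only ever needs one chunk in memory.

```python
import array
import os
import tempfile
from typing import Iterator, List, Protocol, Tuple


class ColumnChunked(Protocol):
    """Hypothetical interface shared by in-memory and on-disk matrices."""
    n_rows: int
    n_cols: int

    def column_chunks(self, chunk_size: int) -> Iterator[List[List[float]]]:
        """Yield blocks of whole columns; each column holds n_rows values."""
        ...


class InMemoryMatrix:
    def __init__(self, columns: List[List[float]]) -> None:
        self.columns = columns
        self.n_cols = len(columns)
        self.n_rows = len(columns[0]) if columns else 0

    def column_chunks(self, chunk_size: int) -> Iterator[List[List[float]]]:
        for start in range(0, self.n_cols, chunk_size):
            yield self.columns[start:start + chunk_size]


class OnDiskMatrix:
    """Column-major float64 values in a flat binary file (stand-in for HDF5)."""

    def __init__(self, path: str, n_rows: int, n_cols: int) -> None:
        self.path, self.n_rows, self.n_cols = path, n_rows, n_cols

    @classmethod
    def write(cls, path: str, columns: List[List[float]]) -> "OnDiskMatrix":
        with open(path, "wb") as fh:
            for col in columns:
                array.array("d", col).tofile(fh)
        return cls(path, len(columns[0]), len(columns))

    def column_chunks(self, chunk_size: int) -> Iterator[List[List[float]]]:
        with open(self.path, "rb") as fh:
            for start in range(0, self.n_cols, chunk_size):
                n = min(chunk_size, self.n_cols - start)
                block = array.array("d")
                block.fromfile(fh, n * self.n_rows)  # read only this chunk
                # Split the flat block back into n columns.
                yield [block[i * self.n_rows:(i + 1) * self.n_rows].tolist()
                       for i in range(n)]


def row_mean_var(matrix: ColumnChunked, chunk_size: int = 1000) -> Tuple[List[float], List[float]]:
    """Per-row (per-gene) mean and sample variance in one pass over column chunks."""
    count = 0
    mean = [0.0] * matrix.n_rows
    m2 = [0.0] * matrix.n_rows  # running sum of squared deviations (Welford)
    for chunk in matrix.column_chunks(chunk_size):
        for column in chunk:
            count += 1
            for i, x in enumerate(column):
                delta = x - mean[i]
                mean[i] += delta / count
                m2[i] += delta * (x - mean[i])
    var = [v / (count - 1) if count > 1 else float("nan") for v in m2]
    return mean, var


cols = [[0.0, 2.0], [4.0, 2.0], [2.0, 2.0]]  # 3 cells x 2 genes, one list per cell
print(row_mean_var(InMemoryMatrix(cols), chunk_size=2))  # ([2.0, 2.0], [4.0, 0.0])
with tempfile.TemporaryDirectory() as tmp:
    disk = OnDiskMatrix.write(os.path.join(tmp, "counts.bin"), cols)
    print(row_mean_var(disk, chunk_size=2))  # ([2.0, 2.0], [4.0, 0.0])
```

Because `row_mean_var` only calls `column_chunks`, the same code runs on both backends. The chunk size trades peak memory against the number of disk reads.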

Useful Bioconductor packages and other resources:

Interested in contributing? Join the Slack channel: https://community-bioc.slack.com

Discussion points:

  1. Benchmark (canonical datasets)
  2. Splatter (simulations of scRNA-seq)
  3. What to do next?
  4. BigDataAlgorithms: define its scope and the functionalities we want.
       a. Prior art in astronomy, etc.?
  5. Visualization?
  6. Multi-assay?
       a. People are running single-cell assays that generate multiple types of data (e.g., RNA expression and methylation) from each cell.
       b. Each assay can be stored in a SingleCellExperiment, and these can then be placed inside a MultiAssayExperiment to link up the row and column metadata.
  7. Multiple samples: a list of SingleCellExperiments vs. one large joined SingleCellExperiment? Can we learn from flowSet?
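Points 6 and 7 are partly a data-layout question. The Python sketch below contrasts the two layouts for multiple samples: a dict of per-sample matrices (the "list of SingleCellExperiments" option) and one joined matrix with a per-cell sample label. It also includes a helper that finds the cells shared by several assays, roughly the kind of column linking a MultiAssayExperiment provides. All names are hypothetical; this is not the SingleCellExperiment, MultiAssayExperiment, or flowSet API.

```python
from typing import Dict, List, Tuple

# Hypothetical minimal layout: a sample is (cell_ids, columns), where
# columns[j] holds the values of cell cell_ids[j] for a fixed gene order.
Sample = Tuple[List[str], List[List[float]]]


def join_samples(samples: Dict[str, Sample]) -> Tuple[List[str], List[str], List[List[float]]]:
    """'Giant joined' layout: concatenate cells, recording each cell's sample."""
    cell_ids: List[str] = []
    sample_of: List[str] = []
    columns: List[List[float]] = []
    for name, (ids, cols) in samples.items():
        cell_ids.extend(ids)
        sample_of.extend([name] * len(ids))
        columns.extend(cols)
    return cell_ids, sample_of, columns


def split_samples(cell_ids: List[str], sample_of: List[str],
                  columns: List[List[float]]) -> Dict[str, Sample]:
    """Inverse operation: recover the per-sample layout from the joined one."""
    out: Dict[str, Sample] = {}
    for cid, name, col in zip(cell_ids, sample_of, columns):
        ids, cols = out.setdefault(name, ([], []))
        ids.append(cid)
        cols.append(col)
    return out


def shared_cells(*assay_cell_ids: List[str]) -> List[str]:
    """Cells measured in every assay, in the order of the first assay."""
    common = set(assay_cell_ids[0]).intersection(*assay_cell_ids[1:])
    return [c for c in assay_cell_ids[0] if c in common]


samples = {
    "s1": (["AAAC", "AAAG"], [[1.0, 0.0], [3.0, 2.0]]),
    "s2": (["AAAC"], [[0.0, 5.0]]),
}
ids, origin, cols = join_samples(samples)
print(origin)                                              # ['s1', 's1', 's2']
print(split_samples(ids, origin, cols) == samples)         # True
print(shared_cells(["c1", "c2", "c3"], ["c3", "c1"]))      # ['c1', 'c3']
```

Cell barcodes can repeat across samples (as "AAAC" does here), so the joined layout needs the sample label, or sample-qualified IDs, to keep cells distinct. The per-sample layout avoids that issue but turns every cross-sample operation into an explicit loop.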