Notes from the BoF Bioc2017 Discussions Courtesy of moderator Davide Risso

Tutorial: BoF Bioc2017: Infrastructure for efficient storage and processing of large-scale single-cell genomics data

shepherl 3.8k

@lshep

Task 1: provide a unified representation of single-cell data

Challenges:

Hundreds of scRNA-seq software tools
- https://github.com/seandavi/awesome-single-cell
- http://www.scrna-tools.org
Most R and Bioconductor packages define their own class
Some extend SummarizedExperiment, some ExpressionSet
Most packages don’t fully exploit the potential of SummarizedExperiment (e.g., an assay does not have to be an ordinary in-memory matrix; any matrix-like object, such as a sparse or disk-backed array, can be used)

Proposed solutions:

Useful Bioconductor packages and other resources:

Task 2: scale-up of existing tools / implementation of tools to handle large-scale datasets

Challenges:

Proposed solutions:

HDF5 files + "chunk operations"
Simple algorithms + approximate, scalable methods
Provide API to perform common operations independent of data representation (in memory vs. on disk)
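
The three proposals above can be sketched together. This is only an illustration, not Bioconductor code: it is written in Python with the standard library so that it runs anywhere, a plain CSV file stands in for an HDF5 file, and the names InMemoryMatrix, OnDiskMatrix, iter_row_chunks and col_sums are hypothetical. The point is that col_sums only asks its input for chunks of rows, so the same function works whether the counts are in memory or on disk.

```python
import csv
import os
import tempfile


class InMemoryMatrix:
    """Genes x cells matrix held in memory as a list of rows (hypothetical)."""

    def __init__(self, rows):
        self.rows = [list(r) for r in rows]

    def iter_row_chunks(self, chunk_size):
        # Hand out consecutive blocks of at most chunk_size rows.
        for start in range(0, len(self.rows), chunk_size):
            yield self.rows[start:start + chunk_size]


class OnDiskMatrix:
    """Same interface, but rows are streamed from a file on disk.

    A CSV file is an assumption made for this sketch; it stands in for a
    chunked on-disk format such as HDF5.
    """

    def __init__(self, path):
        self.path = path

    def iter_row_chunks(self, chunk_size):
        chunk = []
        with open(self.path, newline="") as handle:
            for record in csv.reader(handle):
                chunk.append([float(x) for x in record])
                if len(chunk) == chunk_size:
                    yield chunk
                    chunk = []
        if chunk:  # last, possibly shorter, chunk
            yield chunk


def col_sums(matrix, chunk_size=2):
    """Per-cell totals, accumulated one chunk of rows at a time."""
    totals = None
    for chunk in matrix.iter_row_chunks(chunk_size):
        for row in chunk:
            if totals is None:
                totals = [0.0] * len(row)
            for j, value in enumerate(row):
                totals[j] += value
    return totals if totals is not None else []


# Usage: the same call works for both representations.
counts = [[1, 0, 3], [0, 2, 1], [4, 0, 0]]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "counts.csv")
    with open(path, "w", newline="") as handle:
        csv.writer(handle).writerows(counts)
    print(col_sums(InMemoryMatrix(counts)))
    print(col_sums(OnDiskMatrix(path)))
```

Only one chunk is held in memory at a time in the on-disk case, which is what makes the "simple algorithm" scale; anything that can be written as an accumulation over chunks fits the same pattern.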

Useful Bioconductor packages and other resources:

Interested in contributing? Join the Slack channel: https://community-bioc.slack.com

Discussion points:

Benchmark (canonical datasets)
Splatter (simulations of scRNA-seq)
What to do next?
BigDataAlgorithms: define scope and which functionalities we want
- Prior art in astronomy, etc.?
Visualization?
Multi assay?
- People are running single-cell assays that generate multiple types of data (e.g., RNA expression and methylation) from each single cell.
- Each assay can be stored in a SingleCellExperiment and then placed inside a MultiAssayExperiment to link up the row and column metadata.
Multiple samples: a list of SingleCellExperiments vs. one giant joined SingleCellExperiment. Can we learn from flowSet?
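
The multi-assay idea above (one container per data type, linked through shared cell metadata) can be sketched as follows. This is a Python stand-in for illustration only, not the SingleCellExperiment or MultiAssayExperiment API; the classes Assay and MultiAssay and their methods are hypothetical, and the values are made-up toy numbers.

```python
class Assay:
    """One data type measured on a set of cells (hypothetical container)."""

    def __init__(self, name, cell_ids, values):
        if len(cell_ids) != len(values):
            raise ValueError("need exactly one value vector per cell")
        self.name = name
        self.by_cell = dict(zip(cell_ids, values))


class MultiAssay:
    """Several assays linked through one shared table of cell metadata."""

    def __init__(self, cell_metadata, assays):
        self.cell_metadata = cell_metadata          # cell id -> metadata
        self.assays = {a.name: a for a in assays}   # assay name -> Assay

    def shared_cells(self):
        # Cells that have metadata and a measurement in every assay.
        return [
            cell for cell in self.cell_metadata
            if all(cell in a.by_cell for a in self.assays.values())
        ]

    def for_cell(self, cell_id):
        # Everything known about one cell, across all assays that saw it.
        record = {"metadata": self.cell_metadata[cell_id]}
        for name, assay in self.assays.items():
            if cell_id in assay.by_cell:
                record[name] = assay.by_cell[cell_id]
        return record


# Usage with toy data: not every cell is present in every assay.
rna = Assay("rna", ["c1", "c2", "c3"], [[5, 0], [1, 2], [0, 7]])
meth = Assay("methylation", ["c2", "c3", "c4"], [[0.1], [0.9], [0.5]])
meta = {"c1": {"sample": "A"}, "c2": {"sample": "A"}, "c3": {"sample": "B"}}
linked = MultiAssay(meta, [rna, meth])
print(linked.shared_cells())
print(linked.for_cell("c2"))
```

Keeping the assays separate and joining them only through cell identifiers means each assay can have its own features and its own subset of cells, which is the situation described for multi-modal single-cell experiments.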

single-cell objectstorage Tutorial

Posted 6.7 years ago by shepherl